From 1d683af0356098068a8231b988036d56918b542f Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Tue, 14 Jun 2016 11:17:14 -0600 Subject: [PATCH 001/521] coresight: Fix erroneous memset in tmc_read_unprepare_etr At the end of a trace collection, we try to clear the entire buffer and enable the ETR back if it was already enabled. But, we would have adjusted the drvdata->buf to point to the beginning of the trace data in the trace buffer @drvdata->vaddr. So, the following code which clears the buffer is dangerous and can cause crashes, like below : memset(drvdata->buf, 0, drvdata->size); Unable to handle kernel paging request at virtual address ffffff800a145000 pgd = ffffffc974726000 *pgd=00000009f3e91003, *pud=00000009f3e91003, *pmd=0000000000000000 PREEMPT SMP Modules linked in: CPU: 4 PID: 1692 Comm: dd Not tainted 4.7.0-rc2+ #1721 Hardware name: ARM Juno development board (r0) (DT) task: ffffffc9734a0080 ti: ffffffc974460000 task.ti: ffffffc974460000 PC is at __memset+0x1ac/0x200 LR is at tmc_read_unprepare_etr+0x144/0x1bc pc : [] lr : [] pstate: 200001c5 ... [] __memset+0x1ac/0x200 [] tmc_release+0x90/0x94 [] __fput+0xa8/0x1ec [] ____fput+0xc/0x14 [] task_work_run+0xb0/0xe4 [] do_notify_resume+0x64/0x6c [] work_pending+0x10/0x14 Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428) Since we clear the buffer anyway in the following call to tmc_etr_enable_hw(), remove the erroneous memset(). Fixes: commit de5461970b3e9e1 ("coresight: tmc: allocating memory when needed") Cc: Mathieu Poirier Signed-off-by: Suzuki K Poulose Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman (cherry picked from commit f3b8172fe15fbed0d0d33d99780e122213e00684) Signed-off-by: Alex Shi --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 847d1b5f2c13..44eaae6e7d29 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -300,13 +300,10 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata) if (local_read(&drvdata->mode) == CS_MODE_SYSFS) { /* * The trace run will continue with the same allocated trace - * buffer. As such zero-out the buffer so that we don't end - * up with stale data. - * - * Since the tracer is still enabled drvdata::buf - * can't be NULL. + * buffer. The trace buffer is cleared in tmc_etr_enable_hw(), + * so we don't have to explicitly clear it. Also, since the + * tracer is still enabled drvdata::buf can't be NULL. */ - memset(drvdata->buf, 0, drvdata->size); tmc_etr_enable_hw(drvdata); } else { /* From d3539486bbce11701e012428ce7690000dd34fa6 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 24 Aug 2016 10:07:14 +0100 Subject: [PATCH 002/521] perf/core: Use this_cpu_ptr() when stopping AUX events When tearing down an AUX buf for an event via perf_mmap_close(), __perf_event_output_stop() is called on the event's CPU to ensure that trace generation is halted before the process of unmapping and freeing the buffer pages begins. The callback is performed via cpu_function_call(), which ensures that it runs with interrupts disabled and is therefore not preemptible. Unfortunately, the current code grabs the per-cpu context pointer using get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair the call with put_cpu_ptr(), leading to a preempt_count() imbalance and a BUG when freeing the AUX buffer later on: WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120 Modules linked in: [...] Call Trace: [] dump_stack+0x4f/0x72 [] __warn+0xc6/0xe0 [] warn_slowpath_null+0x18/0x20 [] __rb_free_aux+0x10c/0x120 [] rb_free_aux+0x13/0x20 [] perf_mmap_close+0x29e/0x2f0 [] ? perf_iterate_ctx+0xe0/0xe0 [] remove_vma+0x25/0x60 [] exit_mmap+0x106/0x140 [] mmput+0x1c/0xd0 [] do_exit+0x253/0xbf0 [] do_group_exit+0x3e/0xb0 [] get_signal+0x249/0x640 [] do_signal+0x23/0x640 [] ? _raw_write_unlock_irq+0x12/0x30 [] ? _raw_spin_unlock_irq+0x9/0x10 [] ? __schedule+0x2c6/0x710 [] exit_to_usermode_loop+0x74/0x90 [] prepare_exit_to_usermode+0x26/0x30 [] retint_user+0x8/0x10 This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is already disabled by the caller. Signed-off-by: Will Deacon Reviewed-by: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vince Weaver Fixes: 95ff4ca26c49 ("perf/core: Free AUX pages in unmap path") Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com Signed-off-by: Ingo Molnar (cherry picked from commit 8b6a3fe8fab97716990a3abde1a01fb5a34552a3) Signed-off-by: Alex Shi --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index cda4b292135a..582c607417d0 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5797,7 +5797,7 @@ static int __perf_pmu_output_stop(void *info) { struct perf_event *event = info; struct pmu *pmu = event->pmu; - struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context); + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); struct remote_output ro = { .rb = event->rb, }; From de10b843a6031739406ab603b5bce2a678ffbf40 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Thu, 25 Aug 2016 15:18:54 -0600 Subject: [PATCH 003/521] coresight: Remove erroneous dma_free_coherent in tmc_probe commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed") removed the static allocation of buffer for the trace data in ETR mode in tmc_probe. However it failed to remove the "devm_free_coherent" in tmc_probe when the probe fails due to other reasons. This patch gets rid of the incorrect dma_free_coherent() call. Fixes: commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed") Cc: Mathieu Poirier Signed-off-by: Suzuki K Poulose Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 481e46fe7a88557b66330cbb047b25cc13eff4b9) Signed-off-by: Alex Shi --- drivers/hwtracing/coresight/coresight-tmc.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index 9e02ac963cd0..3978cbb6b038 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -388,9 +388,6 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id) err_misc_register: coresight_unregister(drvdata->csdev); err_devm_kzalloc: - if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) - dma_free_coherent(dev, drvdata->size, - drvdata->vaddr, drvdata->paddr); return ret; } From cac5bc4eaee18bad89a256aa62627508df7cec91 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 23 Sep 2016 17:55:05 +0100 Subject: [PATCH 004/521] arm64: fix dump_backtrace/unwind_frame with NULL tsk In some places, dump_backtrace() is called with a NULL tsk parameter, e.g. in bug_handler() in arch/arm64, or indirectly via show_stack() in core code. The expectation is that this is treated as if current were passed instead of NULL. Similar is true of unwind_frame(). Commit a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") didn't take this into account. In dump_backtrace() it compares tsk against current *before* we check if tsk is NULL, and in unwind_frame() we never set tsk if it is NULL. Due to this, we won't initialise irq_stack_ptr in either function. In dump_backtrace() this results in calling dump_mem() for memory immediately above the IRQ stack range, rather than for the relevant range on the task stack. In unwind_frame we'll reject unwinding frames on the IRQ stack. In either case this results in incomplete or misleading backtrace information, but is not otherwise problematic. The initial percpu areas (including the IRQ stacks) are allocated in the linear map, and dump_mem uses __get_user(), so we shouldn't access anything with side-effects, and will handle holes safely. This patch fixes the issue by having both functions handle the NULL tsk case before doing anything else with tsk. Signed-off-by: Mark Rutland Fixes: a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") Acked-by: James Morse Cc: Catalin Marinas Cc: Will Deacon Cc: Yang Shi Signed-off-by: Will Deacon (cherry picked from commit b5e7307d9d5a340d2c9fabbe1cee137d4c682c71) Signed-off-by: Alex Shi --- arch/arm64/kernel/stacktrace.c | 5 ++++- arch/arm64/kernel/traps.c | 10 +++++----- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index cfd46c227c8c..a99eff9afc1f 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -43,6 +43,9 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) unsigned long fp = frame->fp; unsigned long irq_stack_ptr; + if (!tsk) + tsk = current; + /* * Switching between stacks is valid when tracing current and in * non-preemptible context. @@ -67,7 +70,7 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) frame->pc = *(unsigned long *)(fp + 8); #ifdef CONFIG_FUNCTION_GRAPH_TRACER - if (tsk && tsk->ret_stack && + if (tsk->ret_stack && (frame->pc == (unsigned long)return_to_handler)) { /* * This is a case where function graph tracer has diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index c5392081b49b..0f35cbb22c0a 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -149,6 +149,11 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) unsigned long irq_stack_ptr; int skip; + pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk); + + if (!tsk) + tsk = current; + /* * Switching between stacks is valid when tracing current and in * non-preemptible context. @@ -158,11 +163,6 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) else irq_stack_ptr = 0; - pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk); - - if (!tsk) - tsk = current; - if (tsk == current) { frame.fp = (unsigned long)__builtin_frame_address(0); frame.sp = current_stack_pointer; From 7841412d4c9d4977e62adaf05e6b55461ee25847 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 13 Oct 2016 17:42:09 +0100 Subject: [PATCH 005/521] arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y As it turns out, the KASLR code breaks CONFIG_MODVERSIONS, since the kcrctab has an absolute address field that is relocated at runtime when the kernel offset is randomized. This has been fixed already for PowerPC in the past, so simply wire up the existing code dealing with this issue. Cc: Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR") Tested-by: Timur Tabi Signed-off-by: Ard Biesheuvel Signed-off-by: Will Deacon (cherry picked from commit 9c0e83c371cf4696926c95f9c8c77cd6ea803426) Signed-off-by: Alex Shi --- arch/arm64/include/asm/module.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h index e12af6754634..06ff7fd9e81f 100644 --- a/arch/arm64/include/asm/module.h +++ b/arch/arm64/include/asm/module.h @@ -17,6 +17,7 @@ #define __ASM_MODULE_H #include +#include #define MODULE_ARCH_VERMAGIC "aarch64" @@ -32,6 +33,10 @@ u64 module_emit_plt_entry(struct module *mod, const Elf64_Rela *rela, Elf64_Sym *sym); #ifdef CONFIG_RANDOMIZE_BASE +#ifdef CONFIG_MODVERSIONS +#define ARCH_RELOCATES_KCRCTAB +#define reloc_start (kimage_vaddr - KIMAGE_VADDR) +#endif extern u64 module_alloc_base; #else #define module_alloc_base ((u64)_etext - MODULES_VSIZE) From 7f42466b9d485dd9bba16004003e84c61e712391 Mon Sep 17 00:00:00 2001 From: Huang Shijie Date: Tue, 8 Nov 2016 13:44:38 +0800 Subject: [PATCH 006/521] arm64: hugetlb: remove the wrong pmd check in find_num_contig() The find_num_contig() will return 1 when the pmd is not present. It will cause a kernel dead loop in the following scenaro: 1.) pmd entry is not present. 2.) the page fault occurs: ... hugetlb_fault() --> hugetlb_no_page() --> set_huge_pte_at() 3.) set_huge_pte_at() will only set the first PMD entry, since the find_num_contig just return 1 in this case. So the PMD entries are all empty except the first one. 4.) when kernel accesses the address mapped by the second PMD entry, a new page fault occurs: ... hugetlb_fault() --> huge_ptep_set_access_flags() The second PMD entry is still empty now. 5.) When the kernel returns, the access will cause a page fault again. The kernel will run like the "4)" above. We will see a dead loop since here. The dead loop is caught in the 32M hugetlb page (2M PMD + Contiguous bit). This patch removes wrong pmd check, and fixes this dead loop. This patch also removes the redundant checks for PGD/PUD in the find_num_contig(). Acked-by: Steve Capper Signed-off-by: Huang Shijie Reviewed-by: Catalin Marinas Signed-off-by: Catalin Marinas (cherry picked from commit 20156ce2365d61beaa6f5a78a7a789044e0e7acc) Signed-off-by: Alex Shi --- arch/arm64/mm/hugetlbpage.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index da30529bb1f6..0a18f81e4910 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -51,20 +51,8 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr, *pgsize = PAGE_SIZE; if (!pte_cont(pte)) return 1; - if (!pgd_present(*pgd)) { - VM_BUG_ON(!pgd_present(*pgd)); - return 1; - } pud = pud_offset(pgd, addr); - if (!pud_present(*pud)) { - VM_BUG_ON(!pud_present(*pud)); - return 1; - } pmd = pmd_offset(pud, addr); - if (!pmd_present(*pmd)) { - VM_BUG_ON(!pmd_present(*pmd)); - return 1; - } if ((pte_t *)pmd == ptep) { *pgsize = PMD_SIZE; return CONT_PMDS; From 4fc54a30664c50ccca342ec0a81c8c1efce0a0bf Mon Sep 17 00:00:00 2001 From: Huang Shijie Date: Tue, 8 Nov 2016 13:44:39 +0800 Subject: [PATCH 007/521] arm64: hugetlb: fix the wrong address for several functions The libhugetlbfs meets several failures since the following functions do not use the correct address: huge_ptep_get_and_clear() huge_ptep_set_access_flags() huge_ptep_set_wrprotect() huge_ptep_clear_flush() This patch fixes the wrong address for them. Signed-off-by: Huang Shijie Reviewed-by: Catalin Marinas Signed-off-by: Catalin Marinas (cherry picked from commit 0c2f0afe3582c58efeef93bc57bc07d502132618) Signed-off-by: Alex Shi --- arch/arm64/mm/hugetlbpage.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 0a18f81e4910..1af57cffcdec 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -200,7 +200,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize); /* save the 1st pte to return */ pte = ptep_get_and_clear(mm, addr, cpte); - for (i = 1; i < ncontig; ++i) { + for (i = 1, addr += pgsize; i < ncontig; ++i, addr += pgsize) { /* * If HW_AFDBM is enabled, then the HW could * turn on the dirty bit for any of the page @@ -238,7 +238,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, pfn = pte_pfn(*cpte); ncontig = find_num_contig(vma->vm_mm, addr, cpte, *cpte, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte) { + for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) { changed = ptep_set_access_flags(vma, addr, cpte, pfn_pte(pfn, hugeprot), @@ -261,7 +261,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, cpte = huge_pte_offset(mm, addr); ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte) + for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) ptep_set_wrprotect(mm, addr, cpte); } else { ptep_set_wrprotect(mm, addr, ptep); @@ -279,7 +279,7 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma, cpte = huge_pte_offset(vma->vm_mm, addr); ncontig = find_num_contig(vma->vm_mm, addr, cpte, *cpte, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte) + for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) ptep_clear_flush(vma, addr, cpte); } else { ptep_clear_flush(vma, addr, ptep); From 5a54b72529e96f3ca594e9ef3bfc9be68700900b Mon Sep 17 00:00:00 2001 From: Huang Shijie Date: Wed, 11 Jan 2017 14:02:00 +0800 Subject: [PATCH 008/521] arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags In current code, the @changed always returns the last one's status for the huge page with the contiguous bit set. This is really not what we want. Even one of the PTEs is changed, we should tell it to the caller. This patch fixes this issue. Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Cc: # 4.5.x- Signed-off-by: Huang Shijie Signed-off-by: Catalin Marinas (cherry picked from commit 69d012345a1a32d3f03957f14d972efccc106a98) Signed-off-by: Alex Shi --- arch/arm64/mm/hugetlbpage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 1af57cffcdec..019f13637fae 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -239,7 +239,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, ncontig = find_num_contig(vma->vm_mm, addr, cpte, *cpte, &pgsize); for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) { - changed = ptep_set_access_flags(vma, addr, cpte, + changed |= ptep_set_access_flags(vma, addr, cpte, pfn_pte(pfn, hugeprot), dirty); From 0c0be310ba29e4a053e8aac934aebe590c5da909 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 18 Feb 2016 15:03:24 +0100 Subject: [PATCH 009/521] netlink: remove mmapped netlink support commit d1b4c689d4130bcfd3532680b64db562300716b6 upstream. mmapped netlink has a number of unresolved issues: - TX zerocopy support had to be disabled more than a year ago via commit 4682a0358639b29cf ("netlink: Always copy on mmap TX.") because the content of the mmapped area can change after netlink attribute validation but before message processing. - RX support was implemented mainly to speed up nfqueue dumping packet payload to userspace. However, since commit ae08ce0021087a5d812d2 ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy with the socket-based interface too (via the skb_zerocopy helper). The other problem is that skbs attached to mmaped netlink socket behave different from normal skbs: - they don't have a shinfo area, so all functions that use skb_shinfo() (e.g. skb_clone) cannot be used. - reserving headroom prevents userspace from seeing the content as it expects message to start at skb->head. See for instance commit aa3a022094fa ("netlink: not trim skb for mmaped socket when dump"). - skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we crash because it needs the sk to check if a tx ring is attached. Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359 ("netfilter: nfnetlink: use original skbuff when acking batches"). mmaped netlink also didn't play nicely with the skb_zerocopy helper used by nfqueue and openvswitch. Daniel Borkmann fixed this via commit 6bb0fef489f6 ("netlink, mmap: fix edge-case leakages in nf queue zero-copy")' but at the cost of also needing to provide remaining length to the allocation function. nfqueue also has problems when used with mmaped rx netlink: - mmaped netlink doesn't allow use of nfqueue batch verdict messages. Problem is that in the mmap case, the allocation time also determines the ordering in which the frame will be seen by userspace (A allocating before B means that A is located in earlier ring slot, but this also means that B might get a lower sequence number then A since seqno is decided later. To fix this we would need to extend the spinlocked region to also cover the allocation and message setup which isn't desirable. - nfqueue can now be configured to queue large (GSO) skbs to userspace. Queing GSO packets is faster than having to force a software segmentation in the kernel, so this is a desirable option. However, with a mmap based ring one has to use 64kb per ring slot element, else mmap has to fall back to the socket path (NL_MMAP_STATUS_COPY) for all large packets. To use the mmap interface, userspace not only has to probe for mmap netlink support, it also has to implement a recv/socket receive path in order to handle messages that exceed the size of an rx ring element. Cc: Daniel Borkmann Cc: Ken-ichirou MATSUZAWA Cc: Pablo Neira Ayuso Cc: Patrick McHardy Cc: Thomas Graf Signed-off-by: Florian Westphal Signed-off-by: David S. Miller Cc: Shi Yuejie Signed-off-by: Greg Kroah-Hartman --- Documentation/networking/netlink_mmap.txt | 332 ---------- include/uapi/linux/netlink.h | 4 + include/uapi/linux/netlink_diag.h | 2 + net/netlink/Kconfig | 9 - net/netlink/af_netlink.c | 751 +--------------------- net/netlink/af_netlink.h | 15 - net/netlink/diag.c | 39 -- 7 files changed, 14 insertions(+), 1138 deletions(-) delete mode 100644 Documentation/networking/netlink_mmap.txt diff --git a/Documentation/networking/netlink_mmap.txt b/Documentation/networking/netlink_mmap.txt deleted file mode 100644 index 54f10478e8e3..000000000000 --- a/Documentation/networking/netlink_mmap.txt +++ /dev/null @@ -1,332 +0,0 @@ -This file documents how to use memory mapped I/O with netlink. - -Author: Patrick McHardy - -Overview --------- - -Memory mapped netlink I/O can be used to increase throughput and decrease -overhead of unicast receive and transmit operations. Some netlink subsystems -require high throughput, these are mainly the netfilter subsystems -nfnetlink_queue and nfnetlink_log, but it can also help speed up large -dump operations of f.i. the routing database. - -Memory mapped netlink I/O used two circular ring buffers for RX and TX which -are mapped into the processes address space. - -The RX ring is used by the kernel to directly construct netlink messages into -user-space memory without copying them as done with regular socket I/O, -additionally as long as the ring contains messages no recvmsg() or poll() -syscalls have to be issued by user-space to get more message. - -The TX ring is used to process messages directly from user-space memory, the -kernel processes all messages contained in the ring using a single sendmsg() -call. - -Usage overview --------------- - -In order to use memory mapped netlink I/O, user-space needs three main changes: - -- ring setup -- conversion of the RX path to get messages from the ring instead of recvmsg() -- conversion of the TX path to construct messages into the ring - -Ring setup is done using setsockopt() to provide the ring parameters to the -kernel, then a call to mmap() to map the ring into the processes address space: - -- setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, ¶ms, sizeof(params)); -- setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, ¶ms, sizeof(params)); -- ring = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) - -Usage of either ring is optional, but even if only the RX ring is used the -mapping still needs to be writable in order to update the frame status after -processing. - -Conversion of the reception path involves calling poll() on the file -descriptor, once the socket is readable the frames from the ring are -processed in order until no more messages are available, as indicated by -a status word in the frame header. - -On kernel side, in order to make use of memory mapped I/O on receive, the -originating netlink subsystem needs to support memory mapped I/O, otherwise -it will use an allocated socket buffer as usual and the contents will be - copied to the ring on transmission, nullifying most of the performance gains. -Dumps of kernel databases automatically support memory mapped I/O. - -Conversion of the transmit path involves changing message construction to -use memory from the TX ring instead of (usually) a buffer declared on the -stack and setting up the frame header appropriately. Optionally poll() can -be used to wait for free frames in the TX ring. - -Structured and definitions for using memory mapped I/O are contained in -. - -RX and TX rings ----------------- - -Each ring contains a number of continuous memory blocks, containing frames of -fixed size dependent on the parameters used for ring setup. - -Ring: [ block 0 ] - [ frame 0 ] - [ frame 1 ] - [ block 1 ] - [ frame 2 ] - [ frame 3 ] - ... - [ block n ] - [ frame 2 * n ] - [ frame 2 * n + 1 ] - -The blocks are only visible to the kernel, from the point of view of user-space -the ring just contains the frames in a continuous memory zone. - -The ring parameters used for setting up the ring are defined as follows: - -struct nl_mmap_req { - unsigned int nm_block_size; - unsigned int nm_block_nr; - unsigned int nm_frame_size; - unsigned int nm_frame_nr; -}; - -Frames are grouped into blocks, where each block is a continuous region of memory -and holds nm_block_size / nm_frame_size frames. The total number of frames in -the ring is nm_frame_nr. The following invariants hold: - -- frames_per_block = nm_block_size / nm_frame_size - -- nm_frame_nr = frames_per_block * nm_block_nr - -Some parameters are constrained, specifically: - -- nm_block_size must be a multiple of the architectures memory page size. - The getpagesize() function can be used to get the page size. - -- nm_frame_size must be equal or larger to NL_MMAP_HDRLEN, IOW a frame must be - able to hold at least the frame header - -- nm_frame_size must be smaller or equal to nm_block_size - -- nm_frame_size must be a multiple of NL_MMAP_MSG_ALIGNMENT - -- nm_frame_nr must equal the actual number of frames as specified above. - -When the kernel can't allocate physically continuous memory for a ring block, -it will fall back to use physically discontinuous memory. This might affect -performance negatively, in order to avoid this the nm_frame_size parameter -should be chosen to be as small as possible for the required frame size and -the number of blocks should be increased instead. - -Ring frames ------------- - -Each frames contain a frame header, consisting of a synchronization word and some -meta-data, and the message itself. - -Frame: [ header message ] - -The frame header is defined as follows: - -struct nl_mmap_hdr { - unsigned int nm_status; - unsigned int nm_len; - __u32 nm_group; - /* credentials */ - __u32 nm_pid; - __u32 nm_uid; - __u32 nm_gid; -}; - -- nm_status is used for synchronizing processing between the kernel and user- - space and specifies ownership of the frame as well as the operation to perform - -- nm_len contains the length of the message contained in the data area - -- nm_group specified the destination multicast group of message - -- nm_pid, nm_uid and nm_gid contain the netlink pid, UID and GID of the sending - process. These values correspond to the data available using SOCK_PASSCRED in - the SCM_CREDENTIALS cmsg. - -The possible values in the status word are: - -- NL_MMAP_STATUS_UNUSED: - RX ring: frame belongs to the kernel and contains no message - for user-space. Approriate action is to invoke poll() - to wait for new messages. - - TX ring: frame belongs to user-space and can be used for - message construction. - -- NL_MMAP_STATUS_RESERVED: - RX ring only: frame is currently used by the kernel for message - construction and contains no valid message yet. - Appropriate action is to invoke poll() to wait for - new messages. - -- NL_MMAP_STATUS_VALID: - RX ring: frame contains a valid message. Approriate action is - to process the message and release the frame back to - the kernel by setting the status to - NL_MMAP_STATUS_UNUSED or queue the frame by setting the - status to NL_MMAP_STATUS_SKIP. - - TX ring: the frame contains a valid message from user-space to - be processed by the kernel. After completing processing - the kernel will release the frame back to user-space by - setting the status to NL_MMAP_STATUS_UNUSED. - -- NL_MMAP_STATUS_COPY: - RX ring only: a message is ready to be processed but could not be - stored in the ring, either because it exceeded the - frame size or because the originating subsystem does - not support memory mapped I/O. Appropriate action is - to invoke recvmsg() to receive the message and release - the frame back to the kernel by setting the status to - NL_MMAP_STATUS_UNUSED. - -- NL_MMAP_STATUS_SKIP: - RX ring only: user-space queued the message for later processing, but - processed some messages following it in the ring. The - kernel should skip this frame when looking for unused - frames. - -The data area of a frame begins at a offset of NL_MMAP_HDRLEN relative to the -frame header. - -TX limitations --------------- - -As of Jan 2015 the message is always copied from the ring frame to an -allocated buffer due to unresolved security concerns. -See commit 4682a0358639b29cf ("netlink: Always copy on mmap TX."). - -Example -------- - -Ring setup: - - unsigned int block_size = 16 * getpagesize(); - struct nl_mmap_req req = { - .nm_block_size = block_size, - .nm_block_nr = 64, - .nm_frame_size = 16384, - .nm_frame_nr = 64 * block_size / 16384, - }; - unsigned int ring_size; - void *rx_ring, *tx_ring; - - /* Configure ring parameters */ - if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) - exit(1); - if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) - exit(1) - - /* Calculate size of each individual ring */ - ring_size = req.nm_block_nr * req.nm_block_size; - - /* Map RX/TX rings. The TX ring is located after the RX ring */ - rx_ring = mmap(NULL, 2 * ring_size, PROT_READ | PROT_WRITE, - MAP_SHARED, fd, 0); - if ((long)rx_ring == -1L) - exit(1); - tx_ring = rx_ring + ring_size: - -Message reception: - -This example assumes some ring parameters of the ring setup are available. - - unsigned int frame_offset = 0; - struct nl_mmap_hdr *hdr; - struct nlmsghdr *nlh; - unsigned char buf[16384]; - ssize_t len; - - while (1) { - struct pollfd pfds[1]; - - pfds[0].fd = fd; - pfds[0].events = POLLIN | POLLERR; - pfds[0].revents = 0; - - if (poll(pfds, 1, -1) < 0 && errno != -EINTR) - exit(1); - - /* Check for errors. Error handling omitted */ - if (pfds[0].revents & POLLERR) - - - /* If no new messages, poll again */ - if (!(pfds[0].revents & POLLIN)) - continue; - - /* Process all frames */ - while (1) { - /* Get next frame header */ - hdr = rx_ring + frame_offset; - - if (hdr->nm_status == NL_MMAP_STATUS_VALID) { - /* Regular memory mapped frame */ - nlh = (void *)hdr + NL_MMAP_HDRLEN; - len = hdr->nm_len; - - /* Release empty message immediately. May happen - * on error during message construction. - */ - if (len == 0) - goto release; - } else if (hdr->nm_status == NL_MMAP_STATUS_COPY) { - /* Frame queued to socket receive queue */ - len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT); - if (len <= 0) - break; - nlh = buf; - } else - /* No more messages to process, continue polling */ - break; - - process_msg(nlh); -release: - /* Release frame back to the kernel */ - hdr->nm_status = NL_MMAP_STATUS_UNUSED; - - /* Advance frame offset to next frame */ - frame_offset = (frame_offset + frame_size) % ring_size; - } - } - -Message transmission: - -This example assumes some ring parameters of the ring setup are available. -A single message is constructed and transmitted, to send multiple messages -at once they would be constructed in consecutive frames before a final call -to sendto(). - - unsigned int frame_offset = 0; - struct nl_mmap_hdr *hdr; - struct nlmsghdr *nlh; - struct sockaddr_nl addr = { - .nl_family = AF_NETLINK, - }; - - hdr = tx_ring + frame_offset; - if (hdr->nm_status != NL_MMAP_STATUS_UNUSED) - /* No frame available. Use poll() to avoid. */ - exit(1); - - nlh = (void *)hdr + NL_MMAP_HDRLEN; - - /* Build message */ - build_message(nlh); - - /* Fill frame header: length and status need to be set */ - hdr->nm_len = nlh->nlmsg_len; - hdr->nm_status = NL_MMAP_STATUS_VALID; - - if (sendto(fd, NULL, 0, 0, &addr, sizeof(addr)) < 0) - exit(1); - - /* Advance frame offset to next frame */ - frame_offset = (frame_offset + frame_size) % ring_size; diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index f095155d8749..0dba4e4ed2be 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -107,8 +107,10 @@ struct nlmsgerr { #define NETLINK_PKTINFO 3 #define NETLINK_BROADCAST_ERROR 4 #define NETLINK_NO_ENOBUFS 5 +#ifndef __KERNEL__ #define NETLINK_RX_RING 6 #define NETLINK_TX_RING 7 +#endif #define NETLINK_LISTEN_ALL_NSID 8 #define NETLINK_LIST_MEMBERSHIPS 9 #define NETLINK_CAP_ACK 10 @@ -134,6 +136,7 @@ struct nl_mmap_hdr { __u32 nm_gid; }; +#ifndef __KERNEL__ enum nl_mmap_status { NL_MMAP_STATUS_UNUSED, NL_MMAP_STATUS_RESERVED, @@ -145,6 +148,7 @@ enum nl_mmap_status { #define NL_MMAP_MSG_ALIGNMENT NLMSG_ALIGNTO #define NL_MMAP_MSG_ALIGN(sz) __ALIGN_KERNEL(sz, NL_MMAP_MSG_ALIGNMENT) #define NL_MMAP_HDRLEN NL_MMAP_MSG_ALIGN(sizeof(struct nl_mmap_hdr)) +#endif #define NET_MAJOR 36 /* Major 36 is reserved for networking */ diff --git a/include/uapi/linux/netlink_diag.h b/include/uapi/linux/netlink_diag.h index f2159d30d1f5..d79399394b46 100644 --- a/include/uapi/linux/netlink_diag.h +++ b/include/uapi/linux/netlink_diag.h @@ -48,6 +48,8 @@ enum { #define NDIAG_SHOW_MEMINFO 0x00000001 /* show memory info of a socket */ #define NDIAG_SHOW_GROUPS 0x00000002 /* show groups of a netlink socket */ +#ifndef __KERNEL__ #define NDIAG_SHOW_RING_CFG 0x00000004 /* show ring configuration */ +#endif #endif diff --git a/net/netlink/Kconfig b/net/netlink/Kconfig index 2c5e95e9bfbd..5d6e8c05b3d4 100644 --- a/net/netlink/Kconfig +++ b/net/netlink/Kconfig @@ -2,15 +2,6 @@ # Netlink Sockets # -config NETLINK_MMAP - bool "NETLINK: mmaped IO" - ---help--- - This option enables support for memory mapped netlink IO. This - reduces overhead by avoiding copying data between kernel- and - userspace. - - If unsure, say N. - config NETLINK_DIAG tristate "NETLINK: socket monitoring interface" default n diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 360700a2f46c..8e33019d8e7b 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -225,7 +225,7 @@ static int __netlink_deliver_tap_skb(struct sk_buff *skb, dev_hold(dev); - if (netlink_skb_is_mmaped(skb) || is_vmalloc_addr(skb->head)) + if (is_vmalloc_addr(skb->head)) nskb = netlink_to_full_skb(skb, GFP_ATOMIC); else nskb = skb_clone(skb, GFP_ATOMIC); @@ -300,610 +300,8 @@ static void netlink_rcv_wake(struct sock *sk) wake_up_interruptible(&nlk->wait); } -#ifdef CONFIG_NETLINK_MMAP -static bool netlink_rx_is_mmaped(struct sock *sk) -{ - return nlk_sk(sk)->rx_ring.pg_vec != NULL; -} - -static bool netlink_tx_is_mmaped(struct sock *sk) -{ - return nlk_sk(sk)->tx_ring.pg_vec != NULL; -} - -static __pure struct page *pgvec_to_page(const void *addr) -{ - if (is_vmalloc_addr(addr)) - return vmalloc_to_page(addr); - else - return virt_to_page(addr); -} - -static void free_pg_vec(void **pg_vec, unsigned int order, unsigned int len) -{ - unsigned int i; - - for (i = 0; i < len; i++) { - if (pg_vec[i] != NULL) { - if (is_vmalloc_addr(pg_vec[i])) - vfree(pg_vec[i]); - else - free_pages((unsigned long)pg_vec[i], order); - } - } - kfree(pg_vec); -} - -static void *alloc_one_pg_vec_page(unsigned long order) -{ - void *buffer; - gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP | __GFP_ZERO | - __GFP_NOWARN | __GFP_NORETRY; - - buffer = (void *)__get_free_pages(gfp_flags, order); - if (buffer != NULL) - return buffer; - - buffer = vzalloc((1 << order) * PAGE_SIZE); - if (buffer != NULL) - return buffer; - - gfp_flags &= ~__GFP_NORETRY; - return (void *)__get_free_pages(gfp_flags, order); -} - -static void **alloc_pg_vec(struct netlink_sock *nlk, - struct nl_mmap_req *req, unsigned int order) -{ - unsigned int block_nr = req->nm_block_nr; - unsigned int i; - void **pg_vec; - - pg_vec = kcalloc(block_nr, sizeof(void *), GFP_KERNEL); - if (pg_vec == NULL) - return NULL; - - for (i = 0; i < block_nr; i++) { - pg_vec[i] = alloc_one_pg_vec_page(order); - if (pg_vec[i] == NULL) - goto err1; - } - - return pg_vec; -err1: - free_pg_vec(pg_vec, order, block_nr); - return NULL; -} - - -static void -__netlink_set_ring(struct sock *sk, struct nl_mmap_req *req, bool tx_ring, void **pg_vec, - unsigned int order) -{ - struct netlink_sock *nlk = nlk_sk(sk); - struct sk_buff_head *queue; - struct netlink_ring *ring; - - queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue; - ring = tx_ring ? &nlk->tx_ring : &nlk->rx_ring; - - spin_lock_bh(&queue->lock); - - ring->frame_max = req->nm_frame_nr - 1; - ring->head = 0; - ring->frame_size = req->nm_frame_size; - ring->pg_vec_pages = req->nm_block_size / PAGE_SIZE; - - swap(ring->pg_vec_len, req->nm_block_nr); - swap(ring->pg_vec_order, order); - swap(ring->pg_vec, pg_vec); - - __skb_queue_purge(queue); - spin_unlock_bh(&queue->lock); - - WARN_ON(atomic_read(&nlk->mapped)); - - if (pg_vec) - free_pg_vec(pg_vec, order, req->nm_block_nr); -} - -static int netlink_set_ring(struct sock *sk, struct nl_mmap_req *req, - bool tx_ring) -{ - struct netlink_sock *nlk = nlk_sk(sk); - struct netlink_ring *ring; - void **pg_vec = NULL; - unsigned int order = 0; - - ring = tx_ring ? &nlk->tx_ring : &nlk->rx_ring; - - if (atomic_read(&nlk->mapped)) - return -EBUSY; - if (atomic_read(&ring->pending)) - return -EBUSY; - - if (req->nm_block_nr) { - if (ring->pg_vec != NULL) - return -EBUSY; - - if ((int)req->nm_block_size <= 0) - return -EINVAL; - if (!PAGE_ALIGNED(req->nm_block_size)) - return -EINVAL; - if (req->nm_frame_size < NL_MMAP_HDRLEN) - return -EINVAL; - if (!IS_ALIGNED(req->nm_frame_size, NL_MMAP_MSG_ALIGNMENT)) - return -EINVAL; - - ring->frames_per_block = req->nm_block_size / - req->nm_frame_size; - if (ring->frames_per_block == 0) - return -EINVAL; - if (ring->frames_per_block * req->nm_block_nr != - req->nm_frame_nr) - return -EINVAL; - - order = get_order(req->nm_block_size); - pg_vec = alloc_pg_vec(nlk, req, order); - if (pg_vec == NULL) - return -ENOMEM; - } else { - if (req->nm_frame_nr) - return -EINVAL; - } - - mutex_lock(&nlk->pg_vec_lock); - if (atomic_read(&nlk->mapped) == 0) { - __netlink_set_ring(sk, req, tx_ring, pg_vec, order); - mutex_unlock(&nlk->pg_vec_lock); - return 0; - } - - mutex_unlock(&nlk->pg_vec_lock); - - if (pg_vec) - free_pg_vec(pg_vec, order, req->nm_block_nr); - - return -EBUSY; -} - -static void netlink_mm_open(struct vm_area_struct *vma) -{ - struct file *file = vma->vm_file; - struct socket *sock = file->private_data; - struct sock *sk = sock->sk; - - if (sk) - atomic_inc(&nlk_sk(sk)->mapped); -} - -static void netlink_mm_close(struct vm_area_struct *vma) -{ - struct file *file = vma->vm_file; - struct socket *sock = file->private_data; - struct sock *sk = sock->sk; - - if (sk) - atomic_dec(&nlk_sk(sk)->mapped); -} - -static const struct vm_operations_struct netlink_mmap_ops = { - .open = netlink_mm_open, - .close = netlink_mm_close, -}; - -static int netlink_mmap(struct file *file, struct socket *sock, - struct vm_area_struct *vma) -{ - struct sock *sk = sock->sk; - struct netlink_sock *nlk = nlk_sk(sk); - struct netlink_ring *ring; - unsigned long start, size, expected; - unsigned int i; - int err = -EINVAL; - - if (vma->vm_pgoff) - return -EINVAL; - - mutex_lock(&nlk->pg_vec_lock); - - expected = 0; - for (ring = &nlk->rx_ring; ring <= &nlk->tx_ring; ring++) { - if (ring->pg_vec == NULL) - continue; - expected += ring->pg_vec_len * ring->pg_vec_pages * PAGE_SIZE; - } - - if (expected == 0) - goto out; - - size = vma->vm_end - vma->vm_start; - if (size != expected) - goto out; - - start = vma->vm_start; - for (ring = &nlk->rx_ring; ring <= &nlk->tx_ring; ring++) { - if (ring->pg_vec == NULL) - continue; - - for (i = 0; i < ring->pg_vec_len; i++) { - struct page *page; - void *kaddr = ring->pg_vec[i]; - unsigned int pg_num; - - for (pg_num = 0; pg_num < ring->pg_vec_pages; pg_num++) { - page = pgvec_to_page(kaddr); - err = vm_insert_page(vma, start, page); - if (err < 0) - goto out; - start += PAGE_SIZE; - kaddr += PAGE_SIZE; - } - } - } - - atomic_inc(&nlk->mapped); - vma->vm_ops = &netlink_mmap_ops; - err = 0; -out: - mutex_unlock(&nlk->pg_vec_lock); - return err; -} - -static void netlink_frame_flush_dcache(const struct nl_mmap_hdr *hdr, unsigned int nm_len) -{ -#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1 - struct page *p_start, *p_end; - - /* First page is flushed through netlink_{get,set}_status */ - p_start = pgvec_to_page(hdr + PAGE_SIZE); - p_end = pgvec_to_page((void *)hdr + NL_MMAP_HDRLEN + nm_len - 1); - while (p_start <= p_end) { - flush_dcache_page(p_start); - p_start++; - } -#endif -} - -static enum nl_mmap_status netlink_get_status(const struct nl_mmap_hdr *hdr) -{ - smp_rmb(); - flush_dcache_page(pgvec_to_page(hdr)); - return hdr->nm_status; -} - -static void netlink_set_status(struct nl_mmap_hdr *hdr, - enum nl_mmap_status status) -{ - smp_mb(); - hdr->nm_status = status; - flush_dcache_page(pgvec_to_page(hdr)); -} - -static struct nl_mmap_hdr * -__netlink_lookup_frame(const struct netlink_ring *ring, unsigned int pos) -{ - unsigned int pg_vec_pos, frame_off; - - pg_vec_pos = pos / ring->frames_per_block; - frame_off = pos % ring->frames_per_block; - - return ring->pg_vec[pg_vec_pos] + (frame_off * ring->frame_size); -} - -static struct nl_mmap_hdr * -netlink_lookup_frame(const struct netlink_ring *ring, unsigned int pos, - enum nl_mmap_status status) -{ - struct nl_mmap_hdr *hdr; - - hdr = __netlink_lookup_frame(ring, pos); - if (netlink_get_status(hdr) != status) - return NULL; - - return hdr; -} - -static struct nl_mmap_hdr * -netlink_current_frame(const struct netlink_ring *ring, - enum nl_mmap_status status) -{ - return netlink_lookup_frame(ring, ring->head, status); -} - -static void netlink_increment_head(struct netlink_ring *ring) -{ - ring->head = ring->head != ring->frame_max ? ring->head + 1 : 0; -} - -static void netlink_forward_ring(struct netlink_ring *ring) -{ - unsigned int head = ring->head; - const struct nl_mmap_hdr *hdr; - - do { - hdr = __netlink_lookup_frame(ring, ring->head); - if (hdr->nm_status == NL_MMAP_STATUS_UNUSED) - break; - if (hdr->nm_status != NL_MMAP_STATUS_SKIP) - break; - netlink_increment_head(ring); - } while (ring->head != head); -} - -static bool netlink_has_valid_frame(struct netlink_ring *ring) -{ - unsigned int head = ring->head, pos = head; - const struct nl_mmap_hdr *hdr; - - do { - hdr = __netlink_lookup_frame(ring, pos); - if (hdr->nm_status == NL_MMAP_STATUS_VALID) - return true; - pos = pos != 0 ? pos - 1 : ring->frame_max; - } while (pos != head); - - return false; -} - -static bool netlink_dump_space(struct netlink_sock *nlk) -{ - struct netlink_ring *ring = &nlk->rx_ring; - struct nl_mmap_hdr *hdr; - unsigned int n; - - hdr = netlink_current_frame(ring, NL_MMAP_STATUS_UNUSED); - if (hdr == NULL) - return false; - - n = ring->head + ring->frame_max / 2; - if (n > ring->frame_max) - n -= ring->frame_max; - - hdr = __netlink_lookup_frame(ring, n); - - return hdr->nm_status == NL_MMAP_STATUS_UNUSED; -} - -static unsigned int netlink_poll(struct file *file, struct socket *sock, - poll_table *wait) -{ - struct sock *sk = sock->sk; - struct netlink_sock *nlk = nlk_sk(sk); - unsigned int mask; - int err; - - if (nlk->rx_ring.pg_vec != NULL) { - /* Memory mapped sockets don't call recvmsg(), so flow control - * for dumps is performed here. A dump is allowed to continue - * if at least half the ring is unused. - */ - while (nlk->cb_running && netlink_dump_space(nlk)) { - err = netlink_dump(sk); - if (err < 0) { - sk->sk_err = -err; - sk->sk_error_report(sk); - break; - } - } - netlink_rcv_wake(sk); - } - - mask = datagram_poll(file, sock, wait); - - /* We could already have received frames in the normal receive - * queue, that will show up as NL_MMAP_STATUS_COPY in the ring, - * so if mask contains pollin/etc already, there's no point - * walking the ring. - */ - if ((mask & (POLLIN | POLLRDNORM)) != (POLLIN | POLLRDNORM)) { - spin_lock_bh(&sk->sk_receive_queue.lock); - if (nlk->rx_ring.pg_vec) { - if (netlink_has_valid_frame(&nlk->rx_ring)) - mask |= POLLIN | POLLRDNORM; - } - spin_unlock_bh(&sk->sk_receive_queue.lock); - } - - spin_lock_bh(&sk->sk_write_queue.lock); - if (nlk->tx_ring.pg_vec) { - if (netlink_current_frame(&nlk->tx_ring, NL_MMAP_STATUS_UNUSED)) - mask |= POLLOUT | POLLWRNORM; - } - spin_unlock_bh(&sk->sk_write_queue.lock); - - return mask; -} - -static struct nl_mmap_hdr *netlink_mmap_hdr(struct sk_buff *skb) -{ - return (struct nl_mmap_hdr *)(skb->head - NL_MMAP_HDRLEN); -} - -static void netlink_ring_setup_skb(struct sk_buff *skb, struct sock *sk, - struct netlink_ring *ring, - struct nl_mmap_hdr *hdr) -{ - unsigned int size; - void *data; - - size = ring->frame_size - NL_MMAP_HDRLEN; - data = (void *)hdr + NL_MMAP_HDRLEN; - - skb->head = data; - skb->data = data; - skb_reset_tail_pointer(skb); - skb->end = skb->tail + size; - skb->len = 0; - - skb->destructor = netlink_skb_destructor; - NETLINK_CB(skb).flags |= NETLINK_SKB_MMAPED; - NETLINK_CB(skb).sk = sk; -} - -static int netlink_mmap_sendmsg(struct sock *sk, struct msghdr *msg, - u32 dst_portid, u32 dst_group, - struct scm_cookie *scm) -{ - struct netlink_sock *nlk = nlk_sk(sk); - struct netlink_ring *ring; - struct nl_mmap_hdr *hdr; - struct sk_buff *skb; - unsigned int maxlen; - int err = 0, len = 0; - - mutex_lock(&nlk->pg_vec_lock); - - ring = &nlk->tx_ring; - maxlen = ring->frame_size - NL_MMAP_HDRLEN; - - do { - unsigned int nm_len; - - hdr = netlink_current_frame(ring, NL_MMAP_STATUS_VALID); - if (hdr == NULL) { - if (!(msg->msg_flags & MSG_DONTWAIT) && - atomic_read(&nlk->tx_ring.pending)) - schedule(); - continue; - } - - nm_len = ACCESS_ONCE(hdr->nm_len); - if (nm_len > maxlen) { - err = -EINVAL; - goto out; - } - - netlink_frame_flush_dcache(hdr, nm_len); - - skb = alloc_skb(nm_len, GFP_KERNEL); - if (skb == NULL) { - err = -ENOBUFS; - goto out; - } - __skb_put(skb, nm_len); - memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, nm_len); - netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED); - - netlink_increment_head(ring); - - NETLINK_CB(skb).portid = nlk->portid; - NETLINK_CB(skb).dst_group = dst_group; - NETLINK_CB(skb).creds = scm->creds; - - err = security_netlink_send(sk, skb); - if (err) { - kfree_skb(skb); - goto out; - } - - if (unlikely(dst_group)) { - atomic_inc(&skb->users); - netlink_broadcast(sk, skb, dst_portid, dst_group, - GFP_KERNEL); - } - err = netlink_unicast(sk, skb, dst_portid, - msg->msg_flags & MSG_DONTWAIT); - if (err < 0) - goto out; - len += err; - - } while (hdr != NULL || - (!(msg->msg_flags & MSG_DONTWAIT) && - atomic_read(&nlk->tx_ring.pending))); - - if (len > 0) - err = len; -out: - mutex_unlock(&nlk->pg_vec_lock); - return err; -} - -static void netlink_queue_mmaped_skb(struct sock *sk, struct sk_buff *skb) -{ - struct nl_mmap_hdr *hdr; - - hdr = netlink_mmap_hdr(skb); - hdr->nm_len = skb->len; - hdr->nm_group = NETLINK_CB(skb).dst_group; - hdr->nm_pid = NETLINK_CB(skb).creds.pid; - hdr->nm_uid = from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid); - hdr->nm_gid = from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid); - netlink_frame_flush_dcache(hdr, hdr->nm_len); - netlink_set_status(hdr, NL_MMAP_STATUS_VALID); - - NETLINK_CB(skb).flags |= NETLINK_SKB_DELIVERED; - kfree_skb(skb); -} - -static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb) -{ - struct netlink_sock *nlk = nlk_sk(sk); - struct netlink_ring *ring = &nlk->rx_ring; - struct nl_mmap_hdr *hdr; - - spin_lock_bh(&sk->sk_receive_queue.lock); - hdr = netlink_current_frame(ring, NL_MMAP_STATUS_UNUSED); - if (hdr == NULL) { - spin_unlock_bh(&sk->sk_receive_queue.lock); - kfree_skb(skb); - netlink_overrun(sk); - return; - } - netlink_increment_head(ring); - __skb_queue_tail(&sk->sk_receive_queue, skb); - spin_unlock_bh(&sk->sk_receive_queue.lock); - - hdr->nm_len = skb->len; - hdr->nm_group = NETLINK_CB(skb).dst_group; - hdr->nm_pid = NETLINK_CB(skb).creds.pid; - hdr->nm_uid = from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid); - hdr->nm_gid = from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid); - netlink_set_status(hdr, NL_MMAP_STATUS_COPY); -} - -#else /* CONFIG_NETLINK_MMAP */ -#define netlink_rx_is_mmaped(sk) false -#define netlink_tx_is_mmaped(sk) false -#define netlink_mmap sock_no_mmap -#define netlink_poll datagram_poll -#define netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group, scm) 0 -#endif /* CONFIG_NETLINK_MMAP */ - static void netlink_skb_destructor(struct sk_buff *skb) { -#ifdef CONFIG_NETLINK_MMAP - struct nl_mmap_hdr *hdr; - struct netlink_ring *ring; - struct sock *sk; - - /* If a packet from the kernel to userspace was freed because of an - * error without being delivered to userspace, the kernel must reset - * the status. In the direction userspace to kernel, the status is - * always reset here after the packet was processed and freed. - */ - if (netlink_skb_is_mmaped(skb)) { - hdr = netlink_mmap_hdr(skb); - sk = NETLINK_CB(skb).sk; - - if (NETLINK_CB(skb).flags & NETLINK_SKB_TX) { - netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED); - ring = &nlk_sk(sk)->tx_ring; - } else { - if (!(NETLINK_CB(skb).flags & NETLINK_SKB_DELIVERED)) { - hdr->nm_len = 0; - netlink_set_status(hdr, NL_MMAP_STATUS_VALID); - } - ring = &nlk_sk(sk)->rx_ring; - } - - WARN_ON(atomic_read(&ring->pending) == 0); - atomic_dec(&ring->pending); - sock_put(sk); - - skb->head = NULL; - } -#endif if (is_vmalloc_addr(skb->head)) { if (!skb->cloned || !atomic_dec_return(&(skb_shinfo(skb)->dataref))) @@ -936,18 +334,6 @@ static void netlink_sock_destruct(struct sock *sk) } skb_queue_purge(&sk->sk_receive_queue); -#ifdef CONFIG_NETLINK_MMAP - if (1) { - struct nl_mmap_req req; - - memset(&req, 0, sizeof(req)); - if (nlk->rx_ring.pg_vec) - __netlink_set_ring(sk, &req, false, NULL, 0); - memset(&req, 0, sizeof(req)); - if (nlk->tx_ring.pg_vec) - __netlink_set_ring(sk, &req, true, NULL, 0); - } -#endif /* CONFIG_NETLINK_MMAP */ if (!sock_flag(sk, SOCK_DEAD)) { printk(KERN_ERR "Freeing alive netlink socket %p\n", sk); @@ -1201,9 +587,6 @@ static int __netlink_create(struct net *net, struct socket *sock, mutex_init(nlk->cb_mutex); } init_waitqueue_head(&nlk->wait); -#ifdef CONFIG_NETLINK_MMAP - mutex_init(&nlk->pg_vec_lock); -#endif sk->sk_destruct = netlink_sock_destruct; sk->sk_protocol = protocol; @@ -1745,8 +1128,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb, nlk = nlk_sk(sk); if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf || - test_bit(NETLINK_S_CONGESTED, &nlk->state)) && - !netlink_skb_is_mmaped(skb)) { + test_bit(NETLINK_S_CONGESTED, &nlk->state))) { DECLARE_WAITQUEUE(wait, current); if (!*timeo) { if (!ssk || netlink_is_kernel(ssk)) @@ -1784,14 +1166,7 @@ static int __netlink_sendskb(struct sock *sk, struct sk_buff *skb) netlink_deliver_tap(skb); -#ifdef CONFIG_NETLINK_MMAP - if (netlink_skb_is_mmaped(skb)) - netlink_queue_mmaped_skb(sk, skb); - else if (netlink_rx_is_mmaped(sk)) - netlink_ring_set_copied(sk, skb); - else -#endif /* CONFIG_NETLINK_MMAP */ - skb_queue_tail(&sk->sk_receive_queue, skb); + skb_queue_tail(&sk->sk_receive_queue, skb); sk->sk_data_ready(sk); return len; } @@ -1815,9 +1190,6 @@ static struct sk_buff *netlink_trim(struct sk_buff *skb, gfp_t allocation) int delta; WARN_ON(skb->sk != NULL); - if (netlink_skb_is_mmaped(skb)) - return skb; - delta = skb->end - skb->tail; if (is_vmalloc_addr(skb->head) || delta * 2 < skb->truesize) return skb; @@ -1897,71 +1269,6 @@ struct sk_buff *__netlink_alloc_skb(struct sock *ssk, unsigned int size, unsigned int ldiff, u32 dst_portid, gfp_t gfp_mask) { -#ifdef CONFIG_NETLINK_MMAP - unsigned int maxlen, linear_size; - struct sock *sk = NULL; - struct sk_buff *skb; - struct netlink_ring *ring; - struct nl_mmap_hdr *hdr; - - sk = netlink_getsockbyportid(ssk, dst_portid); - if (IS_ERR(sk)) - goto out; - - ring = &nlk_sk(sk)->rx_ring; - /* fast-path without atomic ops for common case: non-mmaped receiver */ - if (ring->pg_vec == NULL) - goto out_put; - - /* We need to account the full linear size needed as a ring - * slot cannot have non-linear parts. - */ - linear_size = size + ldiff; - if (ring->frame_size - NL_MMAP_HDRLEN < linear_size) - goto out_put; - - skb = alloc_skb_head(gfp_mask); - if (skb == NULL) - goto err1; - - spin_lock_bh(&sk->sk_receive_queue.lock); - /* check again under lock */ - if (ring->pg_vec == NULL) - goto out_free; - - /* check again under lock */ - maxlen = ring->frame_size - NL_MMAP_HDRLEN; - if (maxlen < linear_size) - goto out_free; - - netlink_forward_ring(ring); - hdr = netlink_current_frame(ring, NL_MMAP_STATUS_UNUSED); - if (hdr == NULL) - goto err2; - - netlink_ring_setup_skb(skb, sk, ring, hdr); - netlink_set_status(hdr, NL_MMAP_STATUS_RESERVED); - atomic_inc(&ring->pending); - netlink_increment_head(ring); - - spin_unlock_bh(&sk->sk_receive_queue.lock); - return skb; - -err2: - kfree_skb(skb); - spin_unlock_bh(&sk->sk_receive_queue.lock); - netlink_overrun(sk); -err1: - sock_put(sk); - return NULL; - -out_free: - kfree_skb(skb); - spin_unlock_bh(&sk->sk_receive_queue.lock); -out_put: - sock_put(sk); -out: -#endif return alloc_skb(size, gfp_mask); } EXPORT_SYMBOL_GPL(__netlink_alloc_skb); @@ -2242,8 +1549,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname, if (level != SOL_NETLINK) return -ENOPROTOOPT; - if (optname != NETLINK_RX_RING && optname != NETLINK_TX_RING && - optlen >= sizeof(int) && + if (optlen >= sizeof(int) && get_user(val, (unsigned int __user *)optval)) return -EFAULT; @@ -2296,25 +1602,6 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname, } err = 0; break; -#ifdef CONFIG_NETLINK_MMAP - case NETLINK_RX_RING: - case NETLINK_TX_RING: { - struct nl_mmap_req req; - - /* Rings might consume more memory than queue limits, require - * CAP_NET_ADMIN. - */ - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - if (optlen < sizeof(req)) - return -EINVAL; - if (copy_from_user(&req, optval, sizeof(req))) - return -EFAULT; - err = netlink_set_ring(sk, &req, - optname == NETLINK_TX_RING); - break; - } -#endif /* CONFIG_NETLINK_MMAP */ case NETLINK_LISTEN_ALL_NSID: if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_BROADCAST)) return -EPERM; @@ -2484,18 +1771,6 @@ static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) smp_rmb(); } - /* It's a really convoluted way for userland to ask for mmaped - * sendmsg(), but that's what we've got... - */ - if (netlink_tx_is_mmaped(sk) && - iter_is_iovec(&msg->msg_iter) && - msg->msg_iter.nr_segs == 1 && - msg->msg_iter.iov->iov_base == NULL) { - err = netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group, - &scm); - goto out; - } - err = -EMSGSIZE; if (len > sk->sk_sndbuf - 32) goto out; @@ -2812,8 +2087,7 @@ static int netlink_dump(struct sock *sk) goto errout_skb; } - if (!netlink_rx_is_mmaped(sk) && - atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) + if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) goto errout_skb; /* NLMSG_GOODSIZE is small to avoid high order allocations being @@ -2902,16 +2176,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, struct netlink_sock *nlk; int ret; - /* Memory mapped dump requests need to be copied to avoid looping - * on the pending state in netlink_mmap_sendmsg() while the CB hold - * a reference to the skb. - */ - if (netlink_skb_is_mmaped(skb)) { - skb = skb_copy(skb, GFP_KERNEL); - if (skb == NULL) - return -ENOBUFS; - } else - atomic_inc(&skb->users); + atomic_inc(&skb->users); sk = netlink_lookup(sock_net(ssk), ssk->sk_protocol, NETLINK_CB(skb).portid); if (sk == NULL) { @@ -3255,7 +2520,7 @@ static const struct proto_ops netlink_ops = { .socketpair = sock_no_socketpair, .accept = sock_no_accept, .getname = netlink_getname, - .poll = netlink_poll, + .poll = datagram_poll, .ioctl = sock_no_ioctl, .listen = sock_no_listen, .shutdown = sock_no_shutdown, @@ -3263,7 +2528,7 @@ static const struct proto_ops netlink_ops = { .getsockopt = netlink_getsockopt, .sendmsg = netlink_sendmsg, .recvmsg = netlink_recvmsg, - .mmap = netlink_mmap, + .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, }; diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index df32cb92d9fc..ea4600aea6b0 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -45,12 +45,6 @@ struct netlink_sock { int (*netlink_bind)(struct net *net, int group); void (*netlink_unbind)(struct net *net, int group); struct module *module; -#ifdef CONFIG_NETLINK_MMAP - struct mutex pg_vec_lock; - struct netlink_ring rx_ring; - struct netlink_ring tx_ring; - atomic_t mapped; -#endif /* CONFIG_NETLINK_MMAP */ struct rhash_head node; struct rcu_head rcu; @@ -62,15 +56,6 @@ static inline struct netlink_sock *nlk_sk(struct sock *sk) return container_of(sk, struct netlink_sock, sk); } -static inline bool netlink_skb_is_mmaped(const struct sk_buff *skb) -{ -#ifdef CONFIG_NETLINK_MMAP - return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED; -#else - return false; -#endif /* CONFIG_NETLINK_MMAP */ -} - struct netlink_table { struct rhashtable hash; struct hlist_head mc_list; diff --git a/net/netlink/diag.c b/net/netlink/diag.c index 3ee63a3cff30..8dd836a8dd60 100644 --- a/net/netlink/diag.c +++ b/net/netlink/diag.c @@ -8,41 +8,6 @@ #include "af_netlink.h" -#ifdef CONFIG_NETLINK_MMAP -static int sk_diag_put_ring(struct netlink_ring *ring, int nl_type, - struct sk_buff *nlskb) -{ - struct netlink_diag_ring ndr; - - ndr.ndr_block_size = ring->pg_vec_pages << PAGE_SHIFT; - ndr.ndr_block_nr = ring->pg_vec_len; - ndr.ndr_frame_size = ring->frame_size; - ndr.ndr_frame_nr = ring->frame_max + 1; - - return nla_put(nlskb, nl_type, sizeof(ndr), &ndr); -} - -static int sk_diag_put_rings_cfg(struct sock *sk, struct sk_buff *nlskb) -{ - struct netlink_sock *nlk = nlk_sk(sk); - int ret; - - mutex_lock(&nlk->pg_vec_lock); - ret = sk_diag_put_ring(&nlk->rx_ring, NETLINK_DIAG_RX_RING, nlskb); - if (!ret) - ret = sk_diag_put_ring(&nlk->tx_ring, NETLINK_DIAG_TX_RING, - nlskb); - mutex_unlock(&nlk->pg_vec_lock); - - return ret; -} -#else -static int sk_diag_put_rings_cfg(struct sock *sk, struct sk_buff *nlskb) -{ - return 0; -} -#endif - static int sk_diag_dump_groups(struct sock *sk, struct sk_buff *nlskb) { struct netlink_sock *nlk = nlk_sk(sk); @@ -87,10 +52,6 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, sock_diag_put_meminfo(sk, skb, NETLINK_DIAG_MEMINFO)) goto out_nlmsg_trim; - if ((req->ndiag_show & NDIAG_SHOW_RING_CFG) && - sk_diag_put_rings_cfg(sk, skb)) - goto out_nlmsg_trim; - nlmsg_end(skb, nlh); return 0; From 51a219a1371ed26ce45acc8209d6064257d00f70 Mon Sep 17 00:00:00 2001 From: Matthias Schiffer Date: Thu, 23 Feb 2017 17:19:41 +0100 Subject: [PATCH 010/521] vxlan: correctly validate VXLAN ID against VXLAN_N_VID [ Upstream commit 4e37d6911f36545b286d15073f6f2222f840e81c ] The incorrect check caused an off-by-one error: the maximum VID 0xffffff was unusable. Fixes: d342894c5d2f ("vxlan: virtual extensible lan") Signed-off-by: Matthias Schiffer Acked-by: Jiri Benc Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/vxlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 6fa8e165878e..590750ab6564 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2600,7 +2600,7 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[]) if (data[IFLA_VXLAN_ID]) { __u32 id = nla_get_u32(data[IFLA_VXLAN_ID]); - if (id >= VXLAN_VID_MASK) + if (id >= VXLAN_N_VID) return -ERANGE; } From f1b3aae1f1bfdbec1956670aa3aa28d25f88d4b3 Mon Sep 17 00:00:00 2001 From: David Forster Date: Fri, 24 Feb 2017 14:20:32 +0000 Subject: [PATCH 011/521] vti6: return GRE_KEY for vti6 [ Upstream commit 7dcdf941cdc96692ab99fd790c8cc68945514851 ] Align vti6 with vti by returning GRE_KEY flag. This enables iproute2 to display tunnel keys on "ip -6 tunnel show" Signed-off-by: David Forster Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_vti.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index 0a8610b33d79..bdcc4d9cedd3 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -680,6 +680,10 @@ vti6_parm_to_user(struct ip6_tnl_parm2 *u, const struct __ip6_tnl_parm *p) u->link = p->link; u->i_key = p->i_key; u->o_key = p->o_key; + if (u->i_key) + u->i_flags |= GRE_KEY; + if (u->o_key) + u->o_flags |= GRE_KEY; u->proto = p->proto; memcpy(u->name, p->name, sizeof(u->name)); From 354f79125f12bcd7352704e770c0b10c4a4b424e Mon Sep 17 00:00:00 2001 From: Julian Anastasov Date: Sun, 26 Feb 2017 17:14:35 +0200 Subject: [PATCH 012/521] ipv4: mask tos for input route [ Upstream commit 6e28099d38c0e50d62c1afc054e37e573adf3d21 ] Restore the lost masking of TOS in input route code to allow ip rules to match it properly. Problem [1] noticed by Shmulik Ladkani [1] http://marc.info/?t=137331755300040&r=1&w=2 Fixes: 89aef8921bfb ("ipv4: Delete routing cache.") Signed-off-by: Julian Anastasov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/route.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index ef2f527a119b..da4d68d78590 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1958,6 +1958,7 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr, { int res; + tos &= IPTOS_RT_MASK; rcu_read_lock(); /* Multicast recognition logic is moved from route cache to here. From 2cd0afc64e333f2ef62444300418883cff0e79da Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Paul=20H=C3=BCber?= Date: Sun, 26 Feb 2017 17:58:19 +0100 Subject: [PATCH 013/521] l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 51fb60eb162ab84c5edf2ae9c63cf0b878e5547e ] l2tp_ip_backlog_recv may not return -1 if the packet gets dropped. The return value is passed up to ip_local_deliver_finish, which treats negative values as an IP protocol number for resubmission. Signed-off-by: Paul Hüber Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/l2tp/l2tp_ip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c index 445b7cd0826a..48ab93842322 100644 --- a/net/l2tp/l2tp_ip.c +++ b/net/l2tp/l2tp_ip.c @@ -383,7 +383,7 @@ static int l2tp_ip_backlog_recv(struct sock *sk, struct sk_buff *skb) drop: IP_INC_STATS(sock_net(sk), IPSTATS_MIB_INDISCARDS); kfree_skb(skb); - return -1; + return 0; } /* Userspace will call sendmsg() on the tunnel socket to send L2TP From f331d6445a3e4013428b06169acf3ae33614e69b Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Wed, 1 Mar 2017 12:57:20 +0100 Subject: [PATCH 014/521] net: don't call strlen() on the user buffer in packet_bind_spkt() [ Upstream commit 540e2894f7905538740aaf122bd8e0548e1c34a4 ] KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of uninitialized memory in packet_bind_spkt(): Acked-by: Eric Dumazet ================================================================== BUG: KMSAN: use of unitialized memory CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48 ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550 0000000000000000 0000000000000092 00000000ec400911 0000000000000002 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0x238/0x290 lib/dump_stack.c:51 [] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003 [] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424 [< inline >] strlen lib/string.c:484 [] strlcpy+0x9d/0x200 lib/string.c:144 [] packet_bind_spkt+0x144/0x230 net/packet/af_packet.c:3132 [] SYSC_bind+0x40d/0x5f0 net/socket.c:1370 [] SyS_bind+0x82/0xa0 net/socket.c:1356 [] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:? chained origin: 00000000eba00911 [] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67 [< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322 [< inline >] kmsan_save_stack mm/kmsan/kmsan.c:334 [] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:527 [] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380 [] SYSC_bind+0x129/0x5f0 net/socket.c:1356 [] SyS_bind+0x82/0xa0 net/socket.c:1356 [] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:? origin description: ----address@SYSC_bind (origin=00000000eb400911) ================================================================== (the line numbers are relative to 4.8-rc6, but the bug persists upstream) , when I run the following program as root: ===================================== #include #include #include #include int main() { struct sockaddr addr; memset(&addr, 0xff, sizeof(addr)); addr.sa_family = AF_PACKET; int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL)); bind(fd, &addr, sizeof(addr)); return 0; } ===================================== This happens because addr.sa_data copied from the userspace is not zero-terminated, and copying it with strlcpy() in packet_bind_spkt() results in calling strlen() on the kernel copy of that non-terminated buffer. Signed-off-by: Alexander Potapenko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index d805cd577a60..3975ac809934 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3021,7 +3021,7 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sock *sk = sock->sk; - char name[15]; + char name[sizeof(uaddr->sa_data) + 1]; /* * Check legality @@ -3029,7 +3029,11 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, if (addr_len != sizeof(struct sockaddr)) return -EINVAL; - strlcpy(name, uaddr->sa_data, sizeof(name)); + /* uaddr->sa_data comes from the userspace, it's not guaranteed to be + * zero-terminated. + */ + memcpy(name, uaddr->sa_data, sizeof(uaddr->sa_data)); + name[sizeof(uaddr->sa_data)] = 0; return packet_do_bind(sk, name, 0, pkt_sk(sk)->num); } From a70c328597045be2962098916c88ddd172caa054 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 1 Mar 2017 14:28:39 -0800 Subject: [PATCH 015/521] net: net_enable_timestamp() can be called from irq contexts [ Upstream commit 13baa00ad01bb3a9f893e3a08cbc2d072fc0c15d ] It is now very clear that silly TCP listeners might play with enabling/disabling timestamping while new children are added to their accept queue. Meaning net_enable_timestamp() can be called from BH context while current state of the static key is not enabled. Lets play safe and allow all contexts. The work queue is scheduled only under the problematic cases, which are the static key enable/disable transition, to not slow down critical paths. This extends and improves what we did in commit 5fa8bbda38c6 ("net: use a work queue to defer net_disable_timestamp() work") Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context") Signed-off-by: Eric Dumazet Reported-by: Dmitry Vyukov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/dev.c | 35 +++++++++++++++++++++++++++++++---- 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 08215a85c742..48399d8ce614 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1677,27 +1677,54 @@ EXPORT_SYMBOL_GPL(net_dec_ingress_queue); static struct static_key netstamp_needed __read_mostly; #ifdef HAVE_JUMP_LABEL static atomic_t netstamp_needed_deferred; +static atomic_t netstamp_wanted; static void netstamp_clear(struct work_struct *work) { int deferred = atomic_xchg(&netstamp_needed_deferred, 0); + int wanted; - while (deferred--) - static_key_slow_dec(&netstamp_needed); + wanted = atomic_add_return(deferred, &netstamp_wanted); + if (wanted > 0) + static_key_enable(&netstamp_needed); + else + static_key_disable(&netstamp_needed); } static DECLARE_WORK(netstamp_work, netstamp_clear); #endif void net_enable_timestamp(void) { +#ifdef HAVE_JUMP_LABEL + int wanted; + + while (1) { + wanted = atomic_read(&netstamp_wanted); + if (wanted <= 0) + break; + if (atomic_cmpxchg(&netstamp_wanted, wanted, wanted + 1) == wanted) + return; + } + atomic_inc(&netstamp_needed_deferred); + schedule_work(&netstamp_work); +#else static_key_slow_inc(&netstamp_needed); +#endif } EXPORT_SYMBOL(net_enable_timestamp); void net_disable_timestamp(void) { #ifdef HAVE_JUMP_LABEL - /* net_disable_timestamp() can be called from non process context */ - atomic_inc(&netstamp_needed_deferred); + int wanted; + + while (1) { + wanted = atomic_read(&netstamp_wanted); + if (wanted <= 1) + break; + if (atomic_cmpxchg(&netstamp_wanted, wanted, wanted - 1) == wanted) + return; + } + atomic_dec(&netstamp_needed_deferred); schedule_work(&netstamp_work); #else static_key_slow_dec(&netstamp_needed); From 9216632bf4a0bafdc998d1c68b37b70446775900 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Wed, 1 Mar 2017 16:35:07 -0300 Subject: [PATCH 016/521] dccp: Unlock sock before calling sk_free() [ Upstream commit d5afb6f9b6bb2c57bd0c05e76e12489dc0d037d9 ] The code where sk_clone() came from created a new socket and locked it, but then, on the error path didn't unlock it. This problem stayed there for a long while, till b0691c8ee7c2 ("net: Unlock sock before calling sk_free()") fixed it, but unfortunately the callers of sk_clone() (now sk_clone_locked()) were not audited and the one in dccp_create_openreq_child() remained. Now in the age of the syskaller fuzzer, this was finally uncovered, as reported by Dmitry: ---- 8< ---- I've got the following report while running syzkaller fuzzer on 86292b33d4b7 ("Merge branch 'akpm' (patches from Andrew)") [ BUG: held lock freed! ] 4.10.0+ #234 Not tainted ------------------------- syz-executor6/6898 is freeing memory ffff88006286cac0-ffff88006286d3b7, with a lock still held there! (slock-AF_INET6){+.-...}, at: [] spin_lock include/linux/spinlock.h:299 [inline] (slock-AF_INET6){+.-...}, at: [] sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504 5 locks held by syz-executor6/6898: #0: (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock include/net/sock.h:1460 [inline] #0: (sk_lock-AF_INET6){+.+.+.}, at: [] inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:681 #1: (rcu_read_lock){......}, at: [] inet6_csk_xmit+0x12a/0x5d0 net/ipv6/inet6_connection_sock.c:126 #2: (rcu_read_lock){......}, at: [] __skb_unlink include/linux/skbuff.h:1767 [inline] #2: (rcu_read_lock){......}, at: [] __skb_dequeue include/linux/skbuff.h:1783 [inline] #2: (rcu_read_lock){......}, at: [] process_backlog+0x264/0x730 net/core/dev.c:4835 #3: (rcu_read_lock){......}, at: [] ip6_input_finish+0x0/0x1700 net/ipv6/ip6_input.c:59 #4: (slock-AF_INET6){+.-...}, at: [] spin_lock include/linux/spinlock.h:299 [inline] #4: (slock-AF_INET6){+.-...}, at: [] sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504 Fix it just like was done by b0691c8ee7c2 ("net: Unlock sock before calling sk_free()"). Reported-by: Dmitry Vyukov Cc: Cong Wang Cc: Eric Dumazet Cc: Gerrit Renker Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20170301153510.GE15145@kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dccp/minisocks.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c index 1994f8af646b..e314caa39176 100644 --- a/net/dccp/minisocks.c +++ b/net/dccp/minisocks.c @@ -122,6 +122,7 @@ struct sock *dccp_create_openreq_child(const struct sock *sk, /* It is still raw copy of parent, so invalidate * destructor and make plain sk_free() */ newsk->sk_destruct = NULL; + bh_unlock_sock(newsk); sk_free(newsk); return NULL; } From 2681a7853ad73bfebc3a683765a496bb283c6648 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 3 Mar 2017 14:08:21 -0800 Subject: [PATCH 017/521] tcp: fix various issues for sockets morphing to listen state [ Upstream commit 02b2faaf0af1d85585f6d6980e286d53612acfc2 ] Dmitry Vyukov reported a divide by 0 triggered by syzkaller, exploiting tcp_disconnect() path that was never really considered and/or used before syzkaller ;) I was not able to reproduce the bug, but it seems issues here are the three possible actions that assumed they would never trigger on a listener. 1) tcp_write_timer_handler 2) tcp_delack_timer_handler 3) MTU reduction Only IPv6 MTU reduction was properly testing TCP_CLOSE and TCP_LISTEN states from tcp_v6_mtu_reduced() Signed-off-by: Eric Dumazet Reported-by: Dmitry Vyukov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_ipv4.c | 7 +++++-- net/ipv4/tcp_timer.c | 6 ++++-- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index b58a38eea059..f66d4b5d47f9 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -271,10 +271,13 @@ EXPORT_SYMBOL(tcp_v4_connect); */ void tcp_v4_mtu_reduced(struct sock *sk) { - struct dst_entry *dst; struct inet_sock *inet = inet_sk(sk); - u32 mtu = tcp_sk(sk)->mtu_info; + struct dst_entry *dst; + u32 mtu; + if ((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) + return; + mtu = tcp_sk(sk)->mtu_info; dst = inet_csk_update_pmtu(sk, mtu); if (!dst) return; diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 193ba1fa8a9a..ebb34d0c5e80 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -223,7 +223,8 @@ void tcp_delack_timer_handler(struct sock *sk) sk_mem_reclaim_partial(sk); - if (sk->sk_state == TCP_CLOSE || !(icsk->icsk_ack.pending & ICSK_ACK_TIMER)) + if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) || + !(icsk->icsk_ack.pending & ICSK_ACK_TIMER)) goto out; if (time_after(icsk->icsk_ack.timeout, jiffies)) { @@ -504,7 +505,8 @@ void tcp_write_timer_handler(struct sock *sk) struct inet_connection_sock *icsk = inet_csk(sk); int event; - if (sk->sk_state == TCP_CLOSE || !icsk->icsk_pending) + if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) || + !icsk->icsk_pending) goto out; if (time_after(icsk->icsk_timeout, jiffies)) { From 9e7683301beef0cef8254eecb661e7eac3146717 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 3 Mar 2017 21:01:02 -0800 Subject: [PATCH 018/521] net: fix socket refcounting in skb_complete_wifi_ack() [ Upstream commit dd4f10722aeb10f4f582948839f066bebe44e5fb ] TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt By the time TX completion happens, sk_refcnt might be already 0. sock_hold()/sock_put() would then corrupt critical state, like sk_wmem_alloc. Fixes: bf7fa551e0ce ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path") Signed-off-by: Eric Dumazet Cc: Alexander Duyck Cc: Johannes Berg Cc: Soheil Hassas Yeganeh Cc: Willem de Bruijn Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4968b5ddea69..370f4f86e2b5 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3735,7 +3735,7 @@ void skb_complete_wifi_ack(struct sk_buff *skb, bool acked) { struct sock *sk = skb->sk; struct sock_exterr_skb *serr; - int err; + int err = 1; skb->wifi_acked_valid = 1; skb->wifi_acked = acked; @@ -3745,14 +3745,15 @@ void skb_complete_wifi_ack(struct sk_buff *skb, bool acked) serr->ee.ee_errno = ENOMSG; serr->ee.ee_origin = SO_EE_ORIGIN_TXSTATUS; - /* take a reference to prevent skb_orphan() from freeing the socket */ - sock_hold(sk); - - err = sock_queue_err_skb(sk, skb); + /* Take a reference to prevent skb_orphan() from freeing the socket, + * but only if the socket refcount is not zero. + */ + if (likely(atomic_inc_not_zero(&sk->sk_refcnt))) { + err = sock_queue_err_skb(sk, skb); + sock_put(sk); + } if (err) kfree_skb(skb); - - sock_put(sk); } EXPORT_SYMBOL_GPL(skb_complete_wifi_ack); From ec4d8692b76e08a40221eb7c74775a390114f098 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 3 Mar 2017 21:01:03 -0800 Subject: [PATCH 019/521] net: fix socket refcounting in skb_complete_tx_timestamp() [ Upstream commit 9ac25fc063751379cb77434fef9f3b088cd3e2f7 ] TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt By the time TX completion happens, sk_refcnt might be already 0. sock_hold()/sock_put() would then corrupt critical state, like sk_wmem_alloc and lead to leaks or use after free. Fixes: 62bccb8cdb69 ("net-timestamp: Make the clone operation stand-alone from phy timestamping") Signed-off-by: Eric Dumazet Cc: Alexander Duyck Cc: Johannes Berg Cc: Soheil Hassas Yeganeh Cc: Willem de Bruijn Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 370f4f86e2b5..73dfd7729bc9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3678,13 +3678,14 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, if (!skb_may_tx_timestamp(sk, false)) return; - /* take a reference to prevent skb_orphan() from freeing the socket */ - sock_hold(sk); - - *skb_hwtstamps(skb) = *hwtstamps; - __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND); - - sock_put(sk); + /* Take a reference to prevent skb_orphan() from freeing the socket, + * but only if the socket refcount is not zero. + */ + if (likely(atomic_inc_not_zero(&sk->sk_refcnt))) { + *skb_hwtstamps(skb) = *hwtstamps; + __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND); + sock_put(sk); + } } EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); From d0ebde92fbeb98eedbfce15cef3c86b652846d25 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sun, 5 Mar 2017 10:52:16 -0800 Subject: [PATCH 020/521] dccp: fix use-after-free in dccp_feat_activate_values [ Upstream commit 62f8f4d9066c1c6f2474845d1ca7e2891f2ae3fd ] Dmitry reported crashes in DCCP stack [1] Problem here is that when I got rid of listener spinlock, I missed the fact that DCCP stores a complex state in struct dccp_request_sock, while TCP does not. Since multiple cpus could access it at the same time, we need to add protection. [1] BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541 at addr ffff88003713be68 Read of size 8 by task syz-executor2/8457 CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x292/0x398 lib/dump_stack.c:51 kasan_object_err+0x1c/0x70 mm/kasan/report.c:162 print_address_description mm/kasan/report.c:200 [inline] kasan_report_error mm/kasan/report.c:289 [inline] kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311 kasan_report mm/kasan/report.c:332 [inline] __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332 dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541 dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121 dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457 dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186 dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902 do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328 do_softirq kernel/softirq.c:176 [inline] __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181 local_bh_enable include/linux/bottom_half.h:31 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline] ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123 ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162 ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501 inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179 dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141 dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280 dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362 dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 SYSC_sendto+0x660/0x810 net/socket.c:1687 SyS_sendto+0x40/0x50 net/socket.c:1655 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x4458b9 RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9 RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017 RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020 R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8 R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700 Object at ffff88003713be50, in cache kmalloc-64 size: 64 Allocated: PID = 8446 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605 kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738 kmalloc include/linux/slab.h:490 [inline] dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467 dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487 __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741 dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949 dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012 dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423 dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217 dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377 dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606 dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632 sk_backlog_rcv include/net/sock.h:893 [inline] __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479 dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 Freed: PID = 15 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 [inline] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578 slab_free_hook mm/slub.c:1355 [inline] slab_free_freelist_hook mm/slub.c:1377 [inline] slab_free mm/slub.c:2954 [inline] kfree+0xe8/0x2b0 mm/slub.c:3874 dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418 dccp_feat_entry_destructor net/dccp/feat.c:416 [inline] dccp_feat_list_pop net/dccp/feat.c:541 [inline] dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543 dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121 dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457 dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186 dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 Memory state around the buggy address: ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb ^ Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet Reported-by: Dmitry Vyukov Tested-by: Dmitry Vyukov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/dccp.h | 1 + net/dccp/minisocks.c | 24 ++++++++++++++++-------- 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/include/linux/dccp.h b/include/linux/dccp.h index 61d042bbbf60..68449293c4b6 100644 --- a/include/linux/dccp.h +++ b/include/linux/dccp.h @@ -163,6 +163,7 @@ struct dccp_request_sock { __u64 dreq_isr; __u64 dreq_gsr; __be32 dreq_service; + spinlock_t dreq_lock; struct list_head dreq_featneg; __u32 dreq_timestamp_echo; __u32 dreq_timestamp_time; diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c index e314caa39176..68eed344b471 100644 --- a/net/dccp/minisocks.c +++ b/net/dccp/minisocks.c @@ -146,6 +146,13 @@ struct sock *dccp_check_req(struct sock *sk, struct sk_buff *skb, struct dccp_request_sock *dreq = dccp_rsk(req); bool own_req; + /* TCP/DCCP listeners became lockless. + * DCCP stores complex state in its request_sock, so we need + * a protection for them, now this code runs without being protected + * by the parent (listener) lock. + */ + spin_lock_bh(&dreq->dreq_lock); + /* Check for retransmitted REQUEST */ if (dccp_hdr(skb)->dccph_type == DCCP_PKT_REQUEST) { @@ -160,7 +167,7 @@ struct sock *dccp_check_req(struct sock *sk, struct sk_buff *skb, inet_rtx_syn_ack(sk, req); } /* Network Duplicate, discard packet */ - return NULL; + goto out; } DCCP_SKB_CB(skb)->dccpd_reset_code = DCCP_RESET_CODE_PACKET_ERROR; @@ -186,20 +193,20 @@ struct sock *dccp_check_req(struct sock *sk, struct sk_buff *skb, child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL, req, &own_req); - if (!child) - goto listen_overflow; + if (child) { + child = inet_csk_complete_hashdance(sk, child, req, own_req); + goto out; + } - return inet_csk_complete_hashdance(sk, child, req, own_req); - -listen_overflow: - dccp_pr_debug("listen_overflow!\n"); DCCP_SKB_CB(skb)->dccpd_reset_code = DCCP_RESET_CODE_TOO_BUSY; drop: if (dccp_hdr(skb)->dccph_type != DCCP_PKT_RESET) req->rsk_ops->send_reset(sk, skb); inet_csk_reqsk_queue_drop(sk, req); - return NULL; +out: + spin_unlock_bh(&dreq->dreq_lock); + return child; } EXPORT_SYMBOL_GPL(dccp_check_req); @@ -250,6 +257,7 @@ int dccp_reqsk_init(struct request_sock *req, { struct dccp_request_sock *dreq = dccp_rsk(req); + spin_lock_init(&dreq->dreq_lock); inet_rsk(req)->ir_rmt_port = dccp_hdr(skb)->dccph_sport; inet_rsk(req)->ir_num = ntohs(dccp_hdr(skb)->dccph_dport); inet_rsk(req)->acked = 0; From e671f1cc588f380b17e1c0ce38c7c712d13dfe93 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Mon, 6 Mar 2017 08:53:04 -0800 Subject: [PATCH 021/521] vrf: Fix use-after-free in vrf_xmit [ Upstream commit f7887d40e541f74402df0684a1463c0a0bb68c68 ] KASAN detected a use-after-free: [ 269.467067] BUG: KASAN: use-after-free in vrf_xmit+0x7f1/0x827 [vrf] at addr ffff8800350a21c0 [ 269.467067] Read of size 4 by task ssh/1879 [ 269.467067] CPU: 1 PID: 1879 Comm: ssh Not tainted 4.10.0+ #249 [ 269.467067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 269.467067] Call Trace: [ 269.467067] dump_stack+0x81/0xb6 [ 269.467067] kasan_object_err+0x21/0x78 [ 269.467067] kasan_report+0x2f7/0x450 [ 269.467067] ? vrf_xmit+0x7f1/0x827 [vrf] [ 269.467067] ? ip_output+0xa4/0xdb [ 269.467067] __asan_load4+0x6b/0x6d [ 269.467067] vrf_xmit+0x7f1/0x827 [vrf] ... Which corresponds to the skb access after xmit handling. Fix by saving skb->len and using the saved value to update stats. Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver") Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/vrf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index d6b619667f1a..349aecbc210a 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -345,6 +345,7 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev) static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev) { + int len = skb->len; netdev_tx_t ret = is_ip_tx_frame(skb, dev); if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) { @@ -352,7 +353,7 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev) u64_stats_update_begin(&dstats->syncp); dstats->tx_pkts++; - dstats->tx_bytes += skb->len; + dstats->tx_bytes += len; u64_stats_update_end(&dstats->syncp); } else { this_cpu_inc(dev->dstats->tx_drps); From 6c72458ab428ce659261fa060295e580503a5b12 Mon Sep 17 00:00:00 2001 From: "Dmitry V. Levin" Date: Tue, 7 Mar 2017 23:50:50 +0300 Subject: [PATCH 022/521] uapi: fix linux/packet_diag.h userspace compilation error [ Upstream commit 745cb7f8a5de0805cade3de3991b7a95317c7c73 ] Replace MAX_ADDR_LEN with its numeric value to fix the following linux/packet_diag.h userspace compilation error: /usr/include/linux/packet_diag.h:67:17: error: 'MAX_ADDR_LEN' undeclared here (not in a function) __u8 pdmc_addr[MAX_ADDR_LEN]; This is not the first case in the UAPI where the numeric value of MAX_ADDR_LEN is used instead of symbolic one, uapi/linux/if_link.h already does the same: $ grep MAX_ADDR_LEN include/uapi/linux/if_link.h __u8 mac[32]; /* MAX_ADDR_LEN */ There are no UAPI headers besides these two that use MAX_ADDR_LEN. Signed-off-by: Dmitry V. Levin Acked-by: Pavel Emelyanov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/packet_diag.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/packet_diag.h b/include/uapi/linux/packet_diag.h index d08c63f3dd6f..0c5d5dd61b6a 100644 --- a/include/uapi/linux/packet_diag.h +++ b/include/uapi/linux/packet_diag.h @@ -64,7 +64,7 @@ struct packet_diag_mclist { __u32 pdmc_count; __u16 pdmc_type; __u16 pdmc_alen; - __u8 pdmc_addr[MAX_ADDR_LEN]; + __u8 pdmc_addr[32]; /* MAX_ADDR_LEN */ }; struct packet_diag_ring { From 710fbeb3f5c5441fbe002b2c1566ceaad0725c01 Mon Sep 17 00:00:00 2001 From: Etienne Noss Date: Fri, 10 Mar 2017 16:55:32 +0100 Subject: [PATCH 023/521] act_connmark: avoid crashing on malformed nlattrs with null parms MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 52491c7607c5527138095edf44c53169dc1ddb82 ] tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS is set, resulting in a null pointer dereference when trying to access it. [501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [501099.043039] IP: [] tcf_connmark_init+0x8b/0x180 [act_connmark] ... [501099.044334] Call Trace: [501099.044345] [] ? tcf_action_init_1+0x198/0x1b0 [501099.044363] [] ? tcf_action_init+0xb0/0x120 [501099.044380] [] ? tcf_exts_validate+0xc4/0x110 [501099.044398] [] ? u32_set_parms+0xa7/0x270 [cls_u32] [501099.044417] [] ? u32_change+0x680/0x87b [cls_u32] [501099.044436] [] ? tc_ctl_tfilter+0x4dd/0x8a0 [501099.044454] [] ? security_capable+0x41/0x60 [501099.044471] [] ? rtnetlink_rcv_msg+0xe1/0x220 [501099.044490] [] ? rtnl_newlink+0x870/0x870 [501099.044507] [] ? netlink_rcv_skb+0xa1/0xc0 [501099.044524] [] ? rtnetlink_rcv+0x24/0x30 [501099.044541] [] ? netlink_unicast+0x184/0x230 [501099.044558] [] ? netlink_sendmsg+0x2f8/0x3b0 [501099.044576] [] ? sock_sendmsg+0x30/0x40 [501099.044592] [] ? SYSC_sendto+0xd3/0x150 [501099.044608] [] ? __do_page_fault+0x2d1/0x510 [501099.044626] [] ? system_call_fast_compare_end+0xc/0x9b Fixes: 22a5dc0e5e3e ("net: sched: Introduce connmark action") Signed-off-by: Étienne Noss Signed-off-by: Victorien Molle Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/act_connmark.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c index bb41699c6c49..7ecb14f3db54 100644 --- a/net/sched/act_connmark.c +++ b/net/sched/act_connmark.c @@ -109,6 +109,9 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla, if (ret < 0) return ret; + if (!tb[TCA_CONNMARK_PARMS]) + return -EINVAL; + parm = nla_data(tb[TCA_CONNMARK_PARMS]); if (!tcf_hash_check(parm->index, a, bind)) { From b57955ea30e13aa37e5955bf20617f839f32c560 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Fri, 10 Mar 2017 09:46:15 -0800 Subject: [PATCH 024/521] mpls: Send route delete notifications when router module is unloaded [ Upstream commit e37791ec1ad785b59022ae211f63a16189bacebf ] When the mpls_router module is unloaded, mpls routes are deleted but notifications are not sent to userspace leaving userspace caches out of sync. Add the call to mpls_notify_route in mpls_net_exit as routes are freed. Fixes: 0189197f44160 ("mpls: Basic routing support") Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/mpls/af_mpls.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c index 881bc2072809..52cfc4478511 100644 --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -1567,6 +1567,7 @@ static void mpls_net_exit(struct net *net) for (index = 0; index < platform_labels; index++) { struct mpls_route *rt = rtnl_dereference(platform_label[index]); RCU_INIT_POINTER(platform_label[index], NULL); + mpls_notify_route(net, index, rt, NULL, NULL); mpls_rt_free(rt); } rtnl_unlock(); From 5f8bc3856e285cc12597879039c17f7397f4b37d Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Mon, 13 Mar 2017 13:28:09 +0100 Subject: [PATCH 025/521] ipv6: make ECMP route replacement less greedy [ Upstream commit 67e194007be08d071294456274dd53e0a04fdf90 ] Commit 27596472473a ("ipv6: fix ECMP route replacement") introduced a loop that removes all siblings of an ECMP route that is being replaced. However, this loop doesn't stop when it has replaced siblings, and keeps removing other routes with a higher metric. We also end up triggering the WARN_ON after the loop, because after this nsiblings < 0. Instead, stop the loop when we have taken care of all routes with the same metric as the route being replaced. Reproducer: =========== #!/bin/sh ip netns add ns1 ip netns add ns2 ip -net ns1 link set lo up for x in 0 1 2 ; do ip link add veth$x netns ns2 type veth peer name eth$x netns ns1 ip -net ns1 link set eth$x up ip -net ns2 link set veth$x up done ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \ nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2 ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256 ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048 echo "before replace, 3 routes" ip -net ns1 -6 r | grep -v '^fe80\|^ff00' echo ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \ nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2 echo "after replace, only 2 routes, metric 2048 is gone" ip -net ns1 -6 r | grep -v '^fe80\|^ff00' Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Sabrina Dubroca Acked-by: Nicolas Dichtel Reviewed-by: Xin Long Reviewed-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_fib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 34cf46d74554..85bf86458706 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -903,6 +903,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt, ins = &rt->dst.rt6_next; iter = *ins; while (iter) { + if (iter->rt6i_metric > rt->rt6i_metric) + break; if (rt6_qualify_for_ecmp(iter)) { *ins = iter->dst.rt6_next; fib6_purge_rt(iter, fn, info->nl_net); From aed728c38c483650885dfd975dd9f4903e5505bf Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 13 Mar 2017 16:24:28 +0100 Subject: [PATCH 026/521] ipv6: avoid write to a possibly cloned skb [ Upstream commit 79e49503efe53a8c51d8b695bedc8a346c5e4a87 ] ip6_fragment, in case skb has a fraglist, checks if the skb is cloned. If it is, it will move to the 'slow path' and allocates new skbs for each fragment. However, right before entering the slowpath loop, it updates the nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT, to account for the fragment header that will be inserted in the new ipv6-fragment skbs. In case original skb is cloned this munges nexthdr value of another skb. Avoid this by doing the nexthdr update for each of the new fragment skbs separately. This was observed with tcpdump on a bridge device where netfilter ipv6 reassembly is active: tcpdump shows malformed fragment headers as the l4 header (icmpv6, tcp, etc). is decoded as a fragment header. Cc: Hannes Frederic Sowa Reported-by: Andreas Karis Signed-off-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_output.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 58900c21e4e4..8004532fa882 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -742,13 +742,14 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, * Fragment the datagram. */ - *prevhdr = NEXTHDR_FRAGMENT; troom = rt->dst.dev->needed_tailroom; /* * Keep copying data until we run out. */ while (left > 0) { + u8 *fragnexthdr_offset; + len = left; /* IF: it doesn't fit, use 'mtu' - the data space left */ if (len > mtu) @@ -793,6 +794,10 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, */ skb_copy_from_linear_data(skb, skb_network_header(frag), hlen); + fragnexthdr_offset = skb_network_header(frag); + fragnexthdr_offset += prevhdr - skb_network_header(skb); + *fragnexthdr_offset = NEXTHDR_FRAGMENT; + /* * Build fragment header. */ From 56f9b9502f2d15b9c7b83f9cfb32798e2e364f61 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 13 Mar 2017 17:38:17 +0100 Subject: [PATCH 027/521] bridge: drop netfilter fake rtable unconditionally [ Upstream commit a13b2082ece95247779b9995c4e91b4246bed023 ] Andreas reports kernel oops during rmmod of the br_netfilter module. Hannes debugged the oops down to a NULL rt6info->rt6i_indev. Problem is that br_netfilter has the nasty concept of adding a fake rtable to skb->dst; this happens in a br_netfilter prerouting hook. A second hook (in bridge LOCAL_IN) is supposed to remove these again before the skb is handed up the stack. However, on module unload hooks get unregistered which means an skb could traverse the prerouting hook that attaches the fake_rtable, while the 'fake rtable remove' hook gets removed from the hooklist immediately after. Fixes: 34666d467cbf1e2e3c7 ("netfilter: bridge: move br_netfilter out of the core") Reported-by: Andreas Karis Debugged-by: Hannes Frederic Sowa Signed-off-by: Florian Westphal Acked-by: Pablo Neira Ayuso Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/bridge/br_input.c | 1 + net/bridge/br_netfilter_hooks.c | 21 --------------------- 2 files changed, 1 insertion(+), 21 deletions(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index f7fba74108a9..e24754a0e052 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -29,6 +29,7 @@ EXPORT_SYMBOL(br_should_route_hook); static int br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb) { + br_drop_fake_rtable(skb); return netif_receive_skb(skb); } diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index 7ddbe7ec81d6..97fc19f001bf 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -516,21 +516,6 @@ static unsigned int br_nf_pre_routing(void *priv, } -/* PF_BRIDGE/LOCAL_IN ************************************************/ -/* The packet is locally destined, which requires a real - * dst_entry, so detach the fake one. On the way up, the - * packet would pass through PRE_ROUTING again (which already - * took place when the packet entered the bridge), but we - * register an IPv4 PRE_ROUTING 'sabotage' hook that will - * prevent this from happening. */ -static unsigned int br_nf_local_in(void *priv, - struct sk_buff *skb, - const struct nf_hook_state *state) -{ - br_drop_fake_rtable(skb); - return NF_ACCEPT; -} - /* PF_BRIDGE/FORWARD *************************************************/ static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { @@ -900,12 +885,6 @@ static struct nf_hook_ops br_nf_ops[] __read_mostly = { .hooknum = NF_BR_PRE_ROUTING, .priority = NF_BR_PRI_BRNF, }, - { - .hook = br_nf_local_in, - .pf = NFPROTO_BRIDGE, - .hooknum = NF_BR_LOCAL_IN, - .priority = NF_BR_PRI_BRNF, - }, { .hook = br_nf_forward_ip, .pf = NFPROTO_BRIDGE, From 4ab956b561334866dfe1b17d9c7567313e07cfa2 Mon Sep 17 00:00:00 2001 From: Jon Maxwell Date: Fri, 10 Mar 2017 16:40:33 +1100 Subject: [PATCH 028/521] dccp/tcp: fix routing redirect race MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver Cc: Hannes Sowa Signed-off-by: Jon Maxwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dccp/ipv4.c | 3 ++- net/dccp/ipv6.c | 8 +++++--- net/ipv4/tcp_ipv4.c | 3 ++- net/ipv6/tcp_ipv6.c | 8 +++++--- 4 files changed, 14 insertions(+), 8 deletions(-) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 0759f5b9180e..6467bf392e1b 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -289,7 +289,8 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info) switch (type) { case ICMP_REDIRECT: - dccp_do_redirect(skb, sk); + if (!sock_owned_by_user(sk)) + dccp_do_redirect(skb, sk); goto out; case ICMP_SOURCE_QUENCH: /* Just silently ignore these. */ diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 27c4e81efa24..8113ad58fcb4 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -122,10 +122,12 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, np = inet6_sk(sk); if (type == NDISC_REDIRECT) { - struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie); + if (!sock_owned_by_user(sk)) { + struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie); - if (dst) - dst->ops->redirect(dst, sk, skb); + if (dst) + dst->ops->redirect(dst, sk, skb); + } goto out; } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index f66d4b5d47f9..198fc2314c82 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -423,7 +423,8 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) switch (type) { case ICMP_REDIRECT: - do_redirect(icmp_skb, sk); + if (!sock_owned_by_user(sk)) + do_redirect(icmp_skb, sk); goto out; case ICMP_SOURCE_QUENCH: /* Just silently ignore these. */ diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 76a8c8057a23..1a63c4deef26 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -376,10 +376,12 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, np = inet6_sk(sk); if (type == NDISC_REDIRECT) { - struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie); + if (!sock_owned_by_user(sk)) { + struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie); - if (dst) - dst->ops->redirect(dst, sk, skb); + if (dst) + dst->ops->redirect(dst, sk, skb); + } goto out; } From 676fe978525d3d3f583e1f6463f3b25623e81afd Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Mon, 13 Mar 2017 00:01:30 +0100 Subject: [PATCH 029/521] dccp: fix memory leak during tear-down of unsuccessful connection request [ Upstream commit 72ef9c4125c7b257e3a714d62d778ab46583d6a3 ] This patch fixes a memory leak, which happens if the connection request is not fulfilled between parsing the DCCP options and handling the SYN (because e.g. the backlog is full), because we forgot to free the list of ack vectors. Reported-by: Jianwen Ji Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dccp/ccids/ccid2.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c index f053198e730c..5e3a7302f774 100644 --- a/net/dccp/ccids/ccid2.c +++ b/net/dccp/ccids/ccid2.c @@ -749,6 +749,7 @@ static void ccid2_hc_tx_exit(struct sock *sk) for (i = 0; i < hc->tx_seqbufc; i++) kfree(hc->tx_seqbuf[i]); hc->tx_seqbufc = 0; + dccp_ackvec_parsed_cleanup(&hc->tx_av_chunks); } static void ccid2_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) From c10ffe988f15a0306d5d8cb1c6b475c9fe2fc2c9 Mon Sep 17 00:00:00 2001 From: Roman Mashak Date: Fri, 24 Feb 2017 11:00:32 -0500 Subject: [PATCH 030/521] net sched actions: decrement module reference count after table flush. [ Upstream commit edb9d1bff4bbe19b8ae0e71b1f38732591a9eeb2 ] When tc actions are loaded as a module and no actions have been installed, flushing them would result in actions removed from the memory, but modules reference count not being decremented, so that the modules would not be unloaded. Following is example with GACT action: % sudo modprobe act_gact % lsmod Module Size Used by act_gact 16384 0 % % sudo tc actions ls action gact % % sudo tc actions flush action gact % lsmod Module Size Used by act_gact 16384 1 % sudo tc actions flush action gact % lsmod Module Size Used by act_gact 16384 2 % sudo rmmod act_gact rmmod: ERROR: Module act_gact is in use .... After the fix: % lsmod Module Size Used by act_gact 16384 0 % % sudo tc actions add action pass index 1 % sudo tc actions add action pass index 2 % sudo tc actions add action pass index 3 % lsmod Module Size Used by act_gact 16384 3 % % sudo tc actions flush action gact % lsmod Module Size Used by act_gact 16384 0 % % sudo tc actions flush action gact % lsmod Module Size Used by act_gact 16384 0 % sudo rmmod act_gact % lsmod Module Size Used by % Fixes: f97017cdefef ("net-sched: Fix actions flushing") Signed-off-by: Roman Mashak Signed-off-by: Jamal Hadi Salim Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/act_api.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 06e7c4a37245..694a06f1e0d5 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -820,10 +820,8 @@ static int tca_action_flush(struct net *net, struct nlattr *nla, goto out_module_put; err = a.ops->walk(skb, &dcb, RTM_DELACTION, &a); - if (err < 0) + if (err <= 0) goto out_module_put; - if (err == 0) - goto noflush_out; nla_nest_end(skb, nest); @@ -840,7 +838,6 @@ static int tca_action_flush(struct net *net, struct nlattr *nla, out_module_put: module_put(a.ops->owner); err_out: -noflush_out: kfree_skb(skb); return err; } From fd74e8d258da9f9678da6bf88a0b02b2c1b71d0c Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 19 Dec 2016 14:20:13 -0800 Subject: [PATCH 031/521] fscrypt: fix renaming and linking special files commit 42d97eb0ade31e1bc537d086842f5d6e766d9d51 upstream. Attempting to link a device node, named pipe, or socket file into an encrypted directory through rename(2) or link(2) always failed with EPERM. This happened because fscrypt_has_permitted_context() saw that the file was unencrypted and forbid creating the link. This behavior was unexpected because such files are never encrypted; only regular files, directories, and symlinks can be encrypted. To fix this, make fscrypt_has_permitted_context() always return true on special files. This will be covered by a test in my encryption xfstests patchset. Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support") Signed-off-by: Eric Biggers Reviewed-by: Richard Weinberger Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/ext4/crypto_policy.c | 6 ++++++ fs/f2fs/crypto_policy.c | 5 +++++ 2 files changed, 11 insertions(+) diff --git a/fs/ext4/crypto_policy.c b/fs/ext4/crypto_policy.c index 8a9feb341f31..dd561f916f0b 100644 --- a/fs/ext4/crypto_policy.c +++ b/fs/ext4/crypto_policy.c @@ -156,6 +156,12 @@ int ext4_is_child_context_consistent_with_parent(struct inode *parent, WARN_ON(1); /* Should never happen */ return 0; } + + /* No restrictions on file types which are never encrypted */ + if (!S_ISREG(child->i_mode) && !S_ISDIR(child->i_mode) && + !S_ISLNK(child->i_mode)) + return 1; + /* no restrictions if the parent directory is not encrypted */ if (!ext4_encrypted_inode(parent)) return 1; diff --git a/fs/f2fs/crypto_policy.c b/fs/f2fs/crypto_policy.c index e504f548b64e..5bbd1989d5e6 100644 --- a/fs/f2fs/crypto_policy.c +++ b/fs/f2fs/crypto_policy.c @@ -149,6 +149,11 @@ int f2fs_is_child_context_consistent_with_parent(struct inode *parent, BUG_ON(1); } + /* No restrictions on file types which are never encrypted */ + if (!S_ISREG(child->i_mode) && !S_ISDIR(child->i_mode) && + !S_ISLNK(child->i_mode)) + return 1; + /* no restrictions if the parent directory is not encrypted */ if (!f2fs_encrypted_inode(parent)) return 1; From 3a19419c50c6ee386ca6d22a23acc2df51583d3d Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sat, 15 Oct 2016 09:48:50 -0400 Subject: [PATCH 032/521] fscrypto: lock inode while setting encryption policy commit 8906a8223ad4909b391c5628f7991ebceda30e52 upstream. i_rwsem needs to be acquired while setting an encryption policy so that concurrent calls to FS_IOC_SET_ENCRYPTION_POLICY are correctly serialized (especially the ->get_context() + ->set_context() pair), and so that new files cannot be created in the directory during or after the ->empty_dir() check. Signed-off-by: Eric Biggers Signed-off-by: Theodore Ts'o Reviewed-by: Richard Weinberger Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/ioctl.c | 4 ++++ fs/f2fs/file.c | 9 ++++++++- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 1fb12f9c97a6..789e2d6724a9 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -633,8 +633,12 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (err) goto encryption_policy_out; + mutex_lock(&inode->i_mutex); + err = ext4_process_policy(&policy, inode); + mutex_unlock(&inode->i_mutex); + mnt_drop_write_file(filp); encryption_policy_out: return err; diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index a197215ad52b..4b449d263333 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1535,12 +1535,19 @@ static int f2fs_ioc_set_encryption_policy(struct file *filp, unsigned long arg) #ifdef CONFIG_F2FS_FS_ENCRYPTION struct f2fs_encryption_policy policy; struct inode *inode = file_inode(filp); + int err; if (copy_from_user(&policy, (struct f2fs_encryption_policy __user *)arg, sizeof(policy))) return -EFAULT; - return f2fs_process_policy(&policy, inode); + mutex_lock(&inode->i_mutex); + + err = f2fs_process_policy(&policy, inode); + + mutex_unlock(&inode->i_mutex); + + return err; #else return -EOPNOTSUPP; #endif From 8e0ec20539f8c626463ae43fcaeb218e3b2b5dc4 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Mon, 13 Mar 2017 19:33:37 +0300 Subject: [PATCH 033/521] x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y commit be3606ff739d1c1be36389f8737c577ad87e1f57 upstream. The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y options selected. With branch profiling enabled we end up calling ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is built with KASAN instrumentation, so calling it before kasan has been initialized leads to crash. Use DISABLE_BRANCH_PROFILING define to make sure that we don't call ftrace_likely_update() from early code before kasan_early_init(). Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support") Reported-by: Fengguang Wu Signed-off-by: Andrey Ryabinin Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko Cc: Andrew Morton Cc: lkp@01.org Cc: Dmitry Vyukov Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/head64.c | 1 + arch/x86/mm/kasan_init_64.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index f129a9af6357..b6b0077da1af 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -4,6 +4,7 @@ * Copyright (C) 2000 Andrea Arcangeli SuSE */ +#define DISABLE_BRANCH_PROFILING #include #include #include diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index d470cf219a2d..4e5ac46adc9d 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -1,3 +1,4 @@ +#define DISABLE_BRANCH_PROFILING #define pr_fmt(fmt) "kasan: " fmt #include #include From 62f57041fbdf15db6336542384a4b36f1f387299 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 16 Mar 2017 12:59:39 -0700 Subject: [PATCH 034/521] x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm commit 5dc855d44c2ad960a86f593c60461f1ae1566b6d upstream. If one thread mmaps a perf event while another thread in the same mm is in some context where active_mm != mm (which can happen in the scheduler, for example), refresh_pce() would write the wrong value to CR4.PCE. This broke some PAPI tests. Reported-and-tested-by: Vince Weaver Signed-off-by: Andy Lutomirski Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: stable@vger.kernel.org Fixes: 7911d3f7af14 ("perf/x86: Only allow rdpmc if a perf_event is mapped") Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/perf_event.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 1a8256dd6729..5b2f2306fbcc 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -1996,8 +1996,8 @@ static int x86_pmu_event_init(struct perf_event *event) static void refresh_pce(void *ignored) { - if (current->mm) - load_mm_cr4(current->mm); + if (current->active_mm) + load_mm_cr4(current->active_mm); } static void x86_pmu_event_mapped(struct perf_event *event) From 44854c191e2cb62d369eb9927e6b6683c11d6b04 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Sat, 4 Mar 2017 10:27:18 +0100 Subject: [PATCH 035/521] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream. While working on the futex code, I stumbled over this potential use-after-free scenario. Dmitry triggered it later with syzkaller. pi_mutex is a pointer into pi_state, which we drop the reference on in unqueue_me_pi(). So any access to that pointer after that is bad. Since other sites already do rt_mutex_unlock() with hb->lock held, see for example futex_lock_pi(), simply move the unlock before unqueue_me_pi(). Reported-by: Dmitry Vyukov Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Darren Hart Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- kernel/futex.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 9d251dc3ec40..45170163a0b3 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2690,7 +2690,6 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, { struct hrtimer_sleeper timeout, *to = NULL; struct rt_mutex_waiter rt_waiter; - struct rt_mutex *pi_mutex = NULL; struct futex_hash_bucket *hb; union futex_key key2 = FUTEX_KEY_INIT; struct futex_q q = futex_q_init; @@ -2782,6 +2781,8 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, spin_unlock(q.lock_ptr); } } else { + struct rt_mutex *pi_mutex; + /* * We have been woken up by futex_unlock_pi(), a timeout, or a * signal. futex_unlock_pi() will not destroy the lock_ptr nor @@ -2805,18 +2806,19 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, if (res) ret = (res < 0) ? res : 0; + /* + * If fixup_pi_state_owner() faulted and was unable to handle + * the fault, unlock the rt_mutex and return the fault to + * userspace. + */ + if (ret && rt_mutex_owner(pi_mutex) == current) + rt_mutex_unlock(pi_mutex); + /* Unqueue and drop the lock. */ unqueue_me_pi(&q); } - /* - * If fixup_pi_state_owner() faulted and was unable to handle the - * fault, unlock the rt_mutex and return the fault to userspace. - */ - if (ret == -EFAULT) { - if (pi_mutex && rt_mutex_owner(pi_mutex) == current) - rt_mutex_unlock(pi_mutex); - } else if (ret == -EINTR) { + if (ret == -EINTR) { /* * We've already been requeued, but cannot restart by calling * futex_lock_pi() directly. We could restart this syscall, but From 99d403faba47e5adeb11dbf1094972fc78c29a75 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Sat, 4 Mar 2017 10:27:19 +0100 Subject: [PATCH 036/521] futex: Add missing error handling to FUTEX_REQUEUE_PI commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream. Thomas spotted that fixup_pi_state_owner() can return errors and we fail to unlock the rt_mutex in that case. Reported-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Darren Hart Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- kernel/futex.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/futex.c b/kernel/futex.c index 45170163a0b3..3057dabf726f 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2773,6 +2773,8 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, if (q.pi_state && (q.pi_state->owner != current)) { spin_lock(q.lock_ptr); ret = fixup_pi_state_owner(uaddr2, &q, current); + if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current) + rt_mutex_unlock(&q.pi_state->pi_mutex); /* * Drop the reference to the pi state which * the requeue_pi() code acquired for us. From 0136bca4e0f65075b0b4716a270f8b04c6c46abc Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 22 Mar 2017 12:17:51 +0100 Subject: [PATCH 037/521] Linux 4.4.56 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index d9cc21df444d..cf9303a5d621 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 55 +SUBLEVEL = 56 EXTRAVERSION = NAME = Blurry Fish Butt From f03ec877840b5fd07db4fb4f76a958de0f38e936 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Wed, 4 May 2016 18:49:55 +0530 Subject: [PATCH 038/521] PM / OPP: Remove useless check Regulators are optional for devices using OPPs and the OPP core shouldn't be printing any errors for such missing regulators. It was fine before the commit 0c717d0f9cb4, but that failed to update this part of the code to remove an 'always true' check and an extra unwanted print message. Fix that now. Fixes: 0c717d0f9cb4 (PM / OPP: Initialize regulator pointer to an error value) Reported-by: Marc Gonzalez Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki (cherry picked from commit 21f8a99ce61b2d4b74bd425a5bf7e9efbe162788) Signed-off-by: Alex Shi --- drivers/base/power/opp/core.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index 433b60092972..d8f4cc22856c 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c @@ -259,9 +259,6 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev) reg = opp_table->regulator; if (IS_ERR(reg)) { /* Regulator may not be required for device */ - if (reg) - dev_err(dev, "%s: Invalid regulator (%ld)\n", __func__, - PTR_ERR(reg)); rcu_read_unlock(); return 0; } From d6b9c33cfd0fbc5933ca540fb8c0d77c5fcb84f7 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 13 Jun 2016 11:15:14 +0100 Subject: [PATCH 039/521] arm64: fix dump_instr when PAN and UAO are in use If the kernel is set to show unhandled signals, and a user task does not handle a SIGILL as a result of an instruction abort, we will attempt to log the offending instruction with dump_instr before killing the task. We use dump_instr to log the encoding of the offending userspace instruction. However, dump_instr is also used to dump instructions from kernel space, and internally always switches to KERNEL_DS before dumping the instruction with get_user. When both PAN and UAO are in use, reading a user instruction via get_user while in KERNEL_DS will result in a permission fault, which leads to an Oops. As we have regs corresponding to the context of the original instruction abort, we can inspect this and only flip to KERNEL_DS if the original abort was taken from the kernel, avoiding this issue. At the same time, remove the redundant (and incorrect) comments regarding the order dump_mem and dump_instr are called in. Cc: Catalin Marinas Cc: James Morse Cc: Robin Murphy Cc: #4.6+ Signed-off-by: Mark Rutland Reported-by: Vladimir Murzin Tested-by: Vladimir Murzin Fixes: 57f4959bad0a154a ("arm64: kernel: Add support for User Access Override") Signed-off-by: Will Deacon (cherry picked from commit c5cea06be060f38e5400d796e61cfc8c36e52924) Signed-off-by: Alex Shi --- arch/arm64/kernel/traps.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 0f35cbb22c0a..fe60ecf8be19 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -64,8 +64,7 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom, /* * We need to switch to kernel mode so that we can use __get_user - * to safely read from kernel space. Note that we now dump the - * code first, just in case the backtrace kills us. + * to safely read from kernel space. */ fs = get_fs(); set_fs(KERNEL_DS); @@ -111,21 +110,12 @@ static void dump_backtrace_entry(unsigned long where) print_ip_sym(where); } -static void dump_instr(const char *lvl, struct pt_regs *regs) +static void __dump_instr(const char *lvl, struct pt_regs *regs) { unsigned long addr = instruction_pointer(regs); - mm_segment_t fs; char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str; int i; - /* - * We need to switch to kernel mode so that we can use __get_user - * to safely read from kernel space. Note that we now dump the - * code first, just in case the backtrace kills us. - */ - fs = get_fs(); - set_fs(KERNEL_DS); - for (i = -4; i < 1; i++) { unsigned int val, bad; @@ -139,8 +129,18 @@ static void dump_instr(const char *lvl, struct pt_regs *regs) } } printk("%sCode: %s\n", lvl, str); +} - set_fs(fs); +static void dump_instr(const char *lvl, struct pt_regs *regs) +{ + if (!user_mode(regs)) { + mm_segment_t fs = get_fs(); + set_fs(KERNEL_DS); + __dump_instr(lvl, regs); + set_fs(fs); + } else { + __dump_instr(lvl, regs); + } } static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) From b1c272801015672bac0fb0b4398cd94a47b70f66 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Thu, 2 Jun 2016 15:27:04 +0100 Subject: [PATCH 040/521] arm64: spinlock: order spin_{is_locked,unlock_wait} against local locks spin_is_locked has grown two very different use-cases: (1) [The sane case] API functions may require a certain lock to be held by the caller and can therefore use spin_is_locked as part of an assert statement in order to verify that the lock is indeed held. For example, usage of assert_spin_locked. (2) [The insane case] There are two locks, where a CPU takes one of the locks and then checks whether or not the other one is held before accessing some shared state. For example, the "optimized locking" in ipc/sem.c. In the latter case, the sequence looks like: spin_lock(&sem->lock); if (!spin_is_locked(&sma->sem_perm.lock)) /* Access shared state */ and requires that the spin_is_locked check is ordered after taking the sem->lock. Unfortunately, since our spinlocks are implemented using a LDAXR/STXR sequence, the read of &sma->sem_perm.lock can be speculated before the STXR and consequently return a stale value. Whilst this hasn't been seen to cause issues in practice, PowerPC fixed the same issue in 51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()") and, although we did something similar for spin_unlock_wait in d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") that doesn't actually take care of ordering against local acquisition of a different lock. This patch adds an smp_mb() to the start of our arch_spin_is_locked and arch_spin_unlock_wait routines to ensure that the lock value is always loaded after any other locks have been taken by the current CPU. Reported-by: Peter Zijlstra Signed-off-by: Will Deacon (cherry picked from commit 38b850a73034f075c4088e7511b36ebbef9dce00) Signed-off-by: Alex Shi --- arch/arm64/include/asm/spinlock.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h index fc9682bfe002..aac64d55cb22 100644 --- a/arch/arm64/include/asm/spinlock.h +++ b/arch/arm64/include/asm/spinlock.h @@ -31,6 +31,12 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) unsigned int tmp; arch_spinlock_t lockval; + /* + * Ensure prior spin_lock operations to other locks have completed + * on this CPU before we test whether "lock" is locked. + */ + smp_mb(); + asm volatile( " sevl\n" "1: wfe\n" @@ -148,6 +154,7 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock) static inline int arch_spin_is_locked(arch_spinlock_t *lock) { + smp_mb(); /* See arch_spin_unlock_wait */ return !arch_spin_value_unlocked(READ_ONCE(*lock)); } From 78732f2e083800c5044e3c4b8161e5fc7a1d4adc Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 8 Jun 2016 15:10:57 +0100 Subject: [PATCH 041/521] arm64: spinlock: fix spin_unlock_wait for LSE atomics Commit d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") fixed spin_unlock_wait for LL/SC-based atomics under the premise that the LSE atomics (in particular, the LDADDA instruction) are indivisible. Unfortunately, these instructions are only indivisible when used with the -AL (full ordering) suffix and, consequently, the same issue can theoretically be observed with LSE atomics, where a later (in program order) load can be speculated before the write portion of the atomic operation. This patch fixes the issue by performing a CAS of the lock once we've established that it's unlocked, in much the same way as the LL/SC code. Fixes: d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") Signed-off-by: Will Deacon (cherry picked from commit 3a5facd09da848193f5bcb0dea098a298bc1a29d) Signed-off-by: Alex Shi --- arch/arm64/include/asm/spinlock.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h index aac64d55cb22..d5c894253e73 100644 --- a/arch/arm64/include/asm/spinlock.h +++ b/arch/arm64/include/asm/spinlock.h @@ -43,13 +43,17 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) "2: ldaxr %w0, %2\n" " eor %w1, %w0, %w0, ror #16\n" " cbnz %w1, 1b\n" + /* Serialise against any concurrent lockers */ ARM64_LSE_ATOMIC_INSN( /* LL/SC */ " stxr %w1, %w0, %2\n" -" cbnz %w1, 2b\n", /* Serialise against any concurrent lockers */ - /* LSE atomics */ " nop\n" -" nop\n") +" nop\n", + /* LSE atomics */ +" mov %w1, %w0\n" +" cas %w0, %w0, %2\n" +" eor %w1, %w1, %w0\n") +" cbnz %w1, 2b\n" : "=&r" (lockval), "=&r" (tmp), "+Q" (*lock) : : "memory"); From f76dc9a5db236a11ee3adf86aa0d7c2751582bb5 Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Tue, 26 Jul 2016 10:16:55 -0700 Subject: [PATCH 042/521] arm64: Only select ARM64_MODULE_PLTS if MODULES=y MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Selecting CONFIG_RANDOMIZE_BASE=y and CONFIG_MODULES=n fails to build the module PLTs support: CC arch/arm64/kernel/module-plts.o /work/Linux/linux-2.6-aarch64/arch/arm64/kernel/module-plts.c: In function ‘module_emit_plt_entry’: /work/Linux/linux-2.6-aarch64/arch/arm64/kernel/module-plts.c:32:49: error: dereferencing pointer to incomplete type ‘struct module’ This patch selects ARM64_MODULE_PLTS conditionally only if MODULES is enabled. Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR") Cc: # 4.6+ Reported-by: Jeff Vander Stoep Acked-by: Ard Biesheuvel Acked-by: Mark Rutland Signed-off-by: Catalin Marinas (cherry picked from commit b9c220b589daaf140f5b8ebe502c98745b94e65c) Signed-off-by: Alex Shi --- arch/arm64/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 97583a1878db..2693917d200f 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -750,7 +750,7 @@ config RELOCATABLE config RANDOMIZE_BASE bool "Randomize the address of the kernel image" - select ARM64_MODULE_PLTS + select ARM64_MODULE_PLTS if MODULES select RELOCATABLE help Randomizes the virtual address at which the kernel image is From 94fa0d8ea617ea063c02781da224a0cfa6fe2233 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 24 Aug 2016 10:07:14 +0100 Subject: [PATCH 043/521] perf/core: Use this_cpu_ptr() when stopping AUX events When tearing down an AUX buf for an event via perf_mmap_close(), __perf_event_output_stop() is called on the event's CPU to ensure that trace generation is halted before the process of unmapping and freeing the buffer pages begins. The callback is performed via cpu_function_call(), which ensures that it runs with interrupts disabled and is therefore not preemptible. Unfortunately, the current code grabs the per-cpu context pointer using get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair the call with put_cpu_ptr(), leading to a preempt_count() imbalance and a BUG when freeing the AUX buffer later on: WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120 Modules linked in: [...] Call Trace: [] dump_stack+0x4f/0x72 [] __warn+0xc6/0xe0 [] warn_slowpath_null+0x18/0x20 [] __rb_free_aux+0x10c/0x120 [] rb_free_aux+0x13/0x20 [] perf_mmap_close+0x29e/0x2f0 [] ? perf_iterate_ctx+0xe0/0xe0 [] remove_vma+0x25/0x60 [] exit_mmap+0x106/0x140 [] mmput+0x1c/0xd0 [] do_exit+0x253/0xbf0 [] do_group_exit+0x3e/0xb0 [] get_signal+0x249/0x640 [] do_signal+0x23/0x640 [] ? _raw_write_unlock_irq+0x12/0x30 [] ? _raw_spin_unlock_irq+0x9/0x10 [] ? __schedule+0x2c6/0x710 [] exit_to_usermode_loop+0x74/0x90 [] prepare_exit_to_usermode+0x26/0x30 [] retint_user+0x8/0x10 This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is already disabled by the caller. Signed-off-by: Will Deacon Reviewed-by: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vince Weaver Fixes: 95ff4ca26c49 ("perf/core: Free AUX pages in unmap path") Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com Signed-off-by: Ingo Molnar (cherry picked from commit 8b6a3fe8fab97716990a3abde1a01fb5a34552a3) Signed-off-by: Alex Shi --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index e2998a4444b0..cbaaf6c70f29 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5797,7 +5797,7 @@ static int __perf_pmu_output_stop(void *info) { struct perf_event *event = info; struct pmu *pmu = event->pmu; - struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context); + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); struct remote_output ro = { .rb = event->rb, }; From a877498ddb55fce5f94674e4d0b39747932ef6b8 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 24 Aug 2016 18:02:08 +0100 Subject: [PATCH 044/521] arm64: avoid TLB conflict with CONFIG_RANDOMIZE_BASE When CONFIG_RANDOMIZE_BASE is selected, we modify the page tables to remap the kernel at a newly-chosen VA range. We do this with the MMU disabled, but do not invalidate TLBs prior to re-enabling the MMU with the new tables. Thus the old mappings entries may still live in TLBs, and we risk violating Break-Before-Make requirements, leading to TLB conflicts and/or other issues. We invalidate TLBs when we uninsall the idmap in early setup code, but prior to this we are subject to issues relating to the Break-Before-Make violation. Avoid these issues by invalidating the TLBs before the new mappings can be used by the hardware. Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR") Cc: # 4.6+ Acked-by: Ard Biesheuvel Acked-by: Will Deacon Signed-off-by: Mark Rutland Signed-off-by: Catalin Marinas (cherry picked from commit fd363bd417ddb6103564c69cfcbd92d9a7877431) Signed-off-by: Alex Shi --- arch/arm64/kernel/head.S | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 9890d04a96cb..488708687fda 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -697,6 +697,9 @@ __enable_mmu: isb bl __create_page_tables // recreate kernel mapping + tlbi vmalle1 // Remove any stale TLB entries + dsb nsh + msr sctlr_el1, x19 // re-enable the MMU isb ic iallu // flush instructions fetched From 5f642b4ad6b579bbdca82f690c2de314eccae507 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Thu, 25 Aug 2016 15:18:54 -0600 Subject: [PATCH 045/521] coresight: Remove erroneous dma_free_coherent in tmc_probe commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed") removed the static allocation of buffer for the trace data in ETR mode in tmc_probe. However it failed to remove the "devm_free_coherent" in tmc_probe when the probe fails due to other reasons. This patch gets rid of the incorrect dma_free_coherent() call. Fixes: commit de5461970b3e9e194 ("coresight: tmc: allocating memory when needed") Cc: Mathieu Poirier Signed-off-by: Suzuki K Poulose Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 481e46fe7a88557b66330cbb047b25cc13eff4b9) Signed-off-by: Alex Shi --- drivers/hwtracing/coresight/coresight-tmc.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index 9e02ac963cd0..3978cbb6b038 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -388,9 +388,6 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id) err_misc_register: coresight_unregister(drvdata->csdev); err_devm_kzalloc: - if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) - dma_free_coherent(dev, drvdata->size, - drvdata->vaddr, drvdata->paddr); return ret; } From 2780600caa02a0b1d28a09bfe606fca270451430 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Tue, 31 May 2016 15:58:00 +0100 Subject: [PATCH 046/521] arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled With ARM64_64K_PAGES and RANDOMIZE_TEXT_OFFSET enabled, we hit the following issue on the boot: kernel BUG at arch/arm64/mm/mmu.c:480! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.6.0 #310 Hardware name: ARM Juno development board (r2) (DT) task: ffff000008d58a80 ti: ffff000008d30000 task.ti: ffff000008d30000 PC is at map_kernel_segment+0x44/0xb0 LR is at paging_init+0x84/0x5b0 pc : [] lr : [] pstate: 600002c5 Call trace: [] map_kernel_segment+0x44/0xb0 [] paging_init+0x84/0x5b0 [] setup_arch+0x198/0x534 [] start_kernel+0x70/0x388 [] __primary_switched+0x30/0x74 Commit 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text segment mapping") removed the alignment between the .head.text and .text sections, and used the _text rather than the _stext interval for mapping the .text segment. Prior to this commit _stext was always section aligned and didn't cause any issue even when RANDOMIZE_TEXT_OFFSET was enabled. Since that alignment has been removed and _text is used to map the .text segment, we need ensure _text is always page aligned when RANDOMIZE_TEXT_OFFSET is enabled. This patch adds logic to TEXT_OFFSET fuzzing to ensure that the offset is always aligned to the kernel page size. To ensure this, we rely on the PAGE_SHIFT being available via Kconfig. Signed-off-by: Mark Rutland Reported-by: Sudeep Holla Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Will Deacon Fixes: 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text segment mapping") Signed-off-by: Will Deacon (cherry picked from commit aed7eb8367939244ba19445292ffdfc398e0d66a) Signed-off-by: Alex Shi --- arch/arm64/Makefile | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 0a9bf4500852..ed9a181c4db5 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -60,7 +60,9 @@ head-y := arch/arm64/kernel/head.o # The byte offset of the kernel image in RAM from the start of RAM. ifeq ($(CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET), y) -TEXT_OFFSET := $(shell awk 'BEGIN {srand(); printf "0x%03x000\n", int(512 * rand())}') +TEXT_OFFSET := $(shell awk "BEGIN {srand(); printf \"0x%06x\n\", \ + int(2 * 1024 * 1024 / (2 ^ $(CONFIG_ARM64_PAGE_SHIFT)) * \ + rand()) * (2 ^ $(CONFIG_ARM64_PAGE_SHIFT))}") else TEXT_OFFSET := 0x00080000 endif From 0f0c7c1c0c4e2f479a8c2056ee9cd835b9bd4854 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 3 Mar 2016 15:10:59 +0100 Subject: [PATCH 047/521] arm64: enable CONFIG_DEBUG_RODATA by default In spite of its name, CONFIG_DEBUG_RODATA is an important hardening feature for production kernels, and distros all enable it by default in their kernel configs. However, since enabling it used to result in more granular, and thus less efficient kernel mappings, it is not enabled by default for performance reasons. However, since commit 2f39b5f91eb4 ("arm64: mm: Mark .rodata as RO"), the various kernel segments (.text, .rodata, .init and .data) are already mapped individually, and the only effect of setting CONFIG_DEBUG_RODATA is that the existing .text and .rodata mappings are updated late in the boot sequence to have their read-only attributes set, which means that any performance concerns related to enabling CONFIG_DEBUG_RODATA are no longer valid. So from now on, make CONFIG_DEBUG_RODATA default to 'y' Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Acked-by: Kees Cook Signed-off-by: Catalin Marinas (cherry picked from commit 57efac2f7108e3255d0dfe512290c9896f4ed55f) Signed-off-by: Alex Shi --- arch/arm64/Kconfig.debug | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug index ab1cb1fc4e3d..d33041614206 100644 --- a/arch/arm64/Kconfig.debug +++ b/arch/arm64/Kconfig.debug @@ -64,13 +64,13 @@ config DEBUG_SET_MODULE_RONX config DEBUG_RODATA bool "Make kernel text and rodata read-only" + default y help If this is set, kernel text and rodata will be made read-only. This is to help catch accidental or malicious attempts to change the - kernel's executable code. Additionally splits rodata from kernel - text so it can be made explicitly non-executable. + kernel's executable code. - If in doubt, say Y + If in doubt, say Y config DEBUG_ALIGN_RODATA depends on DEBUG_RODATA From 351068f5efa8ae188bb1250eae0aa74060d493c8 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Thu, 8 Sep 2016 09:57:08 +0200 Subject: [PATCH 048/521] fs/proc/kcore.c: Add bounce buffer for ktext data We hit hardened usercopy feature check for kernel text access by reading kcore file: usercopy: kernel memory exposure attempt detected from ffffffff8179a01f () (4065 bytes) kernel BUG at mm/usercopy.c:75! Bypassing this check for kcore by adding bounce buffer for ktext data. Reported-by: Steve Best Fixes: f5509cc18daa ("mm: Hardened usercopy") Suggested-by: Kees Cook Signed-off-by: Jiri Olsa Acked-by: Kees Cook Signed-off-by: Linus Torvalds (cherry picked from commit df04abfd181acc276ba6762c8206891ae10ae00d) Signed-off-by: Alex Shi --- fs/proc/kcore.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 92e6726f6e37..d1e5a66a33ba 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -516,7 +516,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) if (kern_addr_valid(start)) { unsigned long n; - n = copy_to_user(buffer, (char *)start, tsz); + /* + * Using bounce buffer to bypass the + * hardened user copy kernel text checks. + */ + memcpy(buf, (char *) start, tsz); + n = copy_to_user(buffer, buf, tsz); /* * We cannot distinguish between fault on source * and fault on destination. When this happens From d425243ac3340984ea43d32d3815eeb502d43bce Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Thu, 17 Mar 2016 12:47:12 +0100 Subject: [PATCH 049/521] s390: add DEBUG_RODATA support git commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings") adds a bogus warning to the console which states that s390 does not support kernel memory protection. This however is not true. We do support that since a couple of years however in a different way than the author of the above named patch expected. To get rid of the misleading message implement the mark_rodata_ro function and emit a message which states the amount of memory which was write protected already earlier. This is the same what parisc currently does. We currently do not support the kernel parameter "rodata=off" which would allow to write to the rodata section again. However since we have this feature since years without any problems there is no reason to add support for this. Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky (cherry picked from commit 91d37211769510ae0b4747045d8f81d3b9dd4278) Signed-off-by: Alex Shi --- arch/s390/Kconfig | 3 +++ arch/s390/mm/init.c | 10 +++++++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 3a55f493c7da..8ddd756faa32 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -62,6 +62,9 @@ config PCI_QUIRKS config ARCH_SUPPORTS_UPROBES def_bool y +config DEBUG_RODATA + def_bool y + config S390 def_bool y select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index c722400c7697..9c1b3c5e3174 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -108,6 +108,13 @@ void __init paging_init(void) free_area_init_nodes(max_zone_pfns); } +void mark_rodata_ro(void) +{ + /* Text and rodata are already protected. Nothing to do here. */ + pr_info("Write protecting the kernel read-only data: %luk\n", + ((unsigned long)&_eshared - (unsigned long)&_stext) >> 10); +} + void __init mem_init(void) { if (MACHINE_HAS_TLB_LC) @@ -126,9 +133,6 @@ void __init mem_init(void) setup_zero_pages(); /* Setup zeroed pages. */ mem_init_print_info(NULL); - printk("Write protected kernel read-only data: %#lx - %#lx\n", - (unsigned long)&_stext, - PFN_ALIGN((unsigned long)&_eshared) - 1); } void free_initmem(void) From 957413466cde02bee0b6540889376d35d6b0b41e Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Tue, 7 Jun 2016 12:20:51 +0200 Subject: [PATCH 050/521] vmlinux.lds.h: allow arch specific handling of ro_after_init data section commit c74ba8b3480d ("arch: Introduce post-init read-only memory") introduced the __ro_after_init attribute which allows to add variables to the ro_after_init data section. This new section was added to rodata, even though it contains writable data. This in turn causes problems on architectures which mark the page table entries read-only that point to rodata very early. This patch allows architectures to implement an own handling of the .data..ro_after_init section. Usually that would be: - mark the rodata section read-only very early - mark the ro_after_init section read-only within mark_rodata_ro Signed-off-by: Heiko Carstens Reviewed-by: Kees Cook Signed-off-by: Martin Schwidefsky (cherry picked from commit 32fb2fc5c357fb99616bbe100dbcb27bc7f5d045) Signed-off-by: Alex Shi --- include/asm-generic/vmlinux.lds.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 772c784ba763..33eb2fe23071 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -248,6 +248,14 @@ . = ALIGN(align); \ *(.data..init_task) +/* + * Allow architectures to handle ro_after_init data on their + * own by defining an empty RO_AFTER_INIT_DATA. + */ +#ifndef RO_AFTER_INIT_DATA +#define RO_AFTER_INIT_DATA *(.data..ro_after_init) +#endif + /* * Read only Data */ @@ -256,7 +264,7 @@ .rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start_rodata) = .; \ *(.rodata) *(.rodata.*) \ - *(.data..ro_after_init) /* Read only after init */ \ + RO_AFTER_INIT_DATA /* Read only after init */ \ *(__vermagic) /* Kernel version magic */ \ . = ALIGN(8); \ VMLINUX_SYMBOL(__start___tracepoints_ptrs) = .; \ From a34cb2390f7f28f4a44568cccbc2c65a9bdff721 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 3 Mar 2016 15:10:59 +0100 Subject: [PATCH 051/521] arm64: enable CONFIG_DEBUG_RODATA by default In spite of its name, CONFIG_DEBUG_RODATA is an important hardening feature for production kernels, and distros all enable it by default in their kernel configs. However, since enabling it used to result in more granular, and thus less efficient kernel mappings, it is not enabled by default for performance reasons. However, since commit 2f39b5f91eb4 ("arm64: mm: Mark .rodata as RO"), the various kernel segments (.text, .rodata, .init and .data) are already mapped individually, and the only effect of setting CONFIG_DEBUG_RODATA is that the existing .text and .rodata mappings are updated late in the boot sequence to have their read-only attributes set, which means that any performance concerns related to enabling CONFIG_DEBUG_RODATA are no longer valid. So from now on, make CONFIG_DEBUG_RODATA default to 'y' Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Acked-by: Kees Cook Signed-off-by: Catalin Marinas (cherry picked from commit 57efac2f7108e3255d0dfe512290c9896f4ed55f) Signed-off-by: Alex Shi --- arch/arm64/Kconfig.debug | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug index 04fb73b973f1..8b0cd45de394 100644 --- a/arch/arm64/Kconfig.debug +++ b/arch/arm64/Kconfig.debug @@ -64,13 +64,13 @@ config DEBUG_SET_MODULE_RONX config DEBUG_RODATA bool "Make kernel text and rodata read-only" + default y help If this is set, kernel text and rodata will be made read-only. This is to help catch accidental or malicious attempts to change the - kernel's executable code. Additionally splits rodata from kernel - text so it can be made explicitly non-executable. + kernel's executable code. - If in doubt, say Y + If in doubt, say Y config DEBUG_ALIGN_RODATA depends on DEBUG_RODATA && ARM64_4K_PAGES From 1ff46aa097c9d407a905de365ecdfcb1fa27d0cc Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Thu, 11 Aug 2016 14:11:05 +0100 Subject: [PATCH 052/521] arm64: hibernate: avoid potential TLB conflict In create_safe_exec_page we install a set of global mappings in TTBR0, then subsequently invalidate TLBs. While TTBR0 points at the zero page, and the TLBs should be free of stale global entries, we may have stale ASID-tagged entries (e.g. from the EFI runtime services mappings) for the same VAs. Per the ARM ARM these ASID-tagged entries may conflict with newly-allocated global entries, and we must follow a Break-Before-Make approach to avoid issues resulting from this. This patch reworks create_safe_exec_page to invalidate TLBs while the zero page is still in place, ensuring that there are no potential conflicts when the new TTBR0 value is installed. As a single CPU is online while this code executes, we do not need to perform broadcast TLB maintenance, and can call local_flush_tlb_all(), which also subsumes some barriers. The remaining assembly is converted to use write_sysreg() and isb(). Other than this, we safely manipulate TTBRs in the hibernate dance. The code we install as part of the new TTBR0 mapping (the hibernated kernel's swsusp_arch_suspend_exit) installs a zero page into TTBR1, invalidates TLBs, then installs its preferred value. Upon being restored to the middle of swsusp_arch_suspend, the new image will call __cpu_suspend_exit, which will call cpu_uninstall_idmap, installing the zero page in TTBR0 and invalidating all TLB entries. Fixes: 82869ac57b5d ("arm64: kernel: Add support for hibernate/suspend-to-disk") Signed-off-by: Mark Rutland Acked-by: James Morse Tested-by: James Morse Cc: Lorenzo Pieralisi Cc: Will Deacon Cc: # 4.7+ Signed-off-by: Catalin Marinas (cherry picked from commit 0194e760f7d2f42adb5e1db31b27a4331dd89c2f) Signed-off-by: Alex Shi --- arch/arm64/kernel/hibernate.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index f8df75d740f4..043a07206527 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -34,6 +34,7 @@ #include #include #include +#include #include /* @@ -216,12 +217,22 @@ static int create_safe_exec_page(void *src_start, size_t length, set_pte(pte, __pte(virt_to_phys((void *)dst) | pgprot_val(PAGE_KERNEL_EXEC))); - /* Load our new page tables */ - asm volatile("msr ttbr0_el1, %0;" - "isb;" - "tlbi vmalle1is;" - "dsb ish;" - "isb" : : "r"(virt_to_phys(pgd))); + /* + * Load our new page tables. A strict BBM approach requires that we + * ensure that TLBs are free of any entries that may overlap with the + * global mappings we are about to install. + * + * For a real hibernate/resume cycle TTBR0 currently points to a zero + * page, but TLBs may contain stale ASID-tagged entries (e.g. for EFI + * runtime services), while for a userspace-driven test_resume cycle it + * points to userspace page tables (and we must point it at a zero page + * ourselves). Elsewhere we only (un)install the idmap with preemption + * disabled, so T0SZ should be as required regardless. + */ + cpu_set_reserved_ttbr0(); + local_flush_tlb_all(); + write_sysreg(virt_to_phys(pgd), ttbr0_el1); + isb(); *phys_dst_addr = virt_to_phys((void *)dst); From 18fc694578719686df59e31ae32dd862eb5f0ba8 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Thu, 11 Aug 2016 14:11:06 +0100 Subject: [PATCH 053/521] arm64: hibernate: handle allocation failures In create_safe_exec_page(), we create a copy of the hibernate exit text, along with some page tables to map this via TTBR0. We then install the new tables in TTBR0. In swsusp_arch_resume() we call create_safe_exec_page() before trying a number of operations which may fail (e.g. copying the linear map page tables). If these fail, we bail out of swsusp_arch_resume() and return an error code, but leave TTBR0 as-is. Subsequently, the core hibernate code will call free_basic_memory_bitmaps(), which will free all of the memory allocations we made, including the page tables installed in TTBR0. Thus, we may have TTBR0 pointing at dangling freed memory for some period of time. If the hibernate attempt was triggered by a user requesting a hibernate test via the reboot syscall, we may return to userspace with the clobbered TTBR0 value. Avoid these issues by reorganising swsusp_arch_resume() such that we have no failure paths after create_safe_exec_page(). We also add a check that the zero page allocation succeeded, matching what we have for other allocations. Fixes: 82869ac57b5d ("arm64: kernel: Add support for hibernate/suspend-to-disk") Signed-off-by: Mark Rutland Acked-by: James Morse Cc: Lorenzo Pieralisi Cc: Will Deacon Cc: # 4.7+ Signed-off-by: Catalin Marinas (cherry picked from commit dfbca61af0b654990b9af8297ac574a9986d8275) Signed-off-by: Alex Shi --- arch/arm64/kernel/hibernate.c | 59 +++++++++++++++++++---------------- 1 file changed, 32 insertions(+), 27 deletions(-) diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index 043a07206527..6dd18140ebb8 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -398,6 +398,38 @@ int swsusp_arch_resume(void) void __noreturn (*hibernate_exit)(phys_addr_t, phys_addr_t, void *, void *, phys_addr_t, phys_addr_t); + /* + * Restoring the memory image will overwrite the ttbr1 page tables. + * Create a second copy of just the linear map, and use this when + * restoring. + */ + tmp_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); + if (!tmp_pg_dir) { + pr_err("Failed to allocate memory for temporary page tables."); + rc = -ENOMEM; + goto out; + } + rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0); + if (rc) + goto out; + + /* + * Since we only copied the linear map, we need to find restore_pblist's + * linear map address. + */ + lm_restore_pblist = LMADDR(restore_pblist); + + /* + * We need a zero page that is zero before & after resume in order to + * to break before make on the ttbr1 page tables. + */ + zero_page = (void *)get_safe_page(GFP_ATOMIC); + if (!zero_page) { + pr_err("Failed to allocate zero page."); + rc = -ENOMEM; + goto out; + } + /* * Locate the exit code in the bottom-but-one page, so that *NULL * still has disastrous affects. @@ -423,27 +455,6 @@ int swsusp_arch_resume(void) */ __flush_dcache_area(hibernate_exit, exit_size); - /* - * Restoring the memory image will overwrite the ttbr1 page tables. - * Create a second copy of just the linear map, and use this when - * restoring. - */ - tmp_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC); - if (!tmp_pg_dir) { - pr_err("Failed to allocate memory for temporary page tables."); - rc = -ENOMEM; - goto out; - } - rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0); - if (rc) - goto out; - - /* - * Since we only copied the linear map, we need to find restore_pblist's - * linear map address. - */ - lm_restore_pblist = LMADDR(restore_pblist); - /* * KASLR will cause the el2 vectors to be in a different location in * the resumed kernel. Load hibernate's temporary copy into el2. @@ -458,12 +469,6 @@ int swsusp_arch_resume(void) __hyp_set_vectors(el2_vectors); } - /* - * We need a zero page that is zero before & after resume in order to - * to break before make on the ttbr1 page tables. - */ - zero_page = (void *)get_safe_page(GFP_ATOMIC); - hibernate_exit(virt_to_phys(tmp_pg_dir), resume_hdr.ttbr1_el1, resume_hdr.reenter_kernel, lm_restore_pblist, resume_hdr.__hyp_stub_vectors, virt_to_phys(zero_page)); From 95bcc3e563748f65c9fc0eb1bbc3f99b76efd4b2 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 26 Aug 2016 16:03:42 +0100 Subject: [PATCH 054/521] arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1 Changes to make the resume from cpu_suspend() code behave more like secondary boot caused debug exceptions to be unmasked early by __cpu_setup(). We then go on to restore mdscr_el1 in cpu_do_resume(), potentially taking break or watch points based on uninitialised registers. Mask debug exceptions in cpu_do_resume(), which is specific to resume from cpu_suspend(). Debug exceptions will be restored to their original state by local_dbg_restore() in cpu_suspend(), which runs after hw_breakpoint_restore() has re-initialised the other registers. Reported-by: Lorenzo Pieralisi Fixes: cabe1c81ea5b ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va") Cc: # 4.7+ Signed-off-by: James Morse Acked-by: Will Deacon Signed-off-by: Catalin Marinas (cherry picked from commit 744c6c37cc18705d19e179622f927f5b781fe9cc) Signed-off-by: Alex Shi --- arch/arm64/mm/proc.S | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index 5bb61de23201..9d37e967fa19 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -100,7 +100,16 @@ ENTRY(cpu_do_resume) msr tcr_el1, x8 msr vbar_el1, x9 + + /* + * __cpu_setup() cleared MDSCR_EL1.MDE and friends, before unmasking + * debug exceptions. By restoring MDSCR_EL1 here, we may take a debug + * exception. Mask them until local_dbg_restore() in cpu_suspend() + * resets them. + */ + disable_dbg msr mdscr_el1, x10 + msr sctlr_el1, x12 /* * Restore oslsr_el1 by writing oslar_el1 From ac1a97d8a562161e42edd23e5d0f1740a3d93c85 Mon Sep 17 00:00:00 2001 From: Chris Bainbridge Date: Mon, 25 Apr 2016 13:48:38 +0100 Subject: [PATCH 055/521] usb: core: hub: hub_port_init lock controller instead of bus commit feb26ac31a2a5cb88d86680d9a94916a6343e9e6 upstream. The XHCI controller presents two USB buses to the system - one for USB2 and one for USB3. The hub init code (hub_port_init) is reentrant but only locks one bus per thread, leading to a race condition failure when two threads attempt to simultaneously initialise a USB2 and USB3 device: [ 8.034843] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this failure occurred on 6% of all boots. The call traces at the point of failure are: Call Trace: [] schedule+0x37/0x90 [] usb_kill_urb+0x8d/0xd0 [] ? wake_up_atomic_t+0x30/0x30 [] usb_start_wait_urb+0xbe/0x150 [] usb_control_msg+0xbc/0xf0 [] hub_port_init+0x51e/0xb70 [] hub_event+0x817/0x1570 [] process_one_work+0x1ff/0x620 [] ? process_one_work+0x15f/0x620 [] worker_thread+0x64/0x4b0 [] ? rescuer_thread+0x390/0x390 [] kthread+0x105/0x120 [] ? kthread_create_on_node+0x200/0x200 [] ret_from_fork+0x3f/0x70 [] ? kthread_create_on_node+0x200/0x200 Call Trace: [] xhci_setup_device+0x53d/0xa40 [] xhci_address_device+0xe/0x10 [] hub_port_init+0x1bf/0xb70 [] ? trace_hardirqs_on+0xd/0x10 [] hub_event+0x817/0x1570 [] process_one_work+0x1ff/0x620 [] ? process_one_work+0x15f/0x620 [] worker_thread+0x64/0x4b0 [] ? rescuer_thread+0x390/0x390 [] kthread+0x105/0x120 [] ? kthread_create_on_node+0x200/0x200 [] ret_from_fork+0x3f/0x70 [] ? kthread_create_on_node+0x200/0x200 Which results from the two call chains: hub_port_init usb_get_device_descriptor usb_get_descriptor usb_control_msg usb_internal_control_msg usb_start_wait_urb usb_submit_urb / wait_for_completion_timeout / usb_kill_urb hub_port_init hub_set_address xhci_address_device xhci_setup_device Mathias Nyman explains the current behaviour violates the XHCI spec: hub_port_reset() will end up moving the corresponding xhci device slot to default state. As hub_port_reset() is called several times in hub_port_init() it sounds reasonable that we could end up with two threads having their xhci device slots in default state at the same time, which according to xhci 4.5.3 specs still is a big no no: "Note: Software shall not transition more than one Device Slot to the Default State at a time" So both threads fail at their next task after this. One fails to read the descriptor, and the other fails addressing the device. Fix this in hub_port_init by locking the USB controller (instead of an individual bus) to prevent simultaneous initialisation of both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Link: https://lkml.org/lkml/2016/2/8/312 Link: https://lkml.org/lkml/2016/2/4/748 Signed-off-by: Chris Bainbridge Cc: stable Acked-by: Mathias Nyman Signed-off-by: Sumit Semwal [sumits: minor merge conflict resolution for linux-4.4.y] Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hcd.c | 15 +++++++++++++-- drivers/usb/core/hub.c | 8 ++++---- include/linux/usb.h | 3 +-- include/linux/usb/hcd.h | 1 + 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index f44ce09367bc..9a5303c17de7 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus) bus->bandwidth_allocated = 0; bus->bandwidth_int_reqs = 0; bus->bandwidth_isoc_reqs = 0; - mutex_init(&bus->usb_address0_mutex); + mutex_init(&bus->devnum_next_mutex); INIT_LIST_HEAD (&bus->bus_list); } @@ -2497,6 +2497,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, return NULL; } if (primary_hcd == NULL) { + hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex), + GFP_KERNEL); + if (!hcd->address0_mutex) { + kfree(hcd); + dev_dbg(dev, "hcd address0 mutex alloc failed\n"); + return NULL; + } + mutex_init(hcd->address0_mutex); hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex), GFP_KERNEL); if (!hcd->bandwidth_mutex) { @@ -2508,6 +2516,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, dev_set_drvdata(dev, hcd); } else { mutex_lock(&usb_port_peer_mutex); + hcd->address0_mutex = primary_hcd->address0_mutex; hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex; hcd->primary_hcd = primary_hcd; primary_hcd->primary_hcd = primary_hcd; @@ -2574,8 +2583,10 @@ static void hcd_release(struct kref *kref) struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref); mutex_lock(&usb_port_peer_mutex); - if (usb_hcd_is_primary_hcd(hcd)) + if (usb_hcd_is_primary_hcd(hcd)) { + kfree(hcd->address0_mutex); kfree(hcd->bandwidth_mutex); + } if (hcd->shared_hcd) { struct usb_hcd *peer = hcd->shared_hcd; diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 780db8bb2262..f52d8abf6979 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -1980,7 +1980,7 @@ static void choose_devnum(struct usb_device *udev) struct usb_bus *bus = udev->bus; /* be safe when more hub events are proceed in parallel */ - mutex_lock(&bus->usb_address0_mutex); + mutex_lock(&bus->devnum_next_mutex); if (udev->wusb) { devnum = udev->portnum + 1; BUG_ON(test_bit(devnum, bus->devmap.devicemap)); @@ -1998,7 +1998,7 @@ static void choose_devnum(struct usb_device *udev) set_bit(devnum, bus->devmap.devicemap); udev->devnum = devnum; } - mutex_unlock(&bus->usb_address0_mutex); + mutex_unlock(&bus->devnum_next_mutex); } static void release_devnum(struct usb_device *udev) @@ -4262,7 +4262,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, if (oldspeed == USB_SPEED_LOW) delay = HUB_LONG_RESET_TIME; - mutex_lock(&hdev->bus->usb_address0_mutex); + mutex_lock(hcd->address0_mutex); /* Reset the device; full speed may morph to high speed */ /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ @@ -4548,7 +4548,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, hub_port_disable(hub, port1, 0); update_devnum(udev, devnum); /* for disconnect processing */ } - mutex_unlock(&hdev->bus->usb_address0_mutex); + mutex_unlock(hcd->address0_mutex); return retval; } diff --git a/include/linux/usb.h b/include/linux/usb.h index 12891ffd4bf0..8c75af6b7d5b 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -371,14 +371,13 @@ struct usb_bus { int devnum_next; /* Next open device number in * round-robin allocation */ + struct mutex devnum_next_mutex; /* devnum_next mutex */ struct usb_devmap devmap; /* device address allocation map */ struct usb_device *root_hub; /* Root hub */ struct usb_bus *hs_companion; /* Companion EHCI bus, if any */ struct list_head bus_list; /* list of busses */ - struct mutex usb_address0_mutex; /* unaddressed device mutex */ - int bandwidth_allocated; /* on this bus: how much of the time * reserved for periodic (intr/iso) * requests is used, on average? diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h index f89c24bd53a4..3993b21f3d11 100644 --- a/include/linux/usb/hcd.h +++ b/include/linux/usb/hcd.h @@ -180,6 +180,7 @@ struct usb_hcd { * bandwidth_mutex should be dropped after a successful control message * to the device, or resetting the bandwidth after a failed attempt. */ + struct mutex *address0_mutex; struct mutex *bandwidth_mutex; struct usb_hcd *shared_hcd; struct usb_hcd *primary_hcd; From 45d9558837d4d79e6d241f1c45cabea8d20dca22 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Mon, 27 Jun 2016 10:23:10 -0400 Subject: [PATCH 056/521] USB: don't free bandwidth_mutex too early commit ab2a4bf83902c170d29ba130a8abb5f9d90559e1 upstream. The USB core contains a bug that can show up when a USB-3 host controller is removed. If the primary (USB-2) hcd structure is released before the shared (USB-3) hcd, the core will try to do a double-free of the common bandwidth_mutex. The problem was described in graphical form by Chung-Geol Kim, who first reported it: ================================================= At *remove USB(3.0) Storage sequence <1> --> <5> ((Problem Case)) ================================================= VOLD ------------------------------------|------------ (uevent) ________|_________ |<1> | |dwc3_otg_sm_work | |usb_put_hcd | |peer_hcd(kref=2)| |__________________| ________|_________ |<2> | |New USB BUS #2 | | | |peer_hcd(kref=1) | | | --(Link)-bandXX_mutex| | |__________________| | ___________________ | |<3> | | |dwc3_otg_sm_work | | |usb_put_hcd | | |primary_hcd(kref=1)| | |___________________| | _________|_________ | |<4> | | |New USB BUS #1 | | |hcd_release | | |primary_hcd(kref=0)| | | | | |bandXX_mutex(free) |<- |___________________| (( VOLD )) ______|___________ |<5> | | SCSI | |usb_put_hcd | |peer_hcd(kref=0) | |*hcd_release | |bandXX_mutex(free*)|<- double free |__________________| ================================================= This happens because hcd_release() frees the bandwidth_mutex whenever it sees a primary hcd being released (which is not a very good idea in any case), but in the course of releasing the primary hcd, it changes the pointers in the shared hcd in such a way that the shared hcd will appear to be primary when it gets released. This patch fixes the problem by changing hcd_release() so that it deallocates the bandwidth_mutex only when the _last_ hcd structure referencing it is released. The patch also removes an unnecessary test, so that when an hcd is released, both the shared_hcd and primary_hcd pointers in the hcd's peer will be cleared. Signed-off-by: Alan Stern Reported-by: Chung-Geol Kim Tested-by: Chung-Geol Kim Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hcd.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index 9a5303c17de7..5724d7c41e29 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -2573,26 +2573,23 @@ EXPORT_SYMBOL_GPL(usb_create_hcd); * Don't deallocate the bandwidth_mutex until the last shared usb_hcd is * deallocated. * - * Make sure to only deallocate the bandwidth_mutex when the primary HCD is - * freed. When hcd_release() is called for either hcd in a peer set - * invalidate the peer's ->shared_hcd and ->primary_hcd pointers to - * block new peering attempts + * Make sure to deallocate the bandwidth_mutex only when the last HCD is + * freed. When hcd_release() is called for either hcd in a peer set, + * invalidate the peer's ->shared_hcd and ->primary_hcd pointers. */ static void hcd_release(struct kref *kref) { struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref); mutex_lock(&usb_port_peer_mutex); - if (usb_hcd_is_primary_hcd(hcd)) { - kfree(hcd->address0_mutex); - kfree(hcd->bandwidth_mutex); - } if (hcd->shared_hcd) { struct usb_hcd *peer = hcd->shared_hcd; peer->shared_hcd = NULL; - if (peer->primary_hcd == hcd) - peer->primary_hcd = NULL; + peer->primary_hcd = NULL; + } else { + kfree(hcd->address0_mutex); + kfree(hcd->bandwidth_mutex); } mutex_unlock(&usb_port_peer_mutex); kfree(hcd); From c78c3376ec6707f4e2177906928b12cb6cd8c5a9 Mon Sep 17 00:00:00 2001 From: "Wang, Rui Y" Date: Sun, 29 Nov 2015 22:45:33 +0800 Subject: [PATCH 057/521] crypto: ghash-clmulni - Fix load failure commit 3a020a723c65eb8ffa7c237faca26521a024e582 upstream. ghash_clmulni_intel fails to load on Linux 4.3+ with the following message: "modprobe: ERROR: could not insert 'ghash_clmulni_intel': Invalid argument" After 8996eafdc ("crypto: ahash - ensure statesize is non-zero") all ahash drivers are required to implement import()/export(), and must have a non- zero statesize. This patch has been tested with the algif_hash interface. The calculated digest values, after several rounds of import()s and export()s, match those calculated by tcrypt. Signed-off-by: Rui Wang Signed-off-by: Herbert Xu Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- arch/x86/crypto/ghash-clmulni-intel_glue.c | 26 ++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c index 440df0c7a2ee..a69321a77783 100644 --- a/arch/x86/crypto/ghash-clmulni-intel_glue.c +++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c @@ -219,6 +219,29 @@ static int ghash_async_final(struct ahash_request *req) } } +static int ghash_async_import(struct ahash_request *req, const void *in) +{ + struct ahash_request *cryptd_req = ahash_request_ctx(req); + struct shash_desc *desc = cryptd_shash_desc(cryptd_req); + struct ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + ghash_async_init(req); + memcpy(dctx, in, sizeof(*dctx)); + return 0; + +} + +static int ghash_async_export(struct ahash_request *req, void *out) +{ + struct ahash_request *cryptd_req = ahash_request_ctx(req); + struct shash_desc *desc = cryptd_shash_desc(cryptd_req); + struct ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + memcpy(out, dctx, sizeof(*dctx)); + return 0; + +} + static int ghash_async_digest(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); @@ -288,8 +311,11 @@ static struct ahash_alg ghash_async_alg = { .final = ghash_async_final, .setkey = ghash_async_setkey, .digest = ghash_async_digest, + .export = ghash_async_export, + .import = ghash_async_import, .halg = { .digestsize = GHASH_DIGEST_SIZE, + .statesize = sizeof(struct ghash_desc_ctx), .base = { .cra_name = "ghash", .cra_driver_name = "ghash-clmulni", From 10659b8f5c600e642d0f1cadbbf83c739ac0c739 Mon Sep 17 00:00:00 2001 From: "Wang, Rui Y" Date: Sun, 29 Nov 2015 22:45:34 +0800 Subject: [PATCH 058/521] crypto: cryptd - Assign statesize properly commit 1a07834024dfca5c4bed5de8f8714306e0a11836 upstream. cryptd_create_hash() fails by returning -EINVAL. It is because after 8996eafdc ("crypto: ahash - ensure statesize is non-zero") all ahash drivers must have a non-zero statesize. This patch fixes the problem by properly assigning the statesize. Signed-off-by: Rui Wang Signed-off-by: Herbert Xu Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- crypto/cryptd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/crypto/cryptd.c b/crypto/cryptd.c index e7aa904cb20b..26a504db3f53 100644 --- a/crypto/cryptd.c +++ b/crypto/cryptd.c @@ -642,6 +642,7 @@ static int cryptd_create_hash(struct crypto_template *tmpl, struct rtattr **tb, inst->alg.halg.base.cra_flags = type; inst->alg.halg.digestsize = salg->digestsize; + inst->alg.halg.statesize = salg->statesize; inst->alg.halg.base.cra_ctxsize = sizeof(struct cryptd_hash_ctx); inst->alg.halg.base.cra_init = cryptd_hash_init_tfm; From f8c07cbc2e72a7e26bff8c5823f6e045eeeb4e16 Mon Sep 17 00:00:00 2001 From: "Wang, Rui Y" Date: Wed, 27 Jan 2016 17:08:36 +0800 Subject: [PATCH 059/521] crypto: mcryptd - Fix load failure commit ddef482420b1ba8ec45e6123a7e8d3f67b21e5e3 upstream. mcryptd_create_hash() fails by returning -EINVAL, causing any driver using mcryptd to fail to load. It is because it needs to set its statesize properly. Signed-off-by: Rui Wang Signed-off-by: Herbert Xu Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- crypto/mcryptd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/crypto/mcryptd.c b/crypto/mcryptd.c index a0ceb41d5ccc..b4f3930266b1 100644 --- a/crypto/mcryptd.c +++ b/crypto/mcryptd.c @@ -531,6 +531,7 @@ static int mcryptd_create_hash(struct crypto_template *tmpl, struct rtattr **tb, inst->alg.halg.base.cra_flags = type; inst->alg.halg.digestsize = salg->digestsize; + inst->alg.halg.statesize = salg->statesize; inst->alg.halg.base.cra_ctxsize = sizeof(struct mcryptd_hash_ctx); inst->alg.halg.base.cra_init = mcryptd_hash_init_tfm; From 12e1a3cd11ea373143e957cf9698a26a4e43f4a6 Mon Sep 17 00:00:00 2001 From: "Manoj N. Kumar" Date: Fri, 4 Mar 2016 15:55:20 -0600 Subject: [PATCH 060/521] cxlflash: Increase cmd_per_lun for better throughput commit 83430833b4d4a9c9b23964babbeb1f36450f8136 upstream. With the current value of cmd_per_lun at 16, the throughput over a single adapter is limited to around 150kIOPS. Increase the value of cmd_per_lun to 256 to improve throughput. With this change a single adapter is able to attain close to the maximum throughput (380kIOPS). Also change the number of RRQ entries that can be queued. Signed-off-by: Manoj N. Kumar Acked-by: Matthew R. Ochs Reviewed-by: Uma Krishnan Signed-off-by: Martin K. Petersen Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/cxlflash/common.h | 8 +++++--- drivers/scsi/cxlflash/main.c | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h index 5ada9268a450..a8ac4c0a1493 100644 --- a/drivers/scsi/cxlflash/common.h +++ b/drivers/scsi/cxlflash/common.h @@ -34,7 +34,6 @@ extern const struct file_operations cxlflash_cxl_fops; sectors */ -#define NUM_RRQ_ENTRY 16 /* for master issued cmds */ #define MAX_RHT_PER_CONTEXT (PAGE_SIZE / sizeof(struct sisl_rht_entry)) /* AFU command retry limit */ @@ -48,9 +47,12 @@ extern const struct file_operations cxlflash_cxl_fops; index derivation */ -#define CXLFLASH_MAX_CMDS 16 +#define CXLFLASH_MAX_CMDS 256 #define CXLFLASH_MAX_CMDS_PER_LUN CXLFLASH_MAX_CMDS +/* RRQ for master issued cmds */ +#define NUM_RRQ_ENTRY CXLFLASH_MAX_CMDS + static inline void check_sizes(void) { @@ -149,7 +151,7 @@ struct afu_cmd { struct afu { /* Stuff requiring alignment go first. */ - u64 rrq_entry[NUM_RRQ_ENTRY]; /* 128B RRQ */ + u64 rrq_entry[NUM_RRQ_ENTRY]; /* 2K RRQ */ /* * Command & data for AFU commands. */ diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c index c86847c68448..2882bcac918a 100644 --- a/drivers/scsi/cxlflash/main.c +++ b/drivers/scsi/cxlflash/main.c @@ -2305,7 +2305,7 @@ static struct scsi_host_template driver_template = { .eh_device_reset_handler = cxlflash_eh_device_reset_handler, .eh_host_reset_handler = cxlflash_eh_host_reset_handler, .change_queue_depth = cxlflash_change_queue_depth, - .cmd_per_lun = 16, + .cmd_per_lun = CXLFLASH_MAX_CMDS_PER_LUN, .can_queue = CXLFLASH_MAX_CMDS, .this_id = -1, .sg_tablesize = SG_NONE, /* No scatter gather support */ From 962c66c74184b1c7927f5906c9848e605fe8b236 Mon Sep 17 00:00:00 2001 From: Alex Hung Date: Fri, 27 May 2016 15:47:06 +0800 Subject: [PATCH 061/521] ACPI / video: skip evaluating _DOD when it does not exist commit e34fbbac669de0b7fb7803929d0477f35f6e2833 upstream. Some system supports hybrid graphics and its discrete VGA does not have any connectors and therefore has no _DOD method. Signed-off-by: Alex Hung Reviewed-by: Aaron Lu Signed-off-by: Rafael J. Wysocki Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/acpi_video.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c index 5fdac394207a..549cdbed7b0e 100644 --- a/drivers/acpi/acpi_video.c +++ b/drivers/acpi/acpi_video.c @@ -1211,6 +1211,9 @@ static int acpi_video_device_enumerate(struct acpi_video_bus *video) union acpi_object *dod = NULL; union acpi_object *obj; + if (!video->cap._DOD) + return AE_NOT_EXIST; + status = acpi_evaluate_object(video->device->handle, "_DOD", NULL, &buffer); if (!ACPI_SUCCESS(status)) { ACPI_EXCEPTION((AE_INFO, status, "Evaluating _DOD")); From 3787a071d145055a89442cf614ceec39c315bc9f Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Mon, 22 Aug 2016 14:42:52 +0300 Subject: [PATCH 062/521] pinctrl: cherryview: Do not mask all interrupts in probe commit bcb48cca23ec9852739e4a464307fa29515bbe48 upstream. The Cherryview GPIO controller has 8 or 16 wires connected to the I/O-APIC which can be used directly by the platform/BIOS or drivers. One such wire is used as SCI (System Control Interrupt) which ACPI depends on to be able to trigger GPEs (General Purpose Events). The pinctrl driver itself uses another IRQ resource which is wire OR of all the 8 (or 16) wires and follows what BIOS has programmed to the IntSel register of each pin. Currently the driver masks all interrupts at probe time and this prevents these direct interrupts from working as expected. The reason for this is that some early stage prototypes had some pins misconfigured causing lots of spurious interrupts. We fix this by leaving the interrupt mask untouched. This allows SCI and other direct interrupts work properly. What comes to the possible spurious interrupts we switch the default handler to be handle_bad_irq() instead of handle_simple_irq() (which was not correct anyway). Reported-by: Yu C Chen Reported-by: Anisse Astier Signed-off-by: Mika Westerberg Signed-off-by: Linus Walleij Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/intel/pinctrl-cherryview.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/pinctrl/intel/pinctrl-cherryview.c b/drivers/pinctrl/intel/pinctrl-cherryview.c index a009ae34c5ef..930f0f25c1ce 100644 --- a/drivers/pinctrl/intel/pinctrl-cherryview.c +++ b/drivers/pinctrl/intel/pinctrl-cherryview.c @@ -1466,12 +1466,11 @@ static int chv_gpio_probe(struct chv_pinctrl *pctrl, int irq) offset += range->npins; } - /* Mask and clear all interrupts */ - chv_writel(0, pctrl->regs + CHV_INTMASK); + /* Clear all interrupts */ chv_writel(0xffff, pctrl->regs + CHV_INTSTAT); ret = gpiochip_irqchip_add(chip, &chv_gpio_irqchip, 0, - handle_simple_irq, IRQ_TYPE_NONE); + handle_bad_irq, IRQ_TYPE_NONE); if (ret) { dev_err(pctrl->dev, "failed to add IRQ chip\n"); goto fail; From 0a2512768f1683514ef964e2e0767458baef14de Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Sat, 30 Apr 2016 19:21:35 -0700 Subject: [PATCH 063/521] Drivers: hv: balloon: don't crash when memory is added in non-sorted order commit 77c0c9735bc0ba5898e637a3a20d6bcb50e3f67d upstream. When we iterate through all HA regions in handle_pg_range() we have an assumption that all these regions are sorted in the list and the 'start_pfn >= has->end_pfn' check is enough to find the proper region. Unfortunately it's not the case with WS2016 where host can hot-add regions in a different order. We end up modifying the wrong HA region and crashing later on pages online. Modify the check to make sure we found the region we were searching for while iterating. Fix the same check in pfn_covered() as well. Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/hv/hv_balloon.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index b853b4b083bd..43af91362be5 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -714,7 +714,7 @@ static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt) * If the pfn range we are dealing with is not in the current * "hot add block", move on. */ - if ((start_pfn >= has->end_pfn)) + if (start_pfn < has->start_pfn || start_pfn >= has->end_pfn) continue; /* * If the current hot add-request extends beyond @@ -768,7 +768,7 @@ static unsigned long handle_pg_range(unsigned long pg_start, * If the pfn range we are dealing with is not in the current * "hot add block", move on. */ - if ((start_pfn >= has->end_pfn)) + if (start_pfn < has->start_pfn || start_pfn >= has->end_pfn) continue; old_covered_state = has->covered_end_pfn; From b1a0f744f8e63fbef10dc84029e9d213e03a3a18 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Fri, 3 Jun 2016 17:09:22 -0700 Subject: [PATCH 064/521] Drivers: hv: avoid vfree() on crash commit a9f61ca793becabdefab03b77568d6c6f8c1bc79 upstream. When we crash from NMI context (e.g. after NMI injection from host when 'sysctl -w kernel.unknown_nmi_panic=1' is set) we hit kernel BUG at mm/vmalloc.c:1530! as vfree() is denied. While the issue could be solved with in_nmi() check instead I opted for skipping vfree on all sorts of crashes to reduce the amount of work which can cause consequent crashes. We don't really need to free anything on crash. Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/hv/hv.c | 8 +++++--- drivers/hv/hyperv_vmbus.h | 2 +- drivers/hv/vmbus_drv.c | 8 ++++---- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c index 57c191798699..ddbf7e7e0d98 100644 --- a/drivers/hv/hv.c +++ b/drivers/hv/hv.c @@ -274,7 +274,7 @@ int hv_init(void) * * This routine is called normally during driver unloading or exiting. */ -void hv_cleanup(void) +void hv_cleanup(bool crash) { union hv_x64_msr_hypercall_contents hypercall_msr; @@ -284,7 +284,8 @@ void hv_cleanup(void) if (hv_context.hypercall_page) { hypercall_msr.as_uint64 = 0; wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); - vfree(hv_context.hypercall_page); + if (!crash) + vfree(hv_context.hypercall_page); hv_context.hypercall_page = NULL; } @@ -304,7 +305,8 @@ void hv_cleanup(void) hypercall_msr.as_uint64 = 0; wrmsrl(HV_X64_MSR_REFERENCE_TSC, hypercall_msr.as_uint64); - vfree(hv_context.tsc_page); + if (!crash) + vfree(hv_context.tsc_page); hv_context.tsc_page = NULL; } #endif diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h index 12156db2e88e..75e383e6d03d 100644 --- a/drivers/hv/hyperv_vmbus.h +++ b/drivers/hv/hyperv_vmbus.h @@ -581,7 +581,7 @@ struct hv_ring_buffer_debug_info { extern int hv_init(void); -extern void hv_cleanup(void); +extern void hv_cleanup(bool crash); extern int hv_post_message(union hv_connection_id connection_id, enum hv_message_type message_type, diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 509ed9731630..802dcb409030 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -889,7 +889,7 @@ static int vmbus_bus_init(int irq) bus_unregister(&hv_bus); err_cleanup: - hv_cleanup(); + hv_cleanup(false); return ret; } @@ -1254,7 +1254,7 @@ static void hv_kexec_handler(void) vmbus_initiate_unload(); for_each_online_cpu(cpu) smp_call_function_single(cpu, hv_synic_cleanup, NULL, 1); - hv_cleanup(); + hv_cleanup(false); }; static void hv_crash_handler(struct pt_regs *regs) @@ -1266,7 +1266,7 @@ static void hv_crash_handler(struct pt_regs *regs) * for kdump. */ hv_synic_cleanup(NULL); - hv_cleanup(); + hv_cleanup(true); }; static int __init hv_acpi_init(void) @@ -1330,7 +1330,7 @@ static void __exit vmbus_exit(void) &hyperv_panic_block); } bus_unregister(&hv_bus); - hv_cleanup(); + hv_cleanup(false); for_each_online_cpu(cpu) { tasklet_kill(hv_context.event_dpc[cpu]); smp_call_function_single(cpu, hv_synic_cleanup, NULL, 1); From e2d9577854f5a5469bcf7a3d1b17ca5e9b9ba673 Mon Sep 17 00:00:00 2001 From: Ross Lagerwall Date: Fri, 22 Apr 2016 13:05:31 +0100 Subject: [PATCH 065/521] xen/qspinlock: Don't kick CPU if IRQ is not initialized commit 707e59ba494372a90d245f18b0c78982caa88e48 upstream. The following commit: 1fb3a8b2cfb2 ("xen/spinlock: Fix locking path engaging too soon under PVHVM.") ... moved the initalization of the kicker interrupt until after native_cpu_up() is called. However, when using qspinlocks, a CPU may try to kick another CPU that is spinning (because it has not yet initialized its kicker interrupt), resulting in the following crash during boot: kernel BUG at /build/linux-Ay7j_C/linux-4.4.0/drivers/xen/events/events_base.c:1210! invalid opcode: 0000 [#1] SMP ... RIP: 0010:[] [] xen_send_IPI_one+0x59/0x60 ... Call Trace: [] xen_qlock_kick+0xe/0x10 [] __pv_queued_spin_unlock+0xb2/0xf0 [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [] ? check_tsc_warp+0x76/0x150 [] check_tsc_sync_source+0x96/0x160 [] native_cpu_up+0x3d8/0x9f0 [] xen_hvm_cpu_up+0x35/0x80 [] _cpu_up+0x13c/0x180 [] cpu_up+0x7a/0xa0 [] smp_init+0x7f/0x81 [] kernel_init_freeable+0xef/0x212 [] ? rest_init+0x80/0x80 [] kernel_init+0xe/0xe0 [] ret_from_fork+0x3f/0x70 [] ? rest_init+0x80/0x80 To fix this, only send the kick if the target CPU's interrupt has been initialized. This check isn't racy, because the target is waiting for the spinlock, so it won't have initialized the interrupt in the meantime. Signed-off-by: Ross Lagerwall Reviewed-by: Boris Ostrovsky Cc: David Vrabel Cc: Juergen Gross Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: Ingo Molnar Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/spinlock.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 9e2ba5c6e1dd..f42e78de1e10 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -27,6 +27,12 @@ static bool xen_pvspin = true; static void xen_qlock_kick(int cpu) { + int irq = per_cpu(lock_kicker_irq, cpu); + + /* Don't kick if the target's kicker interrupt is not initialized. */ + if (irq == -1) + return; + xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR); } From 50730d7f361f9915ec7063a629500119b0e8c3b6 Mon Sep 17 00:00:00 2001 From: Thomas Huth Date: Wed, 18 May 2016 21:01:20 +0200 Subject: [PATCH 066/521] KVM: PPC: Book3S PR: Fix illegal opcode emulation commit 708e75a3ee750dce1072134e630d66c4e6eaf63c upstream. If kvmppc_handle_exit_pr() calls kvmppc_emulate_instruction() to emulate one instruction (in the BOOK3S_INTERRUPT_H_EMUL_ASSIST case), it calls kvmppc_core_queue_program() afterwards if kvmppc_emulate_instruction() returned EMULATE_FAIL, so the guest gets an program interrupt for the illegal opcode. However, the kvmppc_emulate_instruction() also tried to inject a program exception for this already, so the program interrupt gets injected twice and the return address in srr0 gets destroyed. All other callers of kvmppc_emulate_instruction() are also injecting a program interrupt, and since the callers have the right knowledge about the srr1 flags that should be used, it is the function kvmppc_emulate_instruction() that should _not_ inject program interrupts, so remove the kvmppc_core_queue_program() here. This fixes the issue discovered by Laurent Vivier with kvm-unit-tests where the logs are filled with these messages when the test tries to execute an illegal instruction: Couldn't emulate instruction 0x00000000 (op 0 xop 0) kvmppc_handle_exit_pr: emulation at 700 failed (00000000) Signed-off-by: Thomas Huth Reviewed-by: Alexander Graf Tested-by: Laurent Vivier Signed-off-by: Paul Mackerras Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kvm/emulate.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c index 5cc2e7af3a7b..b379146de55b 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -302,7 +302,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) advance = 0; printk(KERN_ERR "Couldn't emulate instruction 0x%08x " "(op %d xop %d)\n", inst, get_op(inst), get_xop(inst)); - kvmppc_core_queue_program(vcpu, 0); } } From 68ea3948ed3d48dd1e0897b121f37da6f14ffbcc Mon Sep 17 00:00:00 2001 From: Sebastian Ott Date: Fri, 15 Apr 2016 09:41:35 +0200 Subject: [PATCH 067/521] s390/pci: fix use after free in dma_init commit dba599091c191d209b1499511a524ad9657c0e5a upstream. After a failure during registration of the dma_table (because of the function being in error state) we free its memory but don't reset the associated pointer to zero. When we then receive a notification from firmware (about the function being in error state) we'll try to walk and free the dma_table again. Fix this by resetting the dma_table pointer. In addition to that make sure that we free the iommu_bitmap when appropriate. Signed-off-by: Sebastian Ott Reviewed-by: Gerald Schaefer Signed-off-by: Martin Schwidefsky Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- arch/s390/pci/pci_dma.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c index 3a40f718baef..4004e03267cd 100644 --- a/arch/s390/pci/pci_dma.c +++ b/arch/s390/pci/pci_dma.c @@ -455,7 +455,7 @@ int zpci_dma_init_device(struct zpci_dev *zdev) zdev->dma_table = dma_alloc_cpu_table(); if (!zdev->dma_table) { rc = -ENOMEM; - goto out_clean; + goto out; } /* @@ -475,18 +475,22 @@ int zpci_dma_init_device(struct zpci_dev *zdev) zdev->iommu_bitmap = vzalloc(zdev->iommu_pages / 8); if (!zdev->iommu_bitmap) { rc = -ENOMEM; - goto out_reg; + goto free_dma_table; } rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma, (u64) zdev->dma_table); if (rc) - goto out_reg; - return 0; + goto free_bitmap; -out_reg: + return 0; +free_bitmap: + vfree(zdev->iommu_bitmap); + zdev->iommu_bitmap = NULL; +free_dma_table: dma_free_cpu_table(zdev->dma_table); -out_clean: + zdev->dma_table = NULL; +out: return rc; } From 13a26889cbc1eb8a7b9a7712c05538c55659fe40 Mon Sep 17 00:00:00 2001 From: Dave Airlie Date: Thu, 14 Jan 2016 08:07:55 +1000 Subject: [PATCH 068/521] drm/amdgpu: add missing irq.h include commit e9c5e7402dad6f4f04c2430db6f283512bcd4392 upstream. this fixes the build on arm. Signed-off-by: Dave Airlie Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 7c42ff670080..a0924330d125 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -25,6 +25,7 @@ * Alex Deucher * Jerome Glisse */ +#include #include #include #include From cea050150323a2c09efc316f0272af053e0b87e2 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 25 Nov 2015 14:05:30 -0700 Subject: [PATCH 069/521] tpm_tis: Use devm_free_irq not free_irq commit 727f28b8ca24a581c7bd868326b8cea1058c720a upstream. The interrupt is always allocated with devm_request_irq so it must always be freed with devm_free_irq. Fixes: 448e9c55c12d ("tpm_tis: verify interrupt during init") Signed-off-by: Jason Gunthorpe Acked-by: Jarkko Sakkinen Tested-by: Jarkko Sakkinen Tested-by: Martin Wilck Signed-off-by: Jarkko Sakkinen Acked-by: Peter Huewe Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/char/tpm/tpm_tis.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 65f7eecc45b0..f10a107614b4 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -401,7 +401,7 @@ static void disable_interrupts(struct tpm_chip *chip) iowrite32(intmask, chip->vendor.iobase + TPM_INT_ENABLE(chip->vendor.locality)); - free_irq(chip->vendor.irq, chip); + devm_free_irq(chip->pdev, chip->vendor.irq, chip); chip->vendor.irq = 0; } From 6cc5b73d79697e1a529249572ac022192f1ddffd Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Mon, 25 Jan 2016 16:00:41 +0100 Subject: [PATCH 070/521] hv_netvsc: use skb_get_hash() instead of a homegrown implementation commit 757647e10e55c01fb7a9c4356529442e316a7c72 upstream. Recent changes to 'struct flow_keys' (e.g commit d34af823ff40 ("net: Add VLAN ID to flow_keys")) introduced a performance regression in netvsc driver. Is problem is, however, not the above mentioned commit but the fact that netvsc_set_hash() function did some assumptions on the struct flow_keys data layout and this is wrong. Get rid of netvsc_set_hash() by switching to skb_get_hash(). This change will also imply switching to Jenkins hash from the currently used Toeplitz but it seems there is no good excuse for Toeplitz to stay. Signed-off-by: Vitaly Kuznetsov Acked-by: Eric Dumazet Signed-off-by: David S. Miller Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/net/hyperv/netvsc_drv.c | 67 ++------------------------------- 1 file changed, 3 insertions(+), 64 deletions(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index e8a09ff9e724..c8a7802d2953 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -197,65 +197,6 @@ static void *init_ppi_data(struct rndis_message *msg, u32 ppi_size, return ppi; } -union sub_key { - u64 k; - struct { - u8 pad[3]; - u8 kb; - u32 ka; - }; -}; - -/* Toeplitz hash function - * data: network byte order - * return: host byte order - */ -static u32 comp_hash(u8 *key, int klen, void *data, int dlen) -{ - union sub_key subk; - int k_next = 4; - u8 dt; - int i, j; - u32 ret = 0; - - subk.k = 0; - subk.ka = ntohl(*(u32 *)key); - - for (i = 0; i < dlen; i++) { - subk.kb = key[k_next]; - k_next = (k_next + 1) % klen; - dt = ((u8 *)data)[i]; - for (j = 0; j < 8; j++) { - if (dt & 0x80) - ret ^= subk.ka; - dt <<= 1; - subk.k <<= 1; - } - } - - return ret; -} - -static bool netvsc_set_hash(u32 *hash, struct sk_buff *skb) -{ - struct flow_keys flow; - int data_len; - - if (!skb_flow_dissect_flow_keys(skb, &flow, 0) || - !(flow.basic.n_proto == htons(ETH_P_IP) || - flow.basic.n_proto == htons(ETH_P_IPV6))) - return false; - - if (flow.basic.ip_proto == IPPROTO_TCP) - data_len = 12; - else - data_len = 8; - - *hash = comp_hash(netvsc_hash_key, HASH_KEYLEN, &flow, data_len); - - return true; -} - static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb, void *accel_priv, select_queue_fallback_t fallback) { @@ -268,11 +209,9 @@ static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb, if (nvsc_dev == NULL || ndev->real_num_tx_queues <= 1) return 0; - if (netvsc_set_hash(&hash, skb)) { - q_idx = nvsc_dev->send_table[hash % VRSS_SEND_TAB_SIZE] % - ndev->real_num_tx_queues; - skb_set_hash(skb, hash, PKT_HASH_TYPE_L3); - } + hash = skb_get_hash(skb); + q_idx = nvsc_dev->send_table[hash % VRSS_SEND_TAB_SIZE] % + ndev->real_num_tx_queues; return q_idx; } From 6052eb871217c0679ac63779fc5e43eb49c83b0c Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Mon, 23 May 2016 16:24:05 -0700 Subject: [PATCH 071/521] kernek/fork.c: allocate idle task for a CPU always on its local node commit 725fc629ff2545b061407305ae51016c9f928fce upstream. Linux preallocates the task structs of the idle tasks for all possible CPUs. This currently means they all end up on node 0. This also implies that the cache line of MWAIT, which is around the flags field in the task struct, are all located in node 0. We see a noticeable performance improvement on Knights Landing CPUs when the cache lines used for MWAIT are located in the local nodes of the CPUs using them. I would expect this to give a (likely slight) improvement on other systems too. The patch implements placing the idle task in the node of its CPUs, by passing the right target node to copy_process() [akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1] Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org Signed-off-by: Andi Kleen Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- kernel/fork.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 2e55b53399de..278a2ddad351 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -331,13 +331,14 @@ void set_task_stack_end_magic(struct task_struct *tsk) *stackend = STACK_END_MAGIC; /* for overflow detection */ } -static struct task_struct *dup_task_struct(struct task_struct *orig) +static struct task_struct *dup_task_struct(struct task_struct *orig, int node) { struct task_struct *tsk; struct thread_info *ti; - int node = tsk_fork_get_node(orig); int err; + if (node == NUMA_NO_NODE) + node = tsk_fork_get_node(orig); tsk = alloc_task_struct_node(node); if (!tsk) return NULL; @@ -1270,7 +1271,8 @@ static struct task_struct *copy_process(unsigned long clone_flags, int __user *child_tidptr, struct pid *pid, int trace, - unsigned long tls) + unsigned long tls, + int node) { int retval; struct task_struct *p; @@ -1323,7 +1325,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, goto fork_out; retval = -ENOMEM; - p = dup_task_struct(current); + p = dup_task_struct(current, node); if (!p) goto fork_out; @@ -1699,7 +1701,8 @@ static inline void init_idle_pids(struct pid_link *links) struct task_struct *fork_idle(int cpu) { struct task_struct *task; - task = copy_process(CLONE_VM, 0, 0, NULL, &init_struct_pid, 0, 0); + task = copy_process(CLONE_VM, 0, 0, NULL, &init_struct_pid, 0, 0, + cpu_to_node(cpu)); if (!IS_ERR(task)) { init_idle_pids(task->pids); init_idle(task, cpu); @@ -1744,7 +1747,7 @@ long _do_fork(unsigned long clone_flags, } p = copy_process(clone_flags, stack_start, stack_size, - child_tidptr, NULL, trace, tls); + child_tidptr, NULL, trace, tls, NUMA_NO_NODE); /* * Do this prior waking up the new thread - the thread pointer * might get invalid after that point, if the thread exits quickly. From 4cb0c0b73d1c79a8ce260836b3f27650aa1c57f1 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 2 Mar 2017 12:17:22 -0800 Subject: [PATCH 072/521] give up on gcc ilog2() constant optimizations commit 474c90156c8dcc2fa815e6716cc9394d7930cb9c upstream. gcc-7 has an "optimization" pass that completely screws up, and generates the code expansion for the (impossible) case of calling ilog2() with a zero constant, even when the code gcc compiles does not actually have a zero constant. And we try to generate a compile-time error for anybody doing ilog2() on a constant where that doesn't make sense (be it zero or negative). So now gcc7 will fail the build due to our sanity checking, because it created that constant-zero case that didn't actually exist in the source code. There's a whole long discussion on the kernel mailing about how to work around this gcc bug. The gcc people themselevs have discussed their "feature" in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785 but it's all water under the bridge, because while it looked at one point like it would be solved by the time gcc7 was released, that was not to be. So now we have to deal with this compiler braindamage. And the only simple approach seems to be to just delete the code that tries to warn about bad uses of ilog2(). So now "ilog2()" will just return 0 not just for the value 1, but for any non-positive value too. It's not like I can recall anybody having ever actually tried to use this function on any invalid value, but maybe the sanity check just meant that such code never made it out in public. Reported-by: Laura Abbott Cc: John Stultz , Cc: Thomas Gleixner Cc: Ard Biesheuvel Signed-off-by: Linus Torvalds Cc: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- include/linux/log2.h | 13 ++----------- tools/include/linux/log2.h | 13 ++----------- 2 files changed, 4 insertions(+), 22 deletions(-) diff --git a/include/linux/log2.h b/include/linux/log2.h index fd7ff3d91e6a..f38fae23bdac 100644 --- a/include/linux/log2.h +++ b/include/linux/log2.h @@ -15,12 +15,6 @@ #include #include -/* - * deal with unrepresentable constant logarithms - */ -extern __attribute__((const, noreturn)) -int ____ilog2_NaN(void); - /* * non-constant log of base 2 calculators * - the arch may override these in asm/bitops.h if they can be implemented @@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n) #define ilog2(n) \ ( \ __builtin_constant_p(n) ? ( \ - (n) < 1 ? ____ilog2_NaN() : \ + (n) < 2 ? 0 : \ (n) & (1ULL << 63) ? 63 : \ (n) & (1ULL << 62) ? 62 : \ (n) & (1ULL << 61) ? 61 : \ @@ -148,10 +142,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n) (n) & (1ULL << 4) ? 4 : \ (n) & (1ULL << 3) ? 3 : \ (n) & (1ULL << 2) ? 2 : \ - (n) & (1ULL << 1) ? 1 : \ - (n) & (1ULL << 0) ? 0 : \ - ____ilog2_NaN() \ - ) : \ + 1 ) : \ (sizeof(n) <= 4) ? \ __ilog2_u32(n) : \ __ilog2_u64(n) \ diff --git a/tools/include/linux/log2.h b/tools/include/linux/log2.h index 41446668ccce..d5677d39c1e4 100644 --- a/tools/include/linux/log2.h +++ b/tools/include/linux/log2.h @@ -12,12 +12,6 @@ #ifndef _TOOLS_LINUX_LOG2_H #define _TOOLS_LINUX_LOG2_H -/* - * deal with unrepresentable constant logarithms - */ -extern __attribute__((const, noreturn)) -int ____ilog2_NaN(void); - /* * non-constant log of base 2 calculators * - the arch may override these in asm/bitops.h if they can be implemented @@ -78,7 +72,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n) #define ilog2(n) \ ( \ __builtin_constant_p(n) ? ( \ - (n) < 1 ? ____ilog2_NaN() : \ + (n) < 2 ? 0 : \ (n) & (1ULL << 63) ? 63 : \ (n) & (1ULL << 62) ? 62 : \ (n) & (1ULL << 61) ? 61 : \ @@ -141,10 +135,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n) (n) & (1ULL << 4) ? 4 : \ (n) & (1ULL << 3) ? 3 : \ (n) & (1ULL << 2) ? 2 : \ - (n) & (1ULL << 1) ? 1 : \ - (n) & (1ULL << 0) ? 0 : \ - ____ilog2_NaN() \ - ) : \ + 1 ) : \ (sizeof(n) <= 4) ? \ __ilog2_u32(n) : \ __ilog2_u64(n) \ From f02729f2ab87c84bbc959e7631487a4b84dbdf63 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 16 Mar 2017 13:47:49 +0100 Subject: [PATCH 073/521] perf/core: Fix event inheritance on fork() commit e7cc4865f0f31698ef2f7aac01a50e78968985b7 upstream. While hunting for clues to a use-after-free, Oleg spotted that perf_event_init_context() can loose an error value with the result that fork() can succeed even though we did not fully inherit the perf event context. Spotted-by: Oleg Nesterov Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Dmitry Vyukov Cc: Frederic Weisbecker Cc: Jiri Olsa Cc: Linus Torvalds Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: oleg@redhat.com Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists") Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- kernel/events/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 9bbe9ac23cf2..e4b5494f05f8 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9230,7 +9230,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn) ret = inherit_task_group(event, parent, parent_ctx, child, ctxn, &inherited_all); if (ret) - break; + goto out_unlock; } /* @@ -9246,7 +9246,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn) ret = inherit_task_group(event, parent, parent_ctx, child, ctxn, &inherited_all); if (ret) - break; + goto out_unlock; } raw_spin_lock_irqsave(&parent_ctx->lock, flags); @@ -9274,6 +9274,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn) } raw_spin_unlock_irqrestore(&parent_ctx->lock, flags); +out_unlock: mutex_unlock(&parent_ctx->mutex); perf_unpin_context(parent_ctx); From 09875d1393d4589bcdfeeba8747a12dd69810cc9 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 15 Mar 2017 00:12:16 +0100 Subject: [PATCH 074/521] cpufreq: Fix and clean up show_cpuinfo_cur_freq() commit 9b4f603e7a9f4282aec451063ffbbb8bb410dcd9 upstream. There is a missing newline in show_cpuinfo_cur_freq(), so add it, but while at it clean that function up somewhat too. Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar Signed-off-by: Greg Kroah-Hartman --- drivers/cpufreq/cpufreq.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 8412ce5f93a7..86fa9fdc8323 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -626,9 +626,11 @@ static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy, char *buf) { unsigned int cur_freq = __cpufreq_get(policy); - if (!cur_freq) - return sprintf(buf, ""); - return sprintf(buf, "%u\n", cur_freq); + + if (cur_freq) + return sprintf(buf, "%u\n", cur_freq); + + return sprintf(buf, "\n"); } /** From b24473976b985fd1c1d57a9ea934f9792bf654cc Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Tue, 7 Mar 2017 16:14:49 +1100 Subject: [PATCH 075/521] powerpc/boot: Fix zImage TOC alignment commit 97ee351b50a49717543533cfb85b4bf9d88c9680 upstream. Recent toolchains force the TOC to be 256 byte aligned. We need to enforce this alignment in the zImage linker script, otherwise pointers to our TOC variables (__toc_start) could be incorrect. If the actual start of the TOC and __toc_start don't have the same value we crash early in the zImage wrapper. Suggested-by: Alan Modra Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/boot/zImage.lds.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/boot/zImage.lds.S b/arch/powerpc/boot/zImage.lds.S index 861e72109df2..f080abfc2f83 100644 --- a/arch/powerpc/boot/zImage.lds.S +++ b/arch/powerpc/boot/zImage.lds.S @@ -68,6 +68,7 @@ SECTIONS } #ifdef CONFIG_PPC64_BOOT_WRAPPER + . = ALIGN(256); .got : { __toc_start = .; From 582f548924cdda2dadf842020075f6b2525421d2 Mon Sep 17 00:00:00 2001 From: Shaohua Li Date: Tue, 28 Feb 2017 13:00:20 -0800 Subject: [PATCH 076/521] md/raid1/10: fix potential deadlock commit 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 upstream. Neil Brown pointed out a potential deadlock in raid 10 code with bio_split/chain. The raid1 code could have the same issue, but recent barrier rework makes it less likely to happen. The deadlock happens in below sequence: 1. generic_make_request(bio), this will set current->bio_list 2. raid10_make_request will split bio to bio1 and bio2 3. __make_request(bio1), wait_barrer, add underlayer disk bio to current->bio_list 4. __make_request(bio2), wait_barrer If raise_barrier happens between 3 & 4, since wait_barrier runs at 3, raise_barrier waits for IO completion from 3. And since raise_barrier sets barrier, 4 waits for raise_barrier. But IO from 3 can't be dispatched because raid10_make_request() doesn't finished yet. The solution is to adjust the IO ordering. Quotes from Neil: " It is much safer to: if (need to split) { split = bio_split(bio, ...) bio_chain(...) make_request_fn(split); generic_make_request(bio); } else make_request_fn(mddev, bio); This way we first process the initial section of the bio (in 'split') which will queue some requests to the underlying devices. These requests will be queued in generic_make_request. Then we queue the remainder of the bio, which will be added to the end of the generic_make_request queue. Then we return. generic_make_request() will pop the lower-level device requests off the queue and handle them first. Then it will process the remainder of the original bio once the first section has been fully processed. " Note, this only happens in read path. In write path, the bio is flushed to underlaying disks either by blk flush (from schedule) or offladed to raid1/10d. It's queued in current->bio_list. Cc: Coly Li Suggested-by: NeilBrown Reviewed-by: Jack Wang Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid10.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index ebb0dd612ebd..122af340a531 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -1477,7 +1477,25 @@ static void make_request(struct mddev *mddev, struct bio *bio) split = bio; } + /* + * If a bio is splitted, the first part of bio will pass + * barrier but the bio is queued in current->bio_list (see + * generic_make_request). If there is a raise_barrier() called + * here, the second part of bio can't pass barrier. But since + * the first part bio isn't dispatched to underlaying disks + * yet, the barrier is never released, hence raise_barrier will + * alays wait. We have a deadlock. + * Note, this only happens in read path. For write path, the + * first part of bio is dispatched in a schedule() call + * (because of blk plug) or offloaded to raid10d. + * Quitting from the function immediately can change the bio + * order queued in bio_list and avoid the deadlock. + */ __make_request(mddev, split); + if (split != bio && bio_data_dir(bio) == READ) { + generic_make_request(bio); + break; + } } while (split != bio); /* In case raid10d snuck in to freeze_array */ From 0a621633cdfa780c50907506457798c907cb8110 Mon Sep 17 00:00:00 2001 From: Nicholas Bellinger Date: Thu, 3 Nov 2016 23:06:53 -0700 Subject: [PATCH 077/521] target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER export commit a04e54f2c35823ca32d56afcd5cea5b783e2f51a upstream. The following fixes a divide by zero OOPs with TYPE_TAPE due to pscsi_tape_read_blocksize() failing causing a zero sd->sector_size being propigated up via dev_attrib.hw_block_size. It also fixes another long-standing bug where TYPE_TAPE and TYPE_MEDIMUM_CHANGER where using pscsi_create_type_other(), which does not call scsi_device_get() to take the device reference. Instead, rename pscsi_create_type_rom() to pscsi_create_type_nondisk() and use it for all cases. Finally, also drop a dump_stack() in pscsi_get_blocks() for non TYPE_DISK, which in modern target-core can get invoked via target_sense_desc_format() during CHECK_CONDITION. Reported-by: Malcolm Haak Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman --- drivers/target/target_core_pscsi.c | 47 ++++++++---------------------- 1 file changed, 12 insertions(+), 35 deletions(-) diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c index de18790eb21c..d72a4058fd08 100644 --- a/drivers/target/target_core_pscsi.c +++ b/drivers/target/target_core_pscsi.c @@ -154,7 +154,7 @@ static void pscsi_tape_read_blocksize(struct se_device *dev, buf = kzalloc(12, GFP_KERNEL); if (!buf) - return; + goto out_free; memset(cdb, 0, MAX_COMMAND_SIZE); cdb[0] = MODE_SENSE; @@ -169,9 +169,10 @@ static void pscsi_tape_read_blocksize(struct se_device *dev, * If MODE_SENSE still returns zero, set the default value to 1024. */ sdev->sector_size = (buf[9] << 16) | (buf[10] << 8) | (buf[11]); +out_free: if (!sdev->sector_size) sdev->sector_size = 1024; -out_free: + kfree(buf); } @@ -314,9 +315,10 @@ static int pscsi_add_device_to_list(struct se_device *dev, sd->lun, sd->queue_depth); } - dev->dev_attrib.hw_block_size = sd->sector_size; + dev->dev_attrib.hw_block_size = + min_not_zero((int)sd->sector_size, 512); dev->dev_attrib.hw_max_sectors = - min_t(int, sd->host->max_sectors, queue_max_hw_sectors(q)); + min_not_zero(sd->host->max_sectors, queue_max_hw_sectors(q)); dev->dev_attrib.hw_queue_depth = sd->queue_depth; /* @@ -339,8 +341,10 @@ static int pscsi_add_device_to_list(struct se_device *dev, /* * For TYPE_TAPE, attempt to determine blocksize with MODE_SENSE. */ - if (sd->type == TYPE_TAPE) + if (sd->type == TYPE_TAPE) { pscsi_tape_read_blocksize(dev, sd); + dev->dev_attrib.hw_block_size = sd->sector_size; + } return 0; } @@ -406,7 +410,7 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd) /* * Called with struct Scsi_Host->host_lock called. */ -static int pscsi_create_type_rom(struct se_device *dev, struct scsi_device *sd) +static int pscsi_create_type_nondisk(struct se_device *dev, struct scsi_device *sd) __releases(sh->host_lock) { struct pscsi_hba_virt *phv = dev->se_hba->hba_ptr; @@ -433,28 +437,6 @@ static int pscsi_create_type_rom(struct se_device *dev, struct scsi_device *sd) return 0; } -/* - * Called with struct Scsi_Host->host_lock called. - */ -static int pscsi_create_type_other(struct se_device *dev, - struct scsi_device *sd) - __releases(sh->host_lock) -{ - struct pscsi_hba_virt *phv = dev->se_hba->hba_ptr; - struct Scsi_Host *sh = sd->host; - int ret; - - spin_unlock_irq(sh->host_lock); - ret = pscsi_add_device_to_list(dev, sd); - if (ret) - return ret; - - pr_debug("CORE_PSCSI[%d] - Added Type: %s for %d:%d:%d:%llu\n", - phv->phv_host_id, scsi_device_type(sd->type), sh->host_no, - sd->channel, sd->id, sd->lun); - return 0; -} - static int pscsi_configure_device(struct se_device *dev) { struct se_hba *hba = dev->se_hba; @@ -542,11 +524,8 @@ static int pscsi_configure_device(struct se_device *dev) case TYPE_DISK: ret = pscsi_create_type_disk(dev, sd); break; - case TYPE_ROM: - ret = pscsi_create_type_rom(dev, sd); - break; default: - ret = pscsi_create_type_other(dev, sd); + ret = pscsi_create_type_nondisk(dev, sd); break; } @@ -611,8 +590,7 @@ static void pscsi_free_device(struct se_device *dev) else if (pdv->pdv_lld_host) scsi_host_put(pdv->pdv_lld_host); - if ((sd->type == TYPE_DISK) || (sd->type == TYPE_ROM)) - scsi_device_put(sd); + scsi_device_put(sd); pdv->pdv_sd = NULL; } @@ -1088,7 +1066,6 @@ static sector_t pscsi_get_blocks(struct se_device *dev) if (pdv->pdv_bd && pdv->pdv_bd->bd_part) return pdv->pdv_bd->bd_part->nr_sects; - dump_stack(); return 0; } From 82bd06aba880215dadd3e33dc3d583d32df9dbbb Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Mon, 13 Feb 2017 08:49:20 +1100 Subject: [PATCH 078/521] scsi: lpfc: Add shutdown method for kexec commit 85e8a23936ab3442de0c42da97d53b29f004ece1 upstream. We see lpfc devices regularly fail during kexec. Fix this by adding a shutdown method which mirrors the remove method. Signed-off-by: Anton Blanchard Reviewed-by: Mauricio Faria de Oliveira Tested-by: Mauricio Faria de Oliveira Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/lpfc/lpfc_init.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index c14ab6c3ae40..60c21093f865 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -11387,6 +11387,7 @@ static struct pci_driver lpfc_driver = { .id_table = lpfc_id_table, .probe = lpfc_pci_probe_one, .remove = lpfc_pci_remove_one, + .shutdown = lpfc_pci_remove_one, .suspend = lpfc_pci_suspend_one, .resume = lpfc_pci_resume_one, .err_handler = &lpfc_err_handler, From 246760c61d9c4c0114ba5bd324df4ae17468e238 Mon Sep 17 00:00:00 2001 From: Chris Leech Date: Mon, 27 Feb 2017 16:58:36 -0800 Subject: [PATCH 079/521] scsi: libiscsi: add lock around task lists to fix list corruption regression commit 6f8830f5bbab16e54f261de187f3df4644a5b977 upstream. There's a rather long standing regression from the commit "libiscsi: Reduce locking contention in fast path" Depending on iSCSI target behavior, it's possible to hit the case in iscsi_complete_task where the task is still on a pending list (!list_empty(&task->running)). When that happens the task is removed from the list while holding the session back_lock, but other task list modification occur under the frwd_lock. That leads to linked list corruption and eventually a panicked system. Rather than back out the session lock split entirely, in order to try and keep some of the performance gains this patch adds another lock to maintain the task lists integrity. Major enterprise supported kernels have been backing out the lock split for while now, thanks to the efforts at IBM where a lab setup has the most reliable reproducer I've seen on this issue. This patch has been tested there successfully. Signed-off-by: Chris Leech Fixes: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path") Reported-by: Prashantha Subbarao Reviewed-by: Guilherme G. Piccoli Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/libiscsi.c | 26 +++++++++++++++++++++++++- include/scsi/libiscsi.h | 1 + 2 files changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 6bffd91b973a..c1ccf1ee99ea 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task, int state) WARN_ON_ONCE(task->state == ISCSI_TASK_FREE); task->state = state; - if (!list_empty(&task->running)) + spin_lock_bh(&conn->taskqueuelock); + if (!list_empty(&task->running)) { + pr_debug_once("%s while task on list", __func__); list_del_init(&task->running); + } + spin_unlock_bh(&conn->taskqueuelock); if (conn->task == task) conn->task = NULL; @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, if (session->tt->xmit_task(task)) goto free_task; } else { + spin_lock_bh(&conn->taskqueuelock); list_add_tail(&task->running, &conn->mgmtqueue); + spin_unlock_bh(&conn->taskqueuelock); iscsi_conn_queue_work(conn); } @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task) * this may be on the requeue list already if the xmit_task callout * is handling the r2ts while we are adding new ones */ + spin_lock_bh(&conn->taskqueuelock); if (list_empty(&task->running)) list_add_tail(&task->running, &conn->requeue); + spin_unlock_bh(&conn->taskqueuelock); iscsi_conn_queue_work(conn); } EXPORT_SYMBOL_GPL(iscsi_requeue_task); @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) * only have one nop-out as a ping from us and targets should not * overflow us with nop-ins */ + spin_lock_bh(&conn->taskqueuelock); check_mgmt: while (!list_empty(&conn->mgmtqueue)) { conn->task = list_entry(conn->mgmtqueue.next, struct iscsi_task, running); list_del_init(&conn->task->running); + spin_unlock_bh(&conn->taskqueuelock); if (iscsi_prep_mgmt_task(conn, conn->task)) { /* regular RX path uses back_lock */ spin_lock_bh(&conn->session->back_lock); __iscsi_put_task(conn->task); spin_unlock_bh(&conn->session->back_lock); conn->task = NULL; + spin_lock_bh(&conn->taskqueuelock); continue; } rc = iscsi_xmit_task(conn); if (rc) goto done; + spin_lock_bh(&conn->taskqueuelock); } /* process pending command queue */ @@ -1535,19 +1547,24 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) conn->task = list_entry(conn->cmdqueue.next, struct iscsi_task, running); list_del_init(&conn->task->running); + spin_unlock_bh(&conn->taskqueuelock); if (conn->session->state == ISCSI_STATE_LOGGING_OUT) { fail_scsi_task(conn->task, DID_IMM_RETRY); + spin_lock_bh(&conn->taskqueuelock); continue; } rc = iscsi_prep_scsi_cmd_pdu(conn->task); if (rc) { if (rc == -ENOMEM || rc == -EACCES) { + spin_lock_bh(&conn->taskqueuelock); list_add_tail(&conn->task->running, &conn->cmdqueue); conn->task = NULL; + spin_unlock_bh(&conn->taskqueuelock); goto done; } else fail_scsi_task(conn->task, DID_ABORT); + spin_lock_bh(&conn->taskqueuelock); continue; } rc = iscsi_xmit_task(conn); @@ -1558,6 +1575,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) * we need to check the mgmt queue for nops that need to * be sent to aviod starvation */ + spin_lock_bh(&conn->taskqueuelock); if (!list_empty(&conn->mgmtqueue)) goto check_mgmt; } @@ -1577,12 +1595,15 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) conn->task = task; list_del_init(&conn->task->running); conn->task->state = ISCSI_TASK_RUNNING; + spin_unlock_bh(&conn->taskqueuelock); rc = iscsi_xmit_task(conn); if (rc) goto done; + spin_lock_bh(&conn->taskqueuelock); if (!list_empty(&conn->mgmtqueue)) goto check_mgmt; } + spin_unlock_bh(&conn->taskqueuelock); spin_unlock_bh(&conn->session->frwd_lock); return -ENODATA; @@ -1738,7 +1759,9 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc) goto prepd_reject; } } else { + spin_lock_bh(&conn->taskqueuelock); list_add_tail(&task->running, &conn->cmdqueue); + spin_unlock_bh(&conn->taskqueuelock); iscsi_conn_queue_work(conn); } @@ -2900,6 +2923,7 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, int dd_size, INIT_LIST_HEAD(&conn->mgmtqueue); INIT_LIST_HEAD(&conn->cmdqueue); INIT_LIST_HEAD(&conn->requeue); + spin_lock_init(&conn->taskqueuelock); INIT_WORK(&conn->xmitwork, iscsi_xmitworker); /* allocate login_task used for the login/text sequences */ diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h index 4d1c46aac331..c7b1dc713cdd 100644 --- a/include/scsi/libiscsi.h +++ b/include/scsi/libiscsi.h @@ -196,6 +196,7 @@ struct iscsi_conn { struct iscsi_task *task; /* xmit task in progress */ /* xmit */ + spinlock_t taskqueuelock; /* protects the next three lists */ struct list_head mgmtqueue; /* mgmt (control) xmit queue */ struct list_head cmdqueue; /* data-path cmd queue */ struct list_head requeue; /* tasks needing another run */ From d267ecbdfdb4199c0e3a967ecc17a6b80d95209a Mon Sep 17 00:00:00 2001 From: Max Lohrmann Date: Tue, 7 Mar 2017 22:09:56 -0800 Subject: [PATCH 080/521] target: Fix VERIFY_16 handling in sbc_parse_cdb commit 13603685c1f12c67a7a2427f00b63f39a2b6f7c9 upstream. As reported by Max, the Windows 2008 R2 chkdsk utility expects VERIFY_16 to be supported, and does not handle the returned CHECK_CONDITION properly, resulting in an infinite loop. The kernel will log huge amounts of this error: kernel: TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x8f, sending CHECK_CONDITION. Signed-off-by: Max Lohrmann Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman --- drivers/target/target_core_sbc.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index 2e27b1034ede..90c5dffc9fa4 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -1096,9 +1096,15 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) return ret; break; case VERIFY: + case VERIFY_16: size = 0; - sectors = transport_get_sectors_10(cdb); - cmd->t_task_lba = transport_lba_32(cdb); + if (cdb[0] == VERIFY) { + sectors = transport_get_sectors_10(cdb); + cmd->t_task_lba = transport_lba_32(cdb); + } else { + sectors = transport_get_sectors_16(cdb); + cmd->t_task_lba = transport_lba_64(cdb); + } cmd->execute_cmd = sbc_emulate_noop; goto check_lba; case REZERO_UNIT: From 4f47ca4882564c4b76cc9c426583a49d23893dda Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Mar 2017 13:39:01 +0100 Subject: [PATCH 081/521] isdn/gigaset: fix NULL-deref at probe commit 68c32f9c2a36d410aa242e661506e5b2c2764179 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Fixes: cf7776dc05b8 ("[PATCH] isdn4linux: Siemens Gigaset drivers - direct USB connection") Cc: Hansjoerg Lipp Signed-off-by: Johan Hovold Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/isdn/gigaset/bas-gigaset.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/isdn/gigaset/bas-gigaset.c b/drivers/isdn/gigaset/bas-gigaset.c index aecec6d32463..7f1c625b08ec 100644 --- a/drivers/isdn/gigaset/bas-gigaset.c +++ b/drivers/isdn/gigaset/bas-gigaset.c @@ -2317,6 +2317,9 @@ static int gigaset_probe(struct usb_interface *interface, return -ENODEV; } + if (hostif->desc.bNumEndpoints < 1) + return -ENODEV; + dev_info(&udev->dev, "%s: Device matched (Vendor: 0x%x, Product: 0x%x)\n", __func__, le16_to_cpu(udev->descriptor.idVendor), From e08f608ab4288f4192a504e6c94dd7c9c931dad8 Mon Sep 17 00:00:00 2001 From: Andreas Gruenbacher Date: Mon, 6 Mar 2017 12:58:42 -0500 Subject: [PATCH 082/521] gfs2: Avoid alignment hole in struct lm_lockname commit 28ea06c46fbcab63fd9a55531387b7928a18a590 upstream. Commit 88ffbf3e03 switches to using rhashtables for glocks, hashing over the entire struct lm_lockname instead of its individual fields. On some architectures, struct lm_lockname contains a hole of uninitialized memory due to alignment rules, which now leads to incorrect hash values. Get rid of that hole. Signed-off-by: Andreas Gruenbacher Signed-off-by: Bob Peterson Signed-off-by: Greg Kroah-Hartman --- fs/gfs2/incore.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index de7b4f97ac75..be519416c112 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -207,7 +207,7 @@ struct lm_lockname { struct gfs2_sbd *ln_sbd; u64 ln_number; unsigned int ln_type; -}; +} __packed __aligned(sizeof(int)); #define lm_name_equal(name1, name2) \ (((name1)->ln_number == (name2)->ln_number) && \ From d88b83e66bbf588a5d85168d9839501cd47fe561 Mon Sep 17 00:00:00 2001 From: Tahsin Erdogan Date: Sat, 25 Feb 2017 13:00:19 -0800 Subject: [PATCH 083/521] percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages commit 320661b08dd6f1746d5c7ab4eb435ec64b97cd45 upstream. Update to pcpu_nr_empty_pop_pages in pcpu_alloc() is currently done without holding pcpu_lock. This can lead to bad updates to the variable. Add missing lock calls. Fixes: b539b87fed37 ("percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated") Signed-off-by: Tahsin Erdogan Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman --- mm/percpu.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/percpu.c b/mm/percpu.c index 1f376bce413c..ef6353f0adbd 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -1012,8 +1012,11 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, mutex_unlock(&pcpu_alloc_mutex); } - if (chunk != pcpu_reserved_chunk) + if (chunk != pcpu_reserved_chunk) { + spin_lock_irqsave(&pcpu_lock, flags); pcpu_nr_empty_pop_pages -= occ_pages; + spin_unlock_irqrestore(&pcpu_lock, flags); + } if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW) pcpu_schedule_balance_work(); From 5fa513cb07213608907d4daa123b81e5a32d13e0 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Wed, 15 Feb 2017 01:26:39 -0500 Subject: [PATCH 084/521] ext4: fix fencepost in s_first_meta_bg validation commit 2ba3e6e8afc9b6188b471f27cf2b5e3cf34e7af2 upstream. It is OK for s_first_meta_bg to be equal to the number of block group descriptor blocks. (It rarely happens, but it shouldn't cause any problems.) https://bugzilla.kernel.org/show_bug.cgi?id=194567 Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe Signed-off-by: Theodore Ts'o Cc: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 6fe8e30eeb99..68345a9e59b8 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3666,7 +3666,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) / EXT4_DESC_PER_BLOCK(sb); if (ext4_has_feature_meta_bg(sb)) { - if (le32_to_cpu(es->s_first_meta_bg) >= db_count) { + if (le32_to_cpu(es->s_first_meta_bg) > db_count) { ext4_msg(sb, KERN_WARNING, "first meta block group too large: %u " "(group descriptor block count %u)", From a5c3f390eb7799c3d1d92121382372b1fd365fa3 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 26 Mar 2017 12:13:55 +0200 Subject: [PATCH 085/521] Linux 4.4.57 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index cf9303a5d621..841675e63a38 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 56 +SUBLEVEL = 57 EXTRAVERSION = NAME = Blurry Fish Butt From b362d6735156add0e43b1221b17277d5fb45622d Mon Sep 17 00:00:00 2001 From: Or Gerlitz Date: Wed, 15 Mar 2017 18:10:47 +0200 Subject: [PATCH 086/521] net/openvswitch: Set the ipv6 source tunnel key address attribute correctly [ Upstream commit 3d20f1f7bd575d147ffa75621fa560eea0aec690 ] When dealing with ipv6 source tunnel key address attribute (OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel dst ip, fix that. Fixes: 6b26ba3a7d95 ('openvswitch: netlink attributes for IPv6 tunneling') Signed-off-by: Or Gerlitz Reported-by: Paul Blakey Acked-by: Jiri Benc Acked-by: Joe Stringer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/openvswitch/flow_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index d1bd4a45ca2d..d26b28def310 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -588,7 +588,7 @@ static int ip_tun_from_nlattr(const struct nlattr *attr, ipv4 = true; break; case OVS_TUNNEL_KEY_ATTR_IPV6_SRC: - SW_FLOW_KEY_PUT(match, tun_key.u.ipv6.dst, + SW_FLOW_KEY_PUT(match, tun_key.u.ipv6.src, nla_get_in6_addr(a), is_mask); ipv6 = true; break; From 12f0bffc489dff7088c73f600b6be5769bc73cbd Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Wed, 15 Mar 2017 12:57:21 -0700 Subject: [PATCH 087/521] net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled [ Upstream commit 5371bbf4b295eea334ed453efa286afa2c3ccff3 ] Suspending the PHY would be putting it in a low power state where it may no longer allow us to do Wake-on-LAN. Fixes: cc013fb48898 ("net: bcmgenet: correctly suspend and resume PHY device") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index 91627561c58d..f971d92f7b41 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -3495,7 +3495,8 @@ static int bcmgenet_suspend(struct device *d) bcmgenet_netif_stop(dev); - phy_suspend(priv->phydev); + if (!device_may_wakeup(d)) + phy_suspend(priv->phydev); netif_device_detach(dev); @@ -3592,7 +3593,8 @@ static int bcmgenet_resume(struct device *d) netif_device_attach(dev); - phy_resume(priv->phydev); + if (!device_may_wakeup(d)) + phy_resume(priv->phydev); if (priv->eee.eee_enabled) bcmgenet_eee_enable_set(dev, true); From f3126725228c0fdbe17c18bcc5ace1b86465cce9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 15 Mar 2017 13:21:28 -0700 Subject: [PATCH 088/521] net: properly release sk_frag.page [ Upstream commit 22a0e18eac7a9e986fec76c60fa4a2926d1291e2 ] I mistakenly added the code to release sk->sk_frag in sk_common_release() instead of sk_destruct() TCP sockets using sk->sk_allocation == GFP_ATOMIC do no call sk_common_release() at close time, thus leaking one (order-3) page. iSCSI is using such sockets. Fixes: 5640f7685831 ("net: use a per task frag allocator") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/sock.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index f4c0917e66b5..9f4c4473156a 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1459,6 +1459,11 @@ void sk_destruct(struct sock *sk) pr_debug("%s: optmem leakage (%d bytes) detected\n", __func__, atomic_read(&sk->sk_omem_alloc)); + if (sk->sk_frag.page) { + put_page(sk->sk_frag.page); + sk->sk_frag.page = NULL; + } + if (sk->sk_peer_cred) put_cred(sk->sk_peer_cred); put_pid(sk->sk_peer_pid); @@ -2691,11 +2696,6 @@ void sk_common_release(struct sock *sk) sk_refcnt_debug_release(sk); - if (sk->sk_frag.page) { - put_page(sk->sk_frag.page); - sk->sk_frag.page = NULL; - } - sock_put(sk); } EXPORT_SYMBOL(sk_common_release); From ae43f9360a21b35cf785ae9a0fdce524d7af0938 Mon Sep 17 00:00:00 2001 From: "Lendacky, Thomas" Date: Wed, 15 Mar 2017 15:11:23 -0500 Subject: [PATCH 089/521] amd-xgbe: Fix jumbo MTU processing on newer hardware [ Upstream commit 622c36f143fc9566ba49d7cec994c2da1182d9e2 ] Newer hardware does not provide a cumulative payload length when multiple descriptors are needed to handle the data. Once the MTU increases beyond the size that can be handled by a single descriptor, the SKB does not get built properly by the driver. The driver will now calculate the size of the data buffers used by the hardware. The first buffer of the first descriptor is for packet headers or packet headers and data when the headers can't be split. Subsequent descriptors in a multi-descriptor chain will not use the first buffer. The second buffer is used by all the descriptors in the chain for payload data. Based on whether the driver is processing the first, intermediate, or last descriptor it can calculate the buffer usage and build the SKB properly. Tested and verified on both old and new hardware. Signed-off-by: Tom Lendacky Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/amd/xgbe/xgbe-common.h | 6 +- drivers/net/ethernet/amd/xgbe/xgbe-dev.c | 20 ++-- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 102 ++++++++++++-------- 3 files changed, 78 insertions(+), 50 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-common.h b/drivers/net/ethernet/amd/xgbe/xgbe-common.h index b6fa89102526..66ba1e0ff37e 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h +++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h @@ -913,8 +913,8 @@ #define RX_PACKET_ATTRIBUTES_CSUM_DONE_WIDTH 1 #define RX_PACKET_ATTRIBUTES_VLAN_CTAG_INDEX 1 #define RX_PACKET_ATTRIBUTES_VLAN_CTAG_WIDTH 1 -#define RX_PACKET_ATTRIBUTES_INCOMPLETE_INDEX 2 -#define RX_PACKET_ATTRIBUTES_INCOMPLETE_WIDTH 1 +#define RX_PACKET_ATTRIBUTES_LAST_INDEX 2 +#define RX_PACKET_ATTRIBUTES_LAST_WIDTH 1 #define RX_PACKET_ATTRIBUTES_CONTEXT_NEXT_INDEX 3 #define RX_PACKET_ATTRIBUTES_CONTEXT_NEXT_WIDTH 1 #define RX_PACKET_ATTRIBUTES_CONTEXT_INDEX 4 @@ -923,6 +923,8 @@ #define RX_PACKET_ATTRIBUTES_RX_TSTAMP_WIDTH 1 #define RX_PACKET_ATTRIBUTES_RSS_HASH_INDEX 6 #define RX_PACKET_ATTRIBUTES_RSS_HASH_WIDTH 1 +#define RX_PACKET_ATTRIBUTES_FIRST_INDEX 7 +#define RX_PACKET_ATTRIBUTES_FIRST_WIDTH 1 #define RX_NORMAL_DESC0_OVT_INDEX 0 #define RX_NORMAL_DESC0_OVT_WIDTH 16 diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c index f6a7161e3b85..5e6238e0b2bd 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c @@ -1658,10 +1658,15 @@ static int xgbe_dev_read(struct xgbe_channel *channel) /* Get the header length */ if (XGMAC_GET_BITS_LE(rdesc->desc3, RX_NORMAL_DESC3, FD)) { + XGMAC_SET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, + FIRST, 1); rdata->rx.hdr_len = XGMAC_GET_BITS_LE(rdesc->desc2, RX_NORMAL_DESC2, HL); if (rdata->rx.hdr_len) pdata->ext_stats.rx_split_header_packets++; + } else { + XGMAC_SET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, + FIRST, 0); } /* Get the RSS hash */ @@ -1684,19 +1689,16 @@ static int xgbe_dev_read(struct xgbe_channel *channel) } } - /* Get the packet length */ - rdata->rx.len = XGMAC_GET_BITS_LE(rdesc->desc3, RX_NORMAL_DESC3, PL); - - if (!XGMAC_GET_BITS_LE(rdesc->desc3, RX_NORMAL_DESC3, LD)) { - /* Not all the data has been transferred for this packet */ - XGMAC_SET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, - INCOMPLETE, 1); + /* Not all the data has been transferred for this packet */ + if (!XGMAC_GET_BITS_LE(rdesc->desc3, RX_NORMAL_DESC3, LD)) return 0; - } /* This is the last of the data for this packet */ XGMAC_SET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, - INCOMPLETE, 0); + LAST, 1); + + /* Get the packet length */ + rdata->rx.len = XGMAC_GET_BITS_LE(rdesc->desc3, RX_NORMAL_DESC3, PL); /* Set checksum done indicator as appropriate */ if (netdev->features & NETIF_F_RXCSUM) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c index 53ce1222b11d..865b7e0b133b 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c @@ -1760,13 +1760,12 @@ static struct sk_buff *xgbe_create_skb(struct xgbe_prv_data *pdata, { struct sk_buff *skb; u8 *packet; - unsigned int copy_len; skb = napi_alloc_skb(napi, rdata->rx.hdr.dma_len); if (!skb) return NULL; - /* Start with the header buffer which may contain just the header + /* Pull in the header buffer which may contain just the header * or the header plus data */ dma_sync_single_range_for_cpu(pdata->dev, rdata->rx.hdr.dma_base, @@ -1775,30 +1774,49 @@ static struct sk_buff *xgbe_create_skb(struct xgbe_prv_data *pdata, packet = page_address(rdata->rx.hdr.pa.pages) + rdata->rx.hdr.pa.pages_offset; - copy_len = (rdata->rx.hdr_len) ? rdata->rx.hdr_len : len; - copy_len = min(rdata->rx.hdr.dma_len, copy_len); - skb_copy_to_linear_data(skb, packet, copy_len); - skb_put(skb, copy_len); - - len -= copy_len; - if (len) { - /* Add the remaining data as a frag */ - dma_sync_single_range_for_cpu(pdata->dev, - rdata->rx.buf.dma_base, - rdata->rx.buf.dma_off, - rdata->rx.buf.dma_len, - DMA_FROM_DEVICE); - - skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, - rdata->rx.buf.pa.pages, - rdata->rx.buf.pa.pages_offset, - len, rdata->rx.buf.dma_len); - rdata->rx.buf.pa.pages = NULL; - } + skb_copy_to_linear_data(skb, packet, len); + skb_put(skb, len); return skb; } +static unsigned int xgbe_rx_buf1_len(struct xgbe_ring_data *rdata, + struct xgbe_packet_data *packet) +{ + /* Always zero if not the first descriptor */ + if (!XGMAC_GET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, FIRST)) + return 0; + + /* First descriptor with split header, return header length */ + if (rdata->rx.hdr_len) + return rdata->rx.hdr_len; + + /* First descriptor but not the last descriptor and no split header, + * so the full buffer was used + */ + if (!XGMAC_GET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, LAST)) + return rdata->rx.hdr.dma_len; + + /* First descriptor and last descriptor and no split header, so + * calculate how much of the buffer was used + */ + return min_t(unsigned int, rdata->rx.hdr.dma_len, rdata->rx.len); +} + +static unsigned int xgbe_rx_buf2_len(struct xgbe_ring_data *rdata, + struct xgbe_packet_data *packet, + unsigned int len) +{ + /* Always the full buffer if not the last descriptor */ + if (!XGMAC_GET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, LAST)) + return rdata->rx.buf.dma_len; + + /* Last descriptor so calculate how much of the buffer was used + * for the last bit of data + */ + return rdata->rx.len - len; +} + static int xgbe_tx_poll(struct xgbe_channel *channel) { struct xgbe_prv_data *pdata = channel->pdata; @@ -1881,8 +1899,8 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget) struct napi_struct *napi; struct sk_buff *skb; struct skb_shared_hwtstamps *hwtstamps; - unsigned int incomplete, error, context_next, context; - unsigned int len, rdesc_len, max_len; + unsigned int last, error, context_next, context; + unsigned int len, buf1_len, buf2_len, max_len; unsigned int received = 0; int packet_count = 0; @@ -1892,7 +1910,7 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget) if (!ring) return 0; - incomplete = 0; + last = 0; context_next = 0; napi = (pdata->per_channel_irq) ? &channel->napi : &pdata->napi; @@ -1926,9 +1944,8 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget) received++; ring->cur++; - incomplete = XGMAC_GET_BITS(packet->attributes, - RX_PACKET_ATTRIBUTES, - INCOMPLETE); + last = XGMAC_GET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, + LAST); context_next = XGMAC_GET_BITS(packet->attributes, RX_PACKET_ATTRIBUTES, CONTEXT_NEXT); @@ -1937,7 +1954,7 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget) CONTEXT); /* Earlier error, just drain the remaining data */ - if ((incomplete || context_next) && error) + if ((!last || context_next) && error) goto read_again; if (error || packet->errors) { @@ -1949,16 +1966,22 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget) } if (!context) { - /* Length is cumulative, get this descriptor's length */ - rdesc_len = rdata->rx.len - len; - len += rdesc_len; + /* Get the data length in the descriptor buffers */ + buf1_len = xgbe_rx_buf1_len(rdata, packet); + len += buf1_len; + buf2_len = xgbe_rx_buf2_len(rdata, packet, len); + len += buf2_len; - if (rdesc_len && !skb) { + if (!skb) { skb = xgbe_create_skb(pdata, napi, rdata, - rdesc_len); - if (!skb) + buf1_len); + if (!skb) { error = 1; - } else if (rdesc_len) { + goto skip_data; + } + } + + if (buf2_len) { dma_sync_single_range_for_cpu(pdata->dev, rdata->rx.buf.dma_base, rdata->rx.buf.dma_off, @@ -1968,13 +1991,14 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget) skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rdata->rx.buf.pa.pages, rdata->rx.buf.pa.pages_offset, - rdesc_len, + buf2_len, rdata->rx.buf.dma_len); rdata->rx.buf.pa.pages = NULL; } } - if (incomplete || context_next) +skip_data: + if (!last || context_next) goto read_again; if (!skb) @@ -2033,7 +2057,7 @@ static int xgbe_rx_poll(struct xgbe_channel *channel, int budget) } /* Check if we need to save state before leaving */ - if (received && (incomplete || context_next)) { + if (received && (!last || context_next)) { rdata = XGBE_GET_DESC_DATA(ring, ring->cur); rdata->state_saved = 1; rdata->state.skb = skb; From 610c6bcc5fcfb6d02d63cfded2375a829df7faba Mon Sep 17 00:00:00 2001 From: Andrey Ulanov Date: Tue, 14 Mar 2017 20:16:42 -0700 Subject: [PATCH 090/521] net: unix: properly re-increment inflight counter of GC discarded candidates [ Upstream commit 7df9c24625b9981779afb8fcdbe2bb4765e61147 ] Dmitry has reported that a BUG_ON() condition in unix_notinflight() may be triggered by a simple code that forwards unix socket in an SCM_RIGHTS message. That is caused by incorrect unix socket GC implementation in unix_gc(). The GC first collects list of candidates, then (a) decrements their "children's" inflight counter, (b) checks which inflight counters are now 0, and then (c) increments all inflight counters back. (a) and (c) are done by calling scan_children() with inc_inflight or dec_inflight as the second argument. Commit 6209344f5a37 ("net: unix: fix inflight counting bug in garbage collector") changed scan_children() such that it no longer considers sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block of code that that unsets this flag _before_ invoking scan_children(, dec_iflight, ). This may lead to incorrect inflight counters for some sockets. This change fixes this bug by changing order of operations: UNIX_GC_CANDIDATE is now unset only after all inflight counters are restored to the original state. kernel BUG at net/unix/garbage.c:149! RIP: 0010:[] [] unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149 Call Trace: [] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487 [] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496 [] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655 [] skb_release_all+0x1a/0x60 net/core/skbuff.c:668 [] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684 [] kfree_skb+0x184/0x570 net/core/skbuff.c:705 [] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559 [] unix_release+0x49/0x90 net/unix/af_unix.c:836 [] sock_release+0x92/0x1f0 net/socket.c:570 [] sock_close+0x1b/0x20 net/socket.c:1017 [] __fput+0x34e/0x910 fs/file_table.c:208 [] ____fput+0x1a/0x20 fs/file_table.c:244 [] task_work_run+0x1a0/0x280 kernel/task_work.c:116 [< inline >] exit_task_work include/linux/task_work.h:21 [] do_exit+0x183a/0x2640 kernel/exit.c:828 [] do_group_exit+0x14e/0x420 kernel/exit.c:931 [] get_signal+0x663/0x1880 kernel/signal.c:2307 [] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807 [] exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [] syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259 [] entry_SYSCALL_64_fastpath+0xc4/0xc6 Link: https://lkml.org/lkml/2017/3/6/252 Signed-off-by: Andrey Ulanov Reported-by: Dmitry Vyukov Fixes: 6209344 ("net: unix: fix inflight counting bug in garbage collector") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/unix/garbage.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 6a0d48525fcf..c36757e72844 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -146,6 +146,7 @@ void unix_notinflight(struct user_struct *user, struct file *fp) if (s) { struct unix_sock *u = unix_sk(s); + BUG_ON(!atomic_long_read(&u->inflight)); BUG_ON(list_empty(&u->link)); if (atomic_long_dec_and_test(&u->inflight)) @@ -341,6 +342,14 @@ void unix_gc(void) } list_del(&cursor); + /* Now gc_candidates contains only garbage. Restore original + * inflight counters for these as well, and remove the skbuffs + * which are creating the cycle(s). + */ + skb_queue_head_init(&hitlist); + list_for_each_entry(u, &gc_candidates, link) + scan_children(&u->sk, inc_inflight, &hitlist); + /* not_cycle_list contains those sockets which do not make up a * cycle. Restore these to the inflight list. */ @@ -350,14 +359,6 @@ void unix_gc(void) list_move_tail(&u->link, &gc_inflight_list); } - /* Now gc_candidates contains only garbage. Restore original - * inflight counters for these as well, and remove the skbuffs - * which are creating the cycle(s). - */ - skb_queue_head_init(&hitlist); - list_for_each_entry(u, &gc_candidates, link) - scan_children(&u->sk, inc_inflight, &hitlist); - spin_unlock(&unix_gc_lock); /* Here we are. Hitlist is filled. Die. */ From 9d1894cba25c06b061565da6934ab43f446d3c69 Mon Sep 17 00:00:00 2001 From: Maor Gottlieb Date: Tue, 21 Mar 2017 15:59:17 +0200 Subject: [PATCH 091/521] net/mlx5: Increase number of max QPs in default profile [ Upstream commit 5f40b4ed975c26016cf41953b7510fe90718e21c ] With ConnectX-4 sharing SRQs from the same space as QPs, we hit a limit preventing some applications to allocate needed QPs amount. Double the size to 256K. Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Maor Gottlieb Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index ba115ec7aa92..1e611980cf99 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -85,7 +85,7 @@ static struct mlx5_profile profile[] = { [2] = { .mask = MLX5_PROF_MASK_QP_SIZE | MLX5_PROF_MASK_MR_CACHE, - .log_max_qp = 17, + .log_max_qp = 18, .mr_cache[0] = { .size = 500, .limit = 250 From fdcee7c1e2f8c6f46f26010b133ed963b620da2b Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Tue, 21 Mar 2017 15:59:19 +0200 Subject: [PATCH 092/521] net/mlx5e: Count LRO packets correctly [ Upstream commit 8ab7e2ae15d84ba758b2c8c6f4075722e9bd2a08 ] RX packets statistics ('rx_packets' counter) used to count LRO packets as one, even though it contains multiple segments. This patch will increment the counter by the number of segments, and align the driver with the behavior of other drivers in the stack. Note that no information is lost in this patch due to 'rx_lro_packets' counter existence. Before, ethtool showed: $ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets" rx_packets: 435277 rx_lro_packets: 35847 rx_packets_phy: 1935066 Now, we will see the more logical statistics: $ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets" rx_packets: 1935066 rx_lro_packets: 35847 rx_packets_phy: 1935066 Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Gal Pressman Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index cf0098596e85..e9408f5e2a1d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -197,6 +197,10 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe, if (lro_num_seg > 1) { mlx5e_lro_update_hdr(skb, cqe); skb_shinfo(skb)->gso_size = DIV_ROUND_UP(cqe_bcnt, lro_num_seg); + /* Subtract one since we already counted this as one + * "regular" packet in mlx5e_complete_rx_cqe() + */ + rq->stats.packets += lro_num_seg - 1; rq->stats.lro_packets++; rq->stats.lro_bytes += cqe_bcnt; } From 85f00dac91a1047b57e600df9636c8408f70001f Mon Sep 17 00:00:00 2001 From: Doug Berger Date: Tue, 21 Mar 2017 14:01:06 -0700 Subject: [PATCH 093/521] net: bcmgenet: remove bcmgenet_internal_phy_setup() [ Upstream commit 31739eae738ccbe8b9d627c3f2251017ca03f4d2 ] Commit 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset") removed the bcmgenet_mii_reset() function from bcmgenet_power_up() and bcmgenet_internal_phy_setup() functions. In so doing it broke the reset of the internal PHY devices used by the GENETv1-GENETv3 which required this reset before the UniMAC was enabled. It also broke the internal GPHY devices used by the GENETv4 because the config_init that installed the AFE workaround was no longer occurring after the reset of the GPHY performed by bcmgenet_phy_power_set() in bcmgenet_internal_phy_setup(). In addition the code in bcmgenet_internal_phy_setup() related to the "enable APD" comment goes with the bcmgenet_mii_reset() so it should have also been removed. Commit bd4060a6108b ("net: bcmgenet: Power on integrated GPHY in bcmgenet_power_up()") moved the bcmgenet_phy_power_set() call to the bcmgenet_power_up() function, but failed to remove it from the bcmgenet_internal_phy_setup() function. Had it done so, the bcmgenet_internal_phy_setup() function would have been empty and could have been removed at that time. Commit 5dbebbb44a6a ("net: bcmgenet: Software reset EPHY after power on") was submitted to correct the functional problems introduced by commit 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset"). It was included in v4.4 and made available on 4.3-stable. Unfortunately, it didn't fully revert the commit because this bcmgenet_mii_reset() doesn't apply the soft reset to the internal GPHY used by GENETv4 like the previous one did. This prevents the restoration of the AFE work- arounds for internal GPHY devices after the bcmgenet_phy_power_set() in bcmgenet_internal_phy_setup(). This commit takes the alternate approach of removing the unnecessary bcmgenet_internal_phy_setup() function which shouldn't have been in v4.3 so that when bcmgenet_mii_reset() was restored it should have only gone into bcmgenet_power_up(). This will avoid the problems while also removing the redundancy (and hopefully some of the confusion). Fixes: 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset") Signed-off-by: Doug Berger Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/genet/bcmmii.c | 15 --------------- 1 file changed, 15 deletions(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c index 8bdfe53754ba..e96d1f95bb47 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmmii.c +++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c @@ -220,20 +220,6 @@ void bcmgenet_phy_power_set(struct net_device *dev, bool enable) udelay(60); } -static void bcmgenet_internal_phy_setup(struct net_device *dev) -{ - struct bcmgenet_priv *priv = netdev_priv(dev); - u32 reg; - - /* Power up PHY */ - bcmgenet_phy_power_set(dev, true); - /* enable APD */ - reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT); - reg |= EXT_PWR_DN_EN_LD; - bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT); - bcmgenet_mii_reset(dev); -} - static void bcmgenet_moca_phy_setup(struct bcmgenet_priv *priv) { u32 reg; @@ -281,7 +267,6 @@ int bcmgenet_mii_config(struct net_device *dev) if (priv->internal_phy) { phy_name = "internal PHY"; - bcmgenet_internal_phy_setup(dev); } else if (priv->phy_interface == PHY_INTERFACE_MODE_MOCA) { phy_name = "MoCA"; bcmgenet_moca_phy_setup(priv); From 38dece41e5be77478b333db580b5e171b136befa Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 21 Mar 2017 19:22:28 -0700 Subject: [PATCH 094/521] ipv4: provide stronger user input validation in nl_fib_input() [ Upstream commit c64c0b3cac4c5b8cb093727d2c19743ea3965c0b ] Alexander reported a KMSAN splat caused by reads of uninitialized field (tb_id_in) from user provided struct fib_result_nl It turns out nl_fib_input() sanity tests on user input is a bit wrong : User can pretend nlh->nlmsg_len is big enough, but provide at sendmsg() time a too small buffer. Reported-by: Alexander Potapenko Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/fib_frontend.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 4e60dae86df5..1adba44f8fbc 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1080,7 +1080,8 @@ static void nl_fib_input(struct sk_buff *skb) net = sock_net(skb->sk); nlh = nlmsg_hdr(skb); - if (skb->len < NLMSG_HDRLEN || skb->len < nlh->nlmsg_len || + if (skb->len < nlmsg_total_size(sizeof(*frn)) || + skb->len < nlh->nlmsg_len || nlmsg_len(nlh) < sizeof(*frn)) return; From 95aa915c2f04c27bb3935c8b9446435f40f17f9d Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Wed, 22 Mar 2017 13:08:08 +0100 Subject: [PATCH 095/521] socket, bpf: fix sk_filter use after free in sk_clone_lock [ Upstream commit a97e50cc4cb67e1e7bff56f6b41cda62ca832336 ] In sk_clone_lock(), we create a new socket and inherit most of the parent's members via sock_copy() which memcpy()'s various sections. Now, in case the parent socket had a BPF socket filter attached, then newsk->sk_filter points to the same instance as the original sk->sk_filter. sk_filter_charge() is then called on the newsk->sk_filter to take a reference and should that fail due to hitting max optmem, we bail out and release the newsk instance. The issue is that commit 278571baca2a ("net: filter: simplify socket charging") wrongly combined the dismantle path with the failure path of xfrm_sk_clone_policy(). This means, even when charging failed, we call sk_free_unlock_clone() on the newsk, which then still points to the same sk_filter as the original sk. Thus, sk_free_unlock_clone() calls into __sk_destruct() eventually where it tests for present sk_filter and calls sk_filter_uncharge() on it, which potentially lets sk_omem_alloc wrap around and releases the eBPF prog and sk_filter structure from the (still intact) parent. Fix it by making sure that when sk_filter_charge() failed, we reset newsk->sk_filter back to NULL before passing to sk_free_unlock_clone(), so that we don't mess with the parents sk_filter. Only if xfrm_sk_clone_policy() fails, we did reach the point where either the parent's filter was NULL and as a result newsk's as well or where we previously had a successful sk_filter_charge(), thus for that case, we do need sk_filter_uncharge() to release the prior taken reference on sk_filter. Fixes: 278571baca2a ("net: filter: simplify socket charging") Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/sock.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/core/sock.c b/net/core/sock.c index 9f4c4473156a..9c708a5fb751 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1557,6 +1557,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority) is_charged = sk_filter_charge(newsk, filter); if (unlikely(!is_charged || xfrm_sk_clone_policy(newsk, sk))) { + /* We need to make sure that we don't uncharge the new + * socket if we couldn't charge it in the first place + * as otherwise we uncharge the parent's filter. + */ + if (!is_charged) + RCU_INIT_POINTER(newsk->sk_filter, NULL); /* It is still raw copy of parent, so invalidate * destructor and make plain sk_free() */ newsk->sk_destruct = NULL; From afaed241928f029e788bbbeed26b2b530ba7cd1a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 22 Mar 2017 08:10:21 -0700 Subject: [PATCH 096/521] tcp: initialize icsk_ack.lrcvtime at session start time [ Upstream commit 15bb7745e94a665caf42bfaabf0ce062845b533b ] icsk_ack.lrcvtime has a 0 value at socket creation time. tcpi_last_data_recv can have bogus value if no payload is ever received. This patch initializes icsk_ack.lrcvtime for active sessions in tcp_finish_connect(), and for passive sessions in tcp_create_openreq_child() Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 2 +- net/ipv4/tcp_minisocks.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 7cc0f8aac28f..818630cec54f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5435,6 +5435,7 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb) struct inet_connection_sock *icsk = inet_csk(sk); tcp_set_state(sk, TCP_ESTABLISHED); + icsk->icsk_ack.lrcvtime = tcp_time_stamp; if (skb) { icsk->icsk_af_ops->sk_rx_dst_set(sk, skb); @@ -5647,7 +5648,6 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, * to stand against the temptation 8) --ANK */ inet_csk_schedule_ack(sk); - icsk->icsk_ack.lrcvtime = tcp_time_stamp; tcp_enter_quickack_mode(sk); inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK, TCP_DELACK_MAX, TCP_RTO_MAX); diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 9475a2748a9a..019db68bdb9f 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -472,6 +472,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk, newtp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT); newtp->rtt_min[0].rtt = ~0U; newicsk->icsk_rto = TCP_TIMEOUT_INIT; + newicsk->icsk_ack.lrcvtime = tcp_time_stamp; newtp->packets_out = 0; newtp->retrans_out = 0; From 9ac7bd114e13628467c037066786775a357d91d6 Mon Sep 17 00:00:00 2001 From: Matjaz Hegedic Date: Fri, 10 Mar 2017 14:33:09 -0800 Subject: [PATCH 097/521] Input: elan_i2c - add ASUS EeeBook X205TA special touchpad fw commit 92ef6f97a66e580189a41a132d0f8a9f78d6ddce upstream. EeeBook X205TA is yet another ASUS device with a special touchpad firmware that needs to be accounted for during initialization, or else the touchpad will go into an invalid state upon suspend/resume. Adding the appropriate ic_type and product_id check fixes the problem. Signed-off-by: Matjaz Hegedic Acked-by: KT Liao Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elan_i2c_core.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c index ed1935f300a7..da5458dfb1e3 100644 --- a/drivers/input/mouse/elan_i2c_core.c +++ b/drivers/input/mouse/elan_i2c_core.c @@ -218,17 +218,19 @@ static int elan_query_product(struct elan_tp_data *data) static int elan_check_ASUS_special_fw(struct elan_tp_data *data) { - if (data->ic_type != 0x0E) - return false; - - switch (data->product_id) { - case 0x05 ... 0x07: - case 0x09: - case 0x13: + if (data->ic_type == 0x0E) { + switch (data->product_id) { + case 0x05 ... 0x07: + case 0x09: + case 0x13: + return true; + } + } else if (data->ic_type == 0x08 && data->product_id == 0x26) { + /* ASUS EeeBook X205TA */ return true; - default: - return false; } + + return false; } static int __elan_initialize(struct elan_tp_data *data) From 5f9243e4fca610599c30b552baacdcffc76ea7af Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Tue, 7 Mar 2017 09:31:29 -0800 Subject: [PATCH 098/521] Input: i8042 - add noloop quirk for Dell Embedded Box PC 3000 commit 45838660e34d90db8d4f7cbc8fd66e8aff79f4fe upstream. The aux port does not get detected without noloop quirk, so external PS/2 mouse cannot work as result. The PS/2 mouse can work with this quirk. BugLink: https://bugs.launchpad.net/bugs/1591053 Signed-off-by: Kai-Heng Feng Reviewed-by: Marcos Paulo de Souza Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/serio/i8042-x86ia64io.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index 0cdd95801a25..25eab453f2b2 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -119,6 +119,13 @@ static const struct dmi_system_id __initconst i8042_dmi_noloop_table[] = { DMI_MATCH(DMI_PRODUCT_VERSION, "DL760"), }, }, + { + /* Dell Embedded Box PC 3000 */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Embedded Box PC 3000"), + }, + }, { /* OQO Model 01 */ .matches = { From a07d3669654ad335c19df048199da0a063e0c387 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 16 Mar 2017 11:34:02 -0700 Subject: [PATCH 099/521] Input: iforce - validate number of endpoints before using them commit 59cf8bed44a79ec42303151dd014fdb6434254bb upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer or accessing memory that lie beyond the end of the endpoint array should a malicious device lack the expected endpoints. Signed-off-by: Johan Hovold Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/joystick/iforce/iforce-usb.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/input/joystick/iforce/iforce-usb.c b/drivers/input/joystick/iforce/iforce-usb.c index d96aa27dfcdc..db64adfbe1af 100644 --- a/drivers/input/joystick/iforce/iforce-usb.c +++ b/drivers/input/joystick/iforce/iforce-usb.c @@ -141,6 +141,9 @@ static int iforce_usb_probe(struct usb_interface *intf, interface = intf->cur_altsetting; + if (interface->desc.bNumEndpoints < 2) + return -ENODEV; + epirq = &interface->endpoint[0].desc; epout = &interface->endpoint[1].desc; From 6bed7c1e2b78e58adab2e8448f3e6799857b5726 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 16 Mar 2017 11:36:13 -0700 Subject: [PATCH 100/521] Input: ims-pcu - validate number of endpoints before using them commit 1916d319271664241b7aa0cd2b05e32bdb310ce9 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack control-interface endpoints. Fixes: 628329d52474 ("Input: add IMS Passenger Control Unit driver") Signed-off-by: Johan Hovold Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/misc/ims-pcu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/input/misc/ims-pcu.c b/drivers/input/misc/ims-pcu.c index 9c0ea36913b4..f4e8fbec6a94 100644 --- a/drivers/input/misc/ims-pcu.c +++ b/drivers/input/misc/ims-pcu.c @@ -1667,6 +1667,10 @@ static int ims_pcu_parse_cdc_data(struct usb_interface *intf, struct ims_pcu *pc return -EINVAL; alt = pcu->ctrl_intf->cur_altsetting; + + if (alt->desc.bNumEndpoints < 1) + return -ENODEV; + pcu->ep_ctrl = &alt->endpoint[0].desc; pcu->max_ctrl_size = usb_endpoint_maxp(pcu->ep_ctrl); From 0812c6855c89d905e34e88166570cae4a401b23a Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 16 Mar 2017 11:39:29 -0700 Subject: [PATCH 101/521] Input: hanwang - validate number of endpoints before using them commit ba340d7b83703768ce566f53f857543359aa1b98 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Fixes: bba5394ad3bd ("Input: add support for Hanwang tablets") Signed-off-by: Johan Hovold Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/tablet/hanwang.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/input/tablet/hanwang.c b/drivers/input/tablet/hanwang.c index cd852059b99e..df4bea96d7ed 100644 --- a/drivers/input/tablet/hanwang.c +++ b/drivers/input/tablet/hanwang.c @@ -340,6 +340,9 @@ static int hanwang_probe(struct usb_interface *intf, const struct usb_device_id int error; int i; + if (intf->cur_altsetting->desc.bNumEndpoints < 1) + return -ENODEV; + hanwang = kzalloc(sizeof(struct hanwang), GFP_KERNEL); input_dev = input_allocate_device(); if (!hanwang || !input_dev) { From e916f1d6188ef765303b4f74387d7e92d49a5be6 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 16 Mar 2017 11:37:01 -0700 Subject: [PATCH 102/521] Input: yealink - validate number of endpoints before using them commit 5cc4a1a9f5c179795c8a1f2b0f4361829d6a070e upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Fixes: aca951a22a1d ("[PATCH] input-driver-yealink-P1K-usb-phone") Signed-off-by: Johan Hovold Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/misc/yealink.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/input/misc/yealink.c b/drivers/input/misc/yealink.c index 79c964c075f1..6e7ff9561d92 100644 --- a/drivers/input/misc/yealink.c +++ b/drivers/input/misc/yealink.c @@ -875,6 +875,10 @@ static int usb_probe(struct usb_interface *intf, const struct usb_device_id *id) int ret, pipe, i; interface = intf->cur_altsetting; + + if (interface->desc.bNumEndpoints < 1) + return -ENODEV; + endpoint = &interface->endpoint[0].desc; if (!usb_endpoint_is_int_in(endpoint)) return -ENODEV; From c05490638ddfffa35d2fb03c1852f9013757a9e1 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 16 Mar 2017 11:35:12 -0700 Subject: [PATCH 103/521] Input: cm109 - validate number of endpoints before using them commit ac2ee9ba953afe88f7a673e1c0c839227b1d7891 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Fixes: c04148f915e5 ("Input: add driver for USB VoIP phones with CM109...") Signed-off-by: Johan Hovold Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/misc/cm109.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/input/misc/cm109.c b/drivers/input/misc/cm109.c index 9365535ba7f1..50a7faa504f7 100644 --- a/drivers/input/misc/cm109.c +++ b/drivers/input/misc/cm109.c @@ -675,6 +675,10 @@ static int cm109_usb_probe(struct usb_interface *intf, int error = -ENOMEM; interface = intf->cur_altsetting; + + if (interface->desc.bNumEndpoints < 1) + return -ENODEV; + endpoint = &interface->endpoint[0].desc; if (!usb_endpoint_is_int_in(endpoint)) From b3c4c0c470b58dd4a5e40e11ccd9fea7fbbfa799 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 16 Mar 2017 11:41:55 -0700 Subject: [PATCH 104/521] Input: kbtab - validate number of endpoints before using them commit cb1b494663e037253337623bf1ef2df727883cb7 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Signed-off-by: Johan Hovold Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/tablet/kbtab.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/input/tablet/kbtab.c b/drivers/input/tablet/kbtab.c index d2ac7c2b5b82..2812f9236b7d 100644 --- a/drivers/input/tablet/kbtab.c +++ b/drivers/input/tablet/kbtab.c @@ -122,6 +122,9 @@ static int kbtab_probe(struct usb_interface *intf, const struct usb_device_id *i struct input_dev *input_dev; int error = -ENOMEM; + if (intf->cur_altsetting->desc.bNumEndpoints < 1) + return -ENODEV; + kbtab = kzalloc(sizeof(struct kbtab), GFP_KERNEL); input_dev = input_allocate_device(); if (!kbtab || !input_dev) From 549993001e7de0553d85c9022dc41d5b3ff7d1ff Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 16 Mar 2017 11:43:09 -0700 Subject: [PATCH 105/521] Input: sur40 - validate number of endpoints before using them commit 92461f5d723037530c1f36cce93640770037812c upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer or accessing memory that lie beyond the end of the endpoint array should a malicious device lack the expected endpoints. Fixes: bdb5c57f209c ("Input: add sur40 driver for Samsung SUR40... ") Signed-off-by: Johan Hovold Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/touchscreen/sur40.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/input/touchscreen/sur40.c b/drivers/input/touchscreen/sur40.c index 45b466e3bbe8..0146e2c74649 100644 --- a/drivers/input/touchscreen/sur40.c +++ b/drivers/input/touchscreen/sur40.c @@ -500,6 +500,9 @@ static int sur40_probe(struct usb_interface *interface, if (iface_desc->desc.bInterfaceClass != 0xFF) return -ENODEV; + if (iface_desc->desc.bNumEndpoints < 5) + return -ENODEV; + /* Use endpoint #4 (0x86). */ endpoint = &iface_desc->endpoint[4].desc; if (endpoint->bEndpointAddress != TOUCH_ENDPOINT) From b55ffcb1bc8a9c40db928f568ef61016ac681c29 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 21 Mar 2017 13:56:04 +0100 Subject: [PATCH 106/521] ALSA: seq: Fix racy cell insertions during snd_seq_pool_done() commit c520ff3d03f0b5db7146d9beed6373ad5d2a5e0e upstream. When snd_seq_pool_done() is called, it marks the closing flag to refuse the further cell insertions. But snd_seq_pool_done() itself doesn't clear the cells but just waits until all cells are cleared by the caller side. That is, it's racy, and this leads to the endless stall as syzkaller spotted. This patch addresses the racy by splitting the setup of pool->closing flag out of snd_seq_pool_done(), and calling it properly before snd_seq_pool_done(). BugLink: http://lkml.kernel.org/r/CACT4Y+aqqy8bZA1fFieifNxR2fAfFQQABcBHj801+u5ePV0URw@mail.gmail.com Reported-and-tested-by: Dmitry Vyukov Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/seq/seq_clientmgr.c | 1 + sound/core/seq/seq_fifo.c | 3 +++ sound/core/seq/seq_memory.c | 17 +++++++++++++---- sound/core/seq/seq_memory.h | 1 + 4 files changed, 18 insertions(+), 4 deletions(-) diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c index 58e79e02f217..c67f9c212dd1 100644 --- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -1921,6 +1921,7 @@ static int snd_seq_ioctl_set_client_pool(struct snd_seq_client *client, info.output_pool != client->pool->size)) { if (snd_seq_write_pool_allocated(client)) { /* remove all existing cells */ + snd_seq_pool_mark_closing(client->pool); snd_seq_queue_client_leave_cells(client->number); snd_seq_pool_done(client->pool); } diff --git a/sound/core/seq/seq_fifo.c b/sound/core/seq/seq_fifo.c index 86240d02b530..3f4efcb85df5 100644 --- a/sound/core/seq/seq_fifo.c +++ b/sound/core/seq/seq_fifo.c @@ -70,6 +70,9 @@ void snd_seq_fifo_delete(struct snd_seq_fifo **fifo) return; *fifo = NULL; + if (f->pool) + snd_seq_pool_mark_closing(f->pool); + snd_seq_fifo_clear(f); /* wake up clients if any */ diff --git a/sound/core/seq/seq_memory.c b/sound/core/seq/seq_memory.c index dfa5156f3585..5847c4475bf3 100644 --- a/sound/core/seq/seq_memory.c +++ b/sound/core/seq/seq_memory.c @@ -414,6 +414,18 @@ int snd_seq_pool_init(struct snd_seq_pool *pool) return 0; } +/* refuse the further insertion to the pool */ +void snd_seq_pool_mark_closing(struct snd_seq_pool *pool) +{ + unsigned long flags; + + if (snd_BUG_ON(!pool)) + return; + spin_lock_irqsave(&pool->lock, flags); + pool->closing = 1; + spin_unlock_irqrestore(&pool->lock, flags); +} + /* remove events */ int snd_seq_pool_done(struct snd_seq_pool *pool) { @@ -424,10 +436,6 @@ int snd_seq_pool_done(struct snd_seq_pool *pool) return -EINVAL; /* wait for closing all threads */ - spin_lock_irqsave(&pool->lock, flags); - pool->closing = 1; - spin_unlock_irqrestore(&pool->lock, flags); - if (waitqueue_active(&pool->output_sleep)) wake_up(&pool->output_sleep); @@ -484,6 +492,7 @@ int snd_seq_pool_delete(struct snd_seq_pool **ppool) *ppool = NULL; if (pool == NULL) return 0; + snd_seq_pool_mark_closing(pool); snd_seq_pool_done(pool); kfree(pool); return 0; diff --git a/sound/core/seq/seq_memory.h b/sound/core/seq/seq_memory.h index 4a2ec779b8a7..32f959c17786 100644 --- a/sound/core/seq/seq_memory.h +++ b/sound/core/seq/seq_memory.h @@ -84,6 +84,7 @@ static inline int snd_seq_total_cells(struct snd_seq_pool *pool) int snd_seq_pool_init(struct snd_seq_pool *pool); /* done pool - free events */ +void snd_seq_pool_mark_closing(struct snd_seq_pool *pool); int snd_seq_pool_done(struct snd_seq_pool *pool); /* create pool */ From ed00b613bbcb7af32fbdd87e3c985c00e2c9c5a3 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 20 Mar 2017 10:08:19 +0100 Subject: [PATCH 107/521] ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call commit f363a06642f28caaa78cb6446bbad90c73fe183c upstream. In the commit [15c75b09f8d1: ALSA: ctxfi: Fallback DMA mask to 32bit], I forgot to put "!" at dam_set_mask() call check in cthw20k1.c (while cthw20k2.c is OK). This patch fixes that obvious bug. (As a side note: although the original commit was completely wrong, it's still working for most of machines, as it sets to 32bit DMA mask in the end. So the bug severity is low.) Fixes: 15c75b09f8d1 ("ALSA: ctxfi: Fallback DMA mask to 32bit") Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/ctxfi/cthw20k1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/pci/ctxfi/cthw20k1.c b/sound/pci/ctxfi/cthw20k1.c index ab4cdab5cfa5..79edd88d5cd0 100644 --- a/sound/pci/ctxfi/cthw20k1.c +++ b/sound/pci/ctxfi/cthw20k1.c @@ -1905,7 +1905,7 @@ static int hw_card_start(struct hw *hw) return err; /* Set DMA transfer mask */ - if (dma_set_mask(&pci->dev, DMA_BIT_MASK(dma_bits))) { + if (!dma_set_mask(&pci->dev, DMA_BIT_MASK(dma_bits))) { dma_set_coherent_mask(&pci->dev, DMA_BIT_MASK(dma_bits)); } else { dma_set_mask(&pci->dev, DMA_BIT_MASK(32)); From 1ea551eec703102af8db2c2dcc99fc660baa3602 Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Thu, 23 Mar 2017 10:00:25 +0800 Subject: [PATCH 108/521] ALSA: hda - Adding a group of pin definition to fix headset problem commit 3f307834e695f59dac4337a40316bdecfb9d0508 upstream. A new Dell laptop needs to apply ALC269_FIXUP_DELL1_MIC_NO_PRESENCE to fix the headset problem, and the pin definiton of this machine is not in the pin quirk table yet, now adding it to the table. Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index cf0785ddbd14..1d4f34379f56 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6040,6 +6040,8 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = { ALC295_STANDARD_PINS, {0x17, 0x21014040}, {0x18, 0x21a19050}), + SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC295_STANDARD_PINS), SND_HDA_PIN_QUIRK(0x10ec0298, 0x1028, "Dell", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE, ALC298_STANDARD_PINS, {0x17, 0x90170110}), From 8f0f081647cc1c7e7ce6bea99a3b2ebb3604b1f1 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 9 Mar 2017 11:32:28 -0600 Subject: [PATCH 109/521] USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems commit 6e9f44eaaef0df7b846e9316fa9ca72a02025d44 upstream. Add Quectel UC15, UC20, EC21, and EC25. The EC20 is handled by qcserial due to a USB VID/PID conflict with an existing Acer device. Signed-off-by: Dan Williams Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 42cc72e54c05..af67a0de6b5d 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -233,6 +233,14 @@ static void option_instat_callback(struct urb *urb); #define BANDRICH_PRODUCT_1012 0x1012 #define QUALCOMM_VENDOR_ID 0x05C6 +/* These Quectel products use Qualcomm's vendor ID */ +#define QUECTEL_PRODUCT_UC20 0x9003 +#define QUECTEL_PRODUCT_UC15 0x9090 + +#define QUECTEL_VENDOR_ID 0x2c7c +/* These Quectel products use Quectel's vendor ID */ +#define QUECTEL_PRODUCT_EC21 0x0121 +#define QUECTEL_PRODUCT_EC25 0x0125 #define CMOTECH_VENDOR_ID 0x16d8 #define CMOTECH_PRODUCT_6001 0x6001 @@ -1161,7 +1169,14 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE(QUALCOMM_VENDOR_ID, 0x6613)}, /* Onda H600/ZTE MF330 */ { USB_DEVICE(QUALCOMM_VENDOR_ID, 0x0023)}, /* ONYX 3G device */ { USB_DEVICE(QUALCOMM_VENDOR_ID, 0x9000)}, /* SIMCom SIM5218 */ - { USB_DEVICE(QUALCOMM_VENDOR_ID, 0x9003), /* Quectel UC20 */ + /* Quectel products using Qualcomm vendor ID */ + { USB_DEVICE(QUALCOMM_VENDOR_ID, QUECTEL_PRODUCT_UC15)}, + { USB_DEVICE(QUALCOMM_VENDOR_ID, QUECTEL_PRODUCT_UC20), + .driver_info = (kernel_ulong_t)&net_intf4_blacklist }, + /* Quectel products using Quectel vendor ID */ + { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC21), + .driver_info = (kernel_ulong_t)&net_intf4_blacklist }, + { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC25), .driver_info = (kernel_ulong_t)&net_intf4_blacklist }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6001) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_CMU_300) }, From 9218793a39def5ee7555d990ef6034260024a379 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Fri, 17 Mar 2017 17:21:28 +0100 Subject: [PATCH 110/521] USB: serial: qcserial: add Dell DW5811e MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 436ecf5519d892397af133a79ccd38a17c25fa51 upstream. This is a Dell branded Sierra Wireless EM7455. Signed-off-by: Bjørn Mork Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/qcserial.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c index 696458db7e3c..38b3f0d8cd58 100644 --- a/drivers/usb/serial/qcserial.c +++ b/drivers/usb/serial/qcserial.c @@ -169,6 +169,8 @@ static const struct usb_device_id id_table[] = { {DEVICE_SWI(0x413c, 0x81a9)}, /* Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81b1)}, /* Dell Wireless 5809e Gobi(TM) 4G LTE Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81b3)}, /* Dell Wireless 5809e Gobi(TM) 4G LTE Mobile Broadband Card (rev3) */ + {DEVICE_SWI(0x413c, 0x81b5)}, /* Dell Wireless 5811e QDL */ + {DEVICE_SWI(0x413c, 0x81b6)}, /* Dell Wireless 5811e QDL */ /* Huawei devices */ {DEVICE_HWI(0x03f0, 0x581d)}, /* HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e) */ From 19f0fe67b9d04580c377efc568cc8630a5af06b4 Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Tue, 14 Mar 2017 12:09:56 +0100 Subject: [PATCH 111/521] ACM gadget: fix endianness in notifications MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit cdd7928df0d2efaa3270d711963773a08a4cc8ab upstream. The gadget code exports the bitfield for serial status changes over the wire in its internal endianness. The fix is to convert to little endian before sending it over the wire. Signed-off-by: Oliver Neukum Tested-by: 家瑋 Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_acm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/gadget/function/f_acm.c b/drivers/usb/gadget/function/f_acm.c index 2fa1e80a3ce7..67e474b13fca 100644 --- a/drivers/usb/gadget/function/f_acm.c +++ b/drivers/usb/gadget/function/f_acm.c @@ -535,13 +535,15 @@ static int acm_notify_serial_state(struct f_acm *acm) { struct usb_composite_dev *cdev = acm->port.func.config->cdev; int status; + __le16 serial_state; spin_lock(&acm->lock); if (acm->notify_req) { dev_dbg(&cdev->gadget->dev, "acm ttyGS%d serial state %04x\n", acm->port_num, acm->serial_state); + serial_state = cpu_to_le16(acm->serial_state); status = acm_cdc_notify(acm, USB_CDC_NOTIFY_SERIAL_STATE, - 0, &acm->serial_state, sizeof(acm->serial_state)); + 0, &serial_state, sizeof(acm->serial_state)); } else { acm->pending = true; status = 0; From 8a8a8007871acae231ca5dba49f648d64326e919 Mon Sep 17 00:00:00 2001 From: Roger Quadros Date: Wed, 8 Mar 2017 16:05:43 +0200 Subject: [PATCH 112/521] usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval commit 09424c50b7dff40cb30011c09114404a4656e023 upstream. The streaming_maxburst module parameter is 0 offset (0..15) so we must add 1 while using it for wBytesPerInterval calculation for the SuperSpeed companion descriptor. Without this host uvcvideo driver will always see the wrong wBytesPerInterval for SuperSpeed uvc gadget and may not find a suitable video interface endpoint. e.g. for streaming_maxburst = 0 case it will always fail as wBytePerInterval was evaluating to 0. Reviewed-by: Laurent Pinchart Signed-off-by: Roger Quadros Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_uvc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c index 29b41b5dee04..c7689d05356c 100644 --- a/drivers/usb/gadget/function/f_uvc.c +++ b/drivers/usb/gadget/function/f_uvc.c @@ -625,7 +625,7 @@ uvc_function_bind(struct usb_configuration *c, struct usb_function *f) uvc_ss_streaming_comp.bMaxBurst = opts->streaming_maxburst; uvc_ss_streaming_comp.wBytesPerInterval = cpu_to_le16(max_packet_size * max_packet_mult * - opts->streaming_maxburst); + (opts->streaming_maxburst + 1)); /* Allocate endpoints. */ ep = usb_ep_autoconfig(cdev->gadget, &uvc_control_ep); From 2c929ea720f968da2f1ad90db995cc49a937955f Mon Sep 17 00:00:00 2001 From: Samuel Thibault Date: Mon, 13 Mar 2017 20:50:08 +0100 Subject: [PATCH 113/521] usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk commit 3243367b209faed5c320a4e5f9a565ee2a2ba958 upstream. Some USB 2.0 devices erroneously report millisecond values in bInterval. The generic config code manages to catch most of them, but in some cases it's not completely enough. The case at stake here is a USB 2.0 braille device, which wants to announce 10ms and thus sets bInterval to 10, but with the USB 2.0 computation that yields to 64ms. It happens that one can type fast enough to reach this interval and get the device buffers overflown, leading to problematic latencies. The generic config code does not catch this case because the 64ms is considered a sane enough value. This change thus adds a USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL quirk to mark devices which actually report milliseconds in bInterval, and marks Vario Ultra devices as needing it. Signed-off-by: Samuel Thibault Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/config.c | 10 ++++++++++ drivers/usb/core/quirks.c | 8 ++++++++ include/linux/usb/quirks.h | 6 ++++++ 3 files changed, 24 insertions(+) diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c index ac30a051ad71..325cbc9c35d8 100644 --- a/drivers/usb/core/config.c +++ b/drivers/usb/core/config.c @@ -246,6 +246,16 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, int inum, /* * Adjust bInterval for quirked devices. + */ + /* + * This quirk fixes bIntervals reported in ms. + */ + if (to_usb_device(ddev)->quirks & + USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL) { + n = clamp(fls(d->bInterval) + 3, i, j); + i = j = n; + } + /* * This quirk fixes bIntervals reported in * linear microframes. */ diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 24f9f98968a5..96b21b0dac1e 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -170,6 +170,14 @@ static const struct usb_device_id usb_quirk_list[] = { /* M-Systems Flash Disk Pioneers */ { USB_DEVICE(0x08ec, 0x1000), .driver_info = USB_QUIRK_RESET_RESUME }, + /* Baum Vario Ultra */ + { USB_DEVICE(0x0904, 0x6101), .driver_info = + USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL }, + { USB_DEVICE(0x0904, 0x6102), .driver_info = + USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL }, + { USB_DEVICE(0x0904, 0x6103), .driver_info = + USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL }, + /* Keytouch QWERTY Panel keyboard */ { USB_DEVICE(0x0926, 0x3333), .driver_info = USB_QUIRK_CONFIG_INTF_STRINGS }, diff --git a/include/linux/usb/quirks.h b/include/linux/usb/quirks.h index 1d0043dc34e4..de2a722fe3cf 100644 --- a/include/linux/usb/quirks.h +++ b/include/linux/usb/quirks.h @@ -50,4 +50,10 @@ /* device can't handle Link Power Management */ #define USB_QUIRK_NO_LPM BIT(10) +/* + * Device reports its bInterval as linear frames instead of the + * USB 2.0 calculation. + */ +#define USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL BIT(11) + #endif /* __LINUX_USB_QUIRKS_H */ From 73490abe249c238e2141f62995e2cc2d4ae392db Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Mar 2017 13:47:50 +0100 Subject: [PATCH 114/521] USB: uss720: fix NULL-deref at probe commit f259ca3eed6e4b79ac3d5c5c9fb259fb46e86217 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer or accessing memory beyond the endpoint array should a malicious device lack the expected endpoints. Note that the endpoint access that causes the NULL-deref is currently only used for debugging purposes during probe so the oops only happens when dynamic debugging is enabled. This means the driver could be rewritten to continue to accept device with only two endpoints, should such devices exist. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/misc/uss720.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/usb/misc/uss720.c b/drivers/usb/misc/uss720.c index bbd029c9c725..442b6631162e 100644 --- a/drivers/usb/misc/uss720.c +++ b/drivers/usb/misc/uss720.c @@ -711,6 +711,11 @@ static int uss720_probe(struct usb_interface *intf, interface = intf->cur_altsetting; + if (interface->desc.bNumEndpoints < 3) { + usb_put_dev(usbdev); + return -ENODEV; + } + /* * Allocate parport interface */ From a7712869e2e7cb1a5add2a8613f04e6c3647ef38 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Mar 2017 13:47:49 +0100 Subject: [PATCH 115/521] USB: lvtest: fix NULL-deref at probe commit 1dc56c52d2484be09c7398a5207d6b11a4256be9 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should the probed device lack endpoints. Note that this driver does not bind to any devices by default. Fixes: ce21bfe603b3 ("USB: Add LVS Test device driver") Cc: Pratyush Anand Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/misc/lvstest.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/misc/lvstest.c b/drivers/usb/misc/lvstest.c index 86b4e4b2ab9a..383fa007348f 100644 --- a/drivers/usb/misc/lvstest.c +++ b/drivers/usb/misc/lvstest.c @@ -370,6 +370,10 @@ static int lvs_rh_probe(struct usb_interface *intf, hdev = interface_to_usbdev(intf); desc = intf->cur_altsetting; + + if (desc->desc.bNumEndpoints < 1) + return -ENODEV; + endpoint = &desc->endpoint[0].desc; /* valid only for SS root hub */ From d6389d6abb8aff1d67ea64ef5b295ab3f4967d2d Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Mar 2017 13:47:48 +0100 Subject: [PATCH 116/521] USB: idmouse: fix NULL-deref at probe commit b0addd3fa6bcd119be9428996d5d4522479ab240 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/misc/idmouse.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/misc/idmouse.c b/drivers/usb/misc/idmouse.c index 4e38683c653c..6d4e75785710 100644 --- a/drivers/usb/misc/idmouse.c +++ b/drivers/usb/misc/idmouse.c @@ -346,6 +346,9 @@ static int idmouse_probe(struct usb_interface *interface, if (iface_desc->desc.bInterfaceClass != 0x0A) return -ENODEV; + if (iface_desc->desc.bNumEndpoints < 1) + return -ENODEV; + /* allocate memory for our device state and initialize it */ dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (dev == NULL) From a7cb1fafe429ebd9ecf7768edc577662cbb6011e Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Mar 2017 13:47:51 +0100 Subject: [PATCH 117/521] USB: wusbcore: fix NULL-deref at probe commit 03ace948a4eb89d1cf51c06afdfc41ebca5fdb27 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer or accessing memory beyond the endpoint array should a malicious device lack the expected endpoints. This specifically fixes the NULL-pointer dereference when probing HWA HC devices. Fixes: df3654236e31 ("wusb: add the Wire Adapter (WA) core") Cc: Inaky Perez-Gonzalez Cc: David Vrabel Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/wusbcore/wa-hc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/wusbcore/wa-hc.c b/drivers/usb/wusbcore/wa-hc.c index 252c7bd9218a..d01496fd27fe 100644 --- a/drivers/usb/wusbcore/wa-hc.c +++ b/drivers/usb/wusbcore/wa-hc.c @@ -39,6 +39,9 @@ int wa_create(struct wahc *wa, struct usb_interface *iface, int result; struct device *dev = &iface->dev; + if (iface->cur_altsetting->desc.bNumEndpoints < 3) + return -ENODEV; + result = wa_rpipes_create(wa); if (result < 0) goto error_rpipes_create; From 47285be050ca3e9ca45f22966b0b655b5b83c250 Mon Sep 17 00:00:00 2001 From: Bin Liu Date: Fri, 10 Mar 2017 14:43:35 -0600 Subject: [PATCH 118/521] usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer commit 0090114d336a9604aa2d90bc83f20f7cd121b76c upstream. The CPPI 4.1 driver polls register to workaround the premature TX interrupt issue, but it causes audio playback underrun when triggered in Isoch transfers. Isoch doesn't do back-to-back transfers, the TX should be done by the time the next transfer is scheduled. So skip this polling workaround for Isoch transfer. Fixes: a655f481d83d6 ("usb: musb: musb_cppi41: handle pre-mature TX complete interrupt") Reported-by: Alexandre Bailon Acked-by: Sebastian Andrzej Siewior Tested-by: Alexandre Bailon Signed-off-by: Bin Liu Signed-off-by: Greg Kroah-Hartman --- drivers/usb/musb/musb_cppi41.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/drivers/usb/musb/musb_cppi41.c b/drivers/usb/musb/musb_cppi41.c index e499b862a946..88f26ac2a185 100644 --- a/drivers/usb/musb/musb_cppi41.c +++ b/drivers/usb/musb/musb_cppi41.c @@ -250,8 +250,27 @@ static void cppi41_dma_callback(void *private_data) transferred < cppi41_channel->packet_sz) cppi41_channel->prog_len = 0; - if (cppi41_channel->is_tx) - empty = musb_is_tx_fifo_empty(hw_ep); + if (cppi41_channel->is_tx) { + u8 type; + + if (is_host_active(musb)) + type = hw_ep->out_qh->type; + else + type = hw_ep->ep_in.type; + + if (type == USB_ENDPOINT_XFER_ISOC) + /* + * Don't use the early-TX-interrupt workaround below + * for Isoch transfter. Since Isoch are periodic + * transfer, by the time the next transfer is + * scheduled, the current one should be done already. + * + * This avoids audio playback underrun issue. + */ + empty = true; + else + empty = musb_is_tx_fifo_empty(hw_ep); + } if (!cppi41_channel->is_tx || empty) { cppi41_trans_done(cppi41_channel); From 14a2032287d43bbffadf22752e40830000aad503 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 8 Mar 2017 10:19:36 -0800 Subject: [PATCH 119/521] usb: hub: Fix crash after failure to read BOS descriptor commit 7b2db29fbb4e766fcd02207eb2e2087170bd6ebc upstream. If usb_get_bos_descriptor() returns an error, usb->bos will be NULL. Nevertheless, it is dereferenced unconditionally in hub_set_initial_usb2_lpm_policy() if usb2_hw_lpm_capable is set. This results in a crash. usb 5-1: unable to get BOS descriptor ... Unable to handle kernel NULL pointer dereference at virtual address 00000008 pgd = ffffffc00165f000 [00000008] *pgd=000000000174f003, *pud=000000000174f003, *pmd=0000000001750003, *pte=00e8000001751713 Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac [ ... ] CPU: 5 PID: 3353 Comm: kworker/5:3 Tainted: G B 4.4.52 #480 Hardware name: Google Kevin (DT) Workqueue: events driver_set_config_work task: ffffffc0c3690000 ti: ffffffc0ae9a8000 task.ti: ffffffc0ae9a8000 PC is at hub_port_init+0xc3c/0xd10 LR is at hub_port_init+0xc3c/0xd10 ... Call trace: [] hub_port_init+0xc3c/0xd10 [] usb_reset_and_verify_device+0x15c/0x82c [] usb_reset_device+0xe4/0x298 [] rtl8152_probe+0x84/0x9b0 [r8152] [] usb_probe_interface+0x244/0x2f8 [] driver_probe_device+0x180/0x3b4 [] __device_attach_driver+0xb4/0xe0 [] bus_for_each_drv+0xb4/0xe4 [] __device_attach+0xd0/0x158 [] device_initial_probe+0x24/0x30 [] bus_probe_device+0x50/0xe4 [] device_add+0x414/0x738 [] usb_set_configuration+0x89c/0x914 [] driver_set_config_work+0xc0/0xf0 [] process_one_work+0x390/0x6b8 [] worker_thread+0x480/0x610 [] kthread+0x164/0x178 [] ret_from_fork+0x10/0x40 Since we don't know anything about LPM capabilities without BOS descriptor, don't attempt to enable LPM if it is not available. Fixes: 890dae886721 ("xhci: Enable LPM support only for hardwired ...") Cc: Mathias Nyman Signed-off-by: Guenter Roeck Acked-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index f52d8abf6979..9e62c93af96e 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -4199,7 +4199,7 @@ static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev) struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN; - if (!udev->usb2_hw_lpm_capable) + if (!udev->usb2_hw_lpm_capable || !udev->bos) return; if (hub) From 815321da2e267c5c44a2900b39ac92632a9d6e80 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Mar 2017 13:47:53 +0100 Subject: [PATCH 120/521] uwb: i1480-dfu: fix NULL-deref at probe commit 4ce362711d78a4999011add3115b8f4b0bc25e8c upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Note that the dereference happens in the cmd and wait_init_done callbacks which are called during probe. Fixes: 1ba47da52712 ("uwb: add the i1480 DFU driver") Cc: Inaky Perez-Gonzalez Cc: David Vrabel Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/uwb/i1480/dfu/usb.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/uwb/i1480/dfu/usb.c b/drivers/uwb/i1480/dfu/usb.c index 2bfc846ac071..6345e85822a4 100644 --- a/drivers/uwb/i1480/dfu/usb.c +++ b/drivers/uwb/i1480/dfu/usb.c @@ -362,6 +362,9 @@ int i1480_usb_probe(struct usb_interface *iface, const struct usb_device_id *id) result); } + if (iface->cur_altsetting->desc.bNumEndpoints < 1) + return -ENODEV; + result = -ENOMEM; i1480_usb = kzalloc(sizeof(*i1480_usb), GFP_KERNEL); if (i1480_usb == NULL) { From 2c251e568e1a5dfbdab7156eaa848cd45b3cb127 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Mar 2017 13:47:52 +0100 Subject: [PATCH 121/521] uwb: hwa-rc: fix NULL-deref at probe commit daf229b15907fbfdb6ee183aac8ca428cb57e361 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Note that the dereference happens in the start callback which is called during probe. Fixes: de520b8bd552 ("uwb: add HWA radio controller driver") Cc: Inaky Perez-Gonzalez Cc: David Vrabel Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/uwb/hwa-rc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/uwb/hwa-rc.c b/drivers/uwb/hwa-rc.c index 0257f35cfb9d..e75bbe5a10cd 100644 --- a/drivers/uwb/hwa-rc.c +++ b/drivers/uwb/hwa-rc.c @@ -825,6 +825,9 @@ static int hwarc_probe(struct usb_interface *iface, struct hwarc *hwarc; struct device *dev = &iface->dev; + if (iface->cur_altsetting->desc.bNumEndpoints < 1) + return -ENODEV; + result = -ENOMEM; uwb_rc = uwb_rc_alloc(); if (uwb_rc == NULL) { From dcf879cb9ed37f4e4cb242aaa17316d6c37404dc Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Mar 2017 13:40:22 +0100 Subject: [PATCH 122/521] mmc: ushc: fix NULL-deref at probe commit 181302dc7239add8ab1449c23ecab193f52ee6ab upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Fixes: 53f3a9e26ed5 ("mmc: USB SD Host Controller (USHC) driver") Cc: David Vrabel Signed-off-by: Johan Hovold Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/ushc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/mmc/host/ushc.c b/drivers/mmc/host/ushc.c index d2c386f09d69..1d843357422e 100644 --- a/drivers/mmc/host/ushc.c +++ b/drivers/mmc/host/ushc.c @@ -426,6 +426,9 @@ static int ushc_probe(struct usb_interface *intf, const struct usb_device_id *id struct ushc_data *ushc; int ret; + if (intf->cur_altsetting->desc.bNumEndpoints < 1) + return -ENODEV; + mmc = mmc_alloc_host(sizeof(struct ushc_data), &intf->dev); if (mmc == NULL) return -ENOMEM; From 8f189e1d0ecac38ac69b44b89f2561c3bcffacbd Mon Sep 17 00:00:00 2001 From: Michael Engl Date: Tue, 3 Oct 2017 13:57:00 +0100 Subject: [PATCH 123/521] iio: adc: ti_am335x_adc: fix fifo overrun recovery commit e83bb3e6f3efa21f4a9d883a25d0ecd9dfb431e1 upstream. The tiadc_irq_h(int irq, void *private) function is handling FIFO overruns by clearing flags, disabling and enabling the ADC to recover. If the ADC is running in continuous mode a FIFO overrun happens regularly. If the disabling of the ADC happens concurrently with a new conversion. It might happen that the enabling of the ADC is ignored by the hardware. This stops the ADC permanently. No more interrupts are triggered. According to the AM335x Reference Manual (SPRUH73H October 2011 - Revised April 2013 - Chapter 12.4 and 12.5) it is necessary to check the ADC FSM bits in REG_ADCFSM before enabling the ADC again. Because the disabling of the ADC is done right after the current conversion has been finished. To trigger this bug it is necessary to run the ADC in continuous mode. The ADC values of all channels need to be read in an endless loop. The bug appears within the first 6 hours (~5.4 million handled FIFO overruns). The user space application will hang on reading new values from the character device. Fixes: ca9a563805f7a ("iio: ti_am335x_adc: Add continuous sampling support") Signed-off-by: Michael Engl Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/ti_am335x_adc.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/iio/adc/ti_am335x_adc.c b/drivers/iio/adc/ti_am335x_adc.c index 0470fc843d4e..9b6854607d73 100644 --- a/drivers/iio/adc/ti_am335x_adc.c +++ b/drivers/iio/adc/ti_am335x_adc.c @@ -151,7 +151,9 @@ static irqreturn_t tiadc_irq_h(int irq, void *private) { struct iio_dev *indio_dev = private; struct tiadc_device *adc_dev = iio_priv(indio_dev); - unsigned int status, config; + unsigned int status, config, adc_fsm; + unsigned short count = 0; + status = tiadc_readl(adc_dev, REG_IRQSTATUS); /* @@ -165,6 +167,15 @@ static irqreturn_t tiadc_irq_h(int irq, void *private) tiadc_writel(adc_dev, REG_CTRL, config); tiadc_writel(adc_dev, REG_IRQSTATUS, IRQENB_FIFO1OVRRUN | IRQENB_FIFO1UNDRFLW | IRQENB_FIFO1THRES); + + /* wait for idle state. + * ADC needs to finish the current conversion + * before disabling the module + */ + do { + adc_fsm = tiadc_readl(adc_dev, REG_ADCFSM); + } while (adc_fsm != 0x10 && count++ < 100); + tiadc_writel(adc_dev, REG_CTRL, (config | CNTRLREG_TSCSSENB)); return IRQ_HANDLED; } else if (status & IRQENB_FIFO1THRES) { From 7413d1f8991e7d5c240d89a3feb35e2a54d27baf Mon Sep 17 00:00:00 2001 From: Song Hongyan Date: Wed, 22 Feb 2017 17:17:38 +0800 Subject: [PATCH 124/521] iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3 commit 3bec247474469f769af41e8c80d3a100dd97dd76 upstream. In function _hid_sensor_power_state(), when hid_sensor_read_poll_value() is called, sensor's all properties will be updated by the value from sensor hardware/firmware. In some implementation, sensor hardware/firmware will do a power cycle during S3. In this case, after resume, once hid_sensor_read_poll_value() is called, sensor's all properties which are kept by driver during S3 will be changed to default value. But instead, if a set feature function is called first, sensor hardware/firmware will be recovered to the last status. So change the sensor_hub_set_feature() calling order to behind of set feature function to avoid sensor properties lose. Signed-off-by: Song Hongyan Acked-by: Srinivas Pandruvada Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/common/hid-sensors/hid-sensor-trigger.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/iio/common/hid-sensors/hid-sensor-trigger.c b/drivers/iio/common/hid-sensors/hid-sensor-trigger.c index 595511022795..0a86ef43e781 100644 --- a/drivers/iio/common/hid-sensors/hid-sensor-trigger.c +++ b/drivers/iio/common/hid-sensors/hid-sensor-trigger.c @@ -51,8 +51,6 @@ static int _hid_sensor_power_state(struct hid_sensor_common *st, bool state) st->report_state.report_id, st->report_state.index, HID_USAGE_SENSOR_PROP_REPORTING_STATE_ALL_EVENTS_ENUM); - - poll_value = hid_sensor_read_poll_value(st); } else { int val; @@ -89,7 +87,9 @@ static int _hid_sensor_power_state(struct hid_sensor_common *st, bool state) sensor_hub_get_feature(st->hsdev, st->power_state.report_id, st->power_state.index, sizeof(state_val), &state_val); - if (state && poll_value) + if (state) + poll_value = hid_sensor_read_poll_value(st); + if (poll_value > 0) msleep_interruptible(poll_value * 2); return 0; From c7d1545c48ffbf19185753c1d786e5aab950d3e3 Mon Sep 17 00:00:00 2001 From: Sudip Mukherjee Date: Mon, 6 Mar 2017 23:23:42 +0000 Subject: [PATCH 125/521] parport: fix attempt to write duplicate procfiles commit 03270c6ac6207fc55bbf9d20d195029dca210c79 upstream. Usually every parallel port will have a single pardev registered with it. But ppdev driver is an exception. This userspace parallel port driver allows to create multiple parrallel port devices for a single parallel port. And as a result we were having a nice warning like: "sysctl table check failed: /dev/parport/parport0/devices/ppdev0/timeslice Sysctl already exists" Use the same logic as used in parport_register_device() and register the proc files only once for each parallel port. Fixes: 6fa45a226897 ("parport: add device-model to parport subsystem") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1414656 Bugzilla: https://bugs.archlinux.org/task/52322 Tested-by: James Feeney Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- drivers/parport/share.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/parport/share.c b/drivers/parport/share.c index 5ce5ef211bdb..754f21fd9768 100644 --- a/drivers/parport/share.c +++ b/drivers/parport/share.c @@ -936,8 +936,10 @@ parport_register_dev_model(struct parport *port, const char *name, * pardevice fields. -arca */ port->ops->init_state(par_dev, par_dev->state); - port->proc_device = par_dev; - parport_device_proc_register(par_dev); + if (!test_and_set_bit(PARPORT_DEVPROC_REGISTERED, &port->devflags)) { + port->proc_device = par_dev; + parport_device_proc_register(par_dev); + } return par_dev; From 27d9bf096406439ce406c82291cfe09c6653f94c Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 15 Mar 2017 14:52:02 -0400 Subject: [PATCH 126/521] ext4: mark inode dirty after converting inline directory commit b9cf625d6ecde0d372e23ae022feead72b4228a6 upstream. If ext4_convert_inline_data() was called on a directory with inline data, the filesystem was left in an inconsistent state (as considered by e2fsck) because the file size was not increased to cover the new block. This happened because the inode was not marked dirty after i_disksize was updated. Fix this by marking the inode dirty at the end of ext4_finish_convert_inline_dir(). This bug was probably not noticed before because most users mark the inode dirty afterwards for other reasons. But if userspace executed FS_IOC_SET_ENCRYPTION_POLICY with invalid parameters, as exercised by 'kvm-xfstests -c adv generic/396', then the inode was never marked dirty after updating i_disksize. Fixes: 3c47d54170b6a678875566b1b8d6dcf57904e49b Signed-off-by: Eric Biggers Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inline.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index d4be4e23bc21..dad8e7bdf0a6 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -1158,10 +1158,9 @@ static int ext4_finish_convert_inline_dir(handle_t *handle, set_buffer_uptodate(dir_block); err = ext4_handle_dirty_dirent_node(handle, inode, dir_block); if (err) - goto out; + return err; set_buffer_verified(dir_block); -out: - return err; + return ext4_mark_inode_dirty(handle, inode); } static int ext4_convert_inline_data_nolock(handle_t *handle, From 52e40a2fcc3952f1edd2f810c36d05eece984cba Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Mon, 20 Mar 2017 19:50:29 +0200 Subject: [PATCH 127/521] mmc: sdhci: Do not disable interrupts while waiting for clock commit e2ebfb2142acefecc2496e71360f50d25726040b upstream. Disabling interrupts for even a millisecond can cause problems for some devices. That can happen when sdhci changes clock frequency because it waits for the clock to become stable under a spin lock. The spin lock is not necessary here. Anything that is racing with changes to the I/O state is already broken. The mmc core already provides synchronization via "claiming" the host. Although the spin lock probably should be removed from the code paths that lead to this point, such a patch would touch too much code to be suitable for stable trees. Consequently, for this patch, just drop the spin lock while waiting. Signed-off-by: Adrian Hunter Signed-off-by: Ulf Hansson Tested-by: Ludovic Desroches Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index bda164089904..62d37d2ac557 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -1274,7 +1274,9 @@ void sdhci_set_clock(struct sdhci_host *host, unsigned int clock) return; } timeout--; - mdelay(1); + spin_unlock_irq(&host->lock); + usleep_range(900, 1100); + spin_lock_irq(&host->lock); } clk |= SDHCI_CLOCK_CARD_EN; From c856b66c8aac95c74c0ddad4ce1d55a6741e23db Mon Sep 17 00:00:00 2001 From: Ankur Arora Date: Tue, 21 Mar 2017 15:43:38 -0700 Subject: [PATCH 128/521] xen/acpi: upload PM state from init-domain to Xen commit 1914f0cd203c941bba72f9452c8290324f1ef3dc upstream. This was broken in commit cd979883b9ed ("xen/acpi-processor: fix enabling interrupts on syscore_resume"). do_suspend (from xen/manage.c) and thus xen_resume_notifier never get called on the initial-domain at resume (it is if running as guest.) The rationale for the breaking change was that upload_pm_data() potentially does blocking work in syscore_resume(). This patch addresses the original issue by scheduling upload_pm_data() to execute in workqueue context. Cc: Stanislaw Gruszka Based-on-patch-by: Konrad Wilk Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Stanislaw Gruszka Signed-off-by: Ankur Arora Signed-off-by: Boris Ostrovsky Signed-off-by: Greg Kroah-Hartman --- drivers/xen/xen-acpi-processor.c | 34 ++++++++++++++++++++++++-------- 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/drivers/xen/xen-acpi-processor.c b/drivers/xen/xen-acpi-processor.c index 611f9c11da85..2e319d0c395d 100644 --- a/drivers/xen/xen-acpi-processor.c +++ b/drivers/xen/xen-acpi-processor.c @@ -27,10 +27,10 @@ #include #include #include +#include #include #include #include -#include #include #include @@ -466,15 +466,33 @@ static int xen_upload_processor_pm_data(void) return rc; } -static int xen_acpi_processor_resume(struct notifier_block *nb, - unsigned long action, void *data) +static void xen_acpi_processor_resume_worker(struct work_struct *dummy) { + int rc; + bitmap_zero(acpi_ids_done, nr_acpi_bits); - return xen_upload_processor_pm_data(); + + rc = xen_upload_processor_pm_data(); + if (rc != 0) + pr_info("ACPI data upload failed, error = %d\n", rc); } -struct notifier_block xen_acpi_processor_resume_nb = { - .notifier_call = xen_acpi_processor_resume, +static void xen_acpi_processor_resume(void) +{ + static DECLARE_WORK(wq, xen_acpi_processor_resume_worker); + + /* + * xen_upload_processor_pm_data() calls non-atomic code. + * However, the context for xen_acpi_processor_resume is syscore + * with only the boot CPU online and in an atomic context. + * + * So defer the upload for some point safer. + */ + schedule_work(&wq); +} + +static struct syscore_ops xap_syscore_ops = { + .resume = xen_acpi_processor_resume, }; static int __init xen_acpi_processor_init(void) @@ -527,7 +545,7 @@ static int __init xen_acpi_processor_init(void) if (rc) goto err_unregister; - xen_resume_notifier_register(&xen_acpi_processor_resume_nb); + register_syscore_ops(&xap_syscore_ops); return 0; err_unregister: @@ -544,7 +562,7 @@ static void __exit xen_acpi_processor_exit(void) { int i; - xen_resume_notifier_unregister(&xen_acpi_processor_resume_nb); + unregister_syscore_ops(&xap_syscore_ops); kfree(acpi_ids_done); kfree(acpi_id_present); kfree(acpi_id_cst_present); From 55b6c187cf9d12d8e667ccfa5386bd162fc7ae2b Mon Sep 17 00:00:00 2001 From: Koos Vriezen Date: Wed, 1 Mar 2017 21:02:50 +0100 Subject: [PATCH 129/521] iommu/vt-d: Fix NULL pointer dereference in device_to_iommu commit 5003ae1e735e6bfe4679d9bed6846274f322e77e upstream. The function device_to_iommu() in the Intel VT-d driver lacks a NULL-ptr check, resulting in this oops at boot on some platforms: BUG: unable to handle kernel NULL pointer dereference at 00000000000007ab IP: [] device_to_iommu+0x11a/0x1a0 PGD 0 [...] Call Trace: ? find_or_alloc_domain.constprop.29+0x1a/0x300 ? dw_dma_probe+0x561/0x580 [dw_dmac_core] ? __get_valid_domain_for_dev+0x39/0x120 ? __intel_map_single+0x138/0x180 ? intel_alloc_coherent+0xb6/0x120 ? sst_hsw_dsp_init+0x173/0x420 [snd_soc_sst_haswell_pcm] ? mutex_lock+0x9/0x30 ? kernfs_add_one+0xdb/0x130 ? devres_add+0x19/0x60 ? hsw_pcm_dev_probe+0x46/0xd0 [snd_soc_sst_haswell_pcm] ? platform_drv_probe+0x30/0x90 ? driver_probe_device+0x1ed/0x2b0 ? __driver_attach+0x8f/0xa0 ? driver_probe_device+0x2b0/0x2b0 ? bus_for_each_dev+0x55/0x90 ? bus_add_driver+0x110/0x210 ? 0xffffffffa11ea000 ? driver_register+0x52/0xc0 ? 0xffffffffa11ea000 ? do_one_initcall+0x32/0x130 ? free_vmap_area_noflush+0x37/0x70 ? kmem_cache_alloc+0x88/0xd0 ? do_init_module+0x51/0x1c4 ? load_module+0x1ee9/0x2430 ? show_taint+0x20/0x20 ? kernel_read_file+0xfd/0x190 ? SyS_finit_module+0xa3/0xb0 ? do_syscall_64+0x4a/0xb0 ? entry_SYSCALL64_slow_path+0x25/0x25 Code: 78 ff ff ff 4d 85 c0 74 ee 49 8b 5a 10 0f b6 9b e0 00 00 00 41 38 98 e0 00 00 00 77 da 0f b6 eb 49 39 a8 88 00 00 00 72 ce eb 8f <41> f6 82 ab 07 00 00 04 0f 85 76 ff ff ff 0f b6 4d 08 88 0e 49 RIP [] device_to_iommu+0x11a/0x1a0 RSP CR2: 00000000000007ab ---[ end trace 16f974b6d58d0aad ]--- Add the missing pointer check. Fixes: 1c387188c60f53b338c20eee32db055dfe022a9b ("iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions") Signed-off-by: Koos Vriezen Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/intel-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index f0fc6f7b5d98..0628372f3591 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -908,7 +908,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf * which we used for the IOMMU lookup. Strictly speaking * we could do this for all PCI devices; we only need to * get the BDF# from the scope table for ACPI matches. */ - if (pdev->is_virtfn) + if (pdev && pdev->is_virtfn) goto got_pdev; *bus = drhd->devices[i].bus; From 2705b183263bd6e2969a648d2c7353716ca1d7a8 Mon Sep 17 00:00:00 2001 From: Nicolas Ferre Date: Tue, 14 Mar 2017 09:38:04 +0100 Subject: [PATCH 130/521] ARM: at91: pm: cpu_idle: switch DDR to power-down mode commit 60b89f1928af80b546b5c3fd8714a62f6f4b8844 upstream. On some DDR controllers, compatible with the sama5d3 one, the sequence to enter/exit/re-enter the self-refresh mode adds more constrains than what is currently written in the at91_idle driver. An actual access to the DDR chip is needed between exit and re-enter of this mode which is somehow difficult to implement. This sequence can completely hang the SoC. It is particularly experienced on parts which embed a L2 cache if the code run between IDLE calls fits in it... Moreover, as the intention is to enter and exit pretty rapidly from IDLE, the power-down mode is a good candidate. So now we use power-down instead of self-refresh. As we can simplify the code for sama5d3 compatible DDR controllers, we instantiate a new sama5d3_ddr_standby() function. Signed-off-by: Nicolas Ferre Fixes: 017b5522d5e3 ("ARM: at91: Add new binding for sama5d3-ddramc") Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-at91/pm.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/arm/mach-at91/pm.c b/arch/arm/mach-at91/pm.c index 23726fb31741..d687f860a2da 100644 --- a/arch/arm/mach-at91/pm.c +++ b/arch/arm/mach-at91/pm.c @@ -286,6 +286,22 @@ static void at91_ddr_standby(void) at91_ramc_write(1, AT91_DDRSDRC_LPR, saved_lpr1); } +static void sama5d3_ddr_standby(void) +{ + u32 lpr0; + u32 saved_lpr0; + + saved_lpr0 = at91_ramc_read(0, AT91_DDRSDRC_LPR); + lpr0 = saved_lpr0 & ~AT91_DDRSDRC_LPCB; + lpr0 |= AT91_DDRSDRC_LPCB_POWER_DOWN; + + at91_ramc_write(0, AT91_DDRSDRC_LPR, lpr0); + + cpu_do_idle(); + + at91_ramc_write(0, AT91_DDRSDRC_LPR, saved_lpr0); +} + /* We manage both DDRAM/SDRAM controllers, we need more than one value to * remember. */ @@ -320,7 +336,7 @@ static const struct of_device_id const ramc_ids[] __initconst = { { .compatible = "atmel,at91rm9200-sdramc", .data = at91rm9200_standby }, { .compatible = "atmel,at91sam9260-sdramc", .data = at91sam9_sdram_standby }, { .compatible = "atmel,at91sam9g45-ddramc", .data = at91_ddr_standby }, - { .compatible = "atmel,sama5d3-ddramc", .data = at91_ddr_standby }, + { .compatible = "atmel,sama5d3-ddramc", .data = sama5d3_ddr_standby }, { /*sentinel*/ } }; From e1af444e52ce1b08cd6534e61f8da7aa55b31880 Mon Sep 17 00:00:00 2001 From: Nicolas Ferre Date: Tue, 26 Jan 2016 17:30:18 +0100 Subject: [PATCH 131/521] ARM: dts: at91: sama5d2: add dma properties to UART nodes commit b1708b72a0959a032cd2eebb77fa9086ea3e0c84 upstream. The dmas/dma-names properties are added to the UART nodes. Note that additional properties are needed to enable them at the board level: check bindings for details. Signed-off-by: Nicolas Ferre Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/sama5d2.dtsi | 35 ++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/arch/arm/boot/dts/sama5d2.dtsi b/arch/arm/boot/dts/sama5d2.dtsi index 4dfca8fc49b3..1bc61ece2589 100644 --- a/arch/arm/boot/dts/sama5d2.dtsi +++ b/arch/arm/boot/dts/sama5d2.dtsi @@ -856,6 +856,13 @@ uart0: serial@f801c000 { compatible = "atmel,at91sam9260-usart"; reg = <0xf801c000 0x100>; interrupts = <24 IRQ_TYPE_LEVEL_HIGH 7>; + dmas = <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(35))>, + <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(36))>; + dma-names = "tx", "rx"; clocks = <&uart0_clk>; clock-names = "usart"; status = "disabled"; @@ -865,6 +872,13 @@ uart1: serial@f8020000 { compatible = "atmel,at91sam9260-usart"; reg = <0xf8020000 0x100>; interrupts = <25 IRQ_TYPE_LEVEL_HIGH 7>; + dmas = <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(37))>, + <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(38))>; + dma-names = "tx", "rx"; clocks = <&uart1_clk>; clock-names = "usart"; status = "disabled"; @@ -874,6 +888,13 @@ uart2: serial@f8024000 { compatible = "atmel,at91sam9260-usart"; reg = <0xf8024000 0x100>; interrupts = <26 IRQ_TYPE_LEVEL_HIGH 7>; + dmas = <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(39))>, + <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(40))>; + dma-names = "tx", "rx"; clocks = <&uart2_clk>; clock-names = "usart"; status = "disabled"; @@ -985,6 +1006,13 @@ uart3: serial@fc008000 { compatible = "atmel,at91sam9260-usart"; reg = <0xfc008000 0x100>; interrupts = <27 IRQ_TYPE_LEVEL_HIGH 7>; + dmas = <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(41))>, + <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(42))>; + dma-names = "tx", "rx"; clocks = <&uart3_clk>; clock-names = "usart"; status = "disabled"; @@ -993,6 +1021,13 @@ uart3: serial@fc008000 { uart4: serial@fc00c000 { compatible = "atmel,at91sam9260-usart"; reg = <0xfc00c000 0x100>; + dmas = <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(43))>, + <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(44))>; + dma-names = "tx", "rx"; interrupts = <28 IRQ_TYPE_LEVEL_HIGH 7>; clocks = <&uart4_clk>; clock-names = "usart"; From 17503963206584333b674740ba75b5079ea7e196 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 21 Mar 2017 11:36:06 +0530 Subject: [PATCH 132/521] cpufreq: Restore policy min/max limits on CPU online commit ff010472fb75670cb5c08671e820eeea3af59c87 upstream. On CPU online the cpufreq core restores the previous governor (or the previous "policy" setting for ->setpolicy drivers), but it does not restore the min/max limits at the same time, which is confusing, inconsistent and real pain for users who set the limits and then suspend/resume the system (using full suspend), in which case the limits are reset on all CPUs except for the boot one. Fix this by making cpufreq_online() restore the limits when an inactive policy is brought online. The commit log and patch are inspired from Rafael's earlier work. Reported-by: Rafael J. Wysocki Signed-off-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/cpufreq/cpufreq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 86fa9fdc8323..38b363f4316b 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1186,6 +1186,9 @@ static int cpufreq_online(unsigned int cpu) for_each_cpu(j, policy->related_cpus) per_cpu(cpufreq_cpu_data, j) = policy; write_unlock_irqrestore(&cpufreq_driver_lock, flags); + } else { + policy->min = policy->user_policy.min; + policy->max = policy->user_policy.max; } if (cpufreq_driver->get && !cpufreq_driver->setpolicy) { From 73dd1edf50a6bdf33046c2e4aa0b1ad4fef71a71 Mon Sep 17 00:00:00 2001 From: Tomasz Majchrzak Date: Thu, 28 Jul 2016 10:28:25 +0200 Subject: [PATCH 133/521] raid10: increment write counter after bio is split commit 9b622e2bbcf049c82e2550d35fb54ac205965f50 upstream. md pending write counter must be incremented after bio is split, otherwise it gets decremented too many times in end bio callback and becomes negative. Signed-off-by: Tomasz Majchrzak Reviewed-by: Artur Paszkiewicz Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid10.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 122af340a531..a92979e704e3 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -1072,6 +1072,8 @@ static void __make_request(struct mddev *mddev, struct bio *bio) int max_sectors; int sectors; + md_write_start(mddev, bio); + /* * Register the new request and wait if the reconstruction * thread has put up a bar for new requests. @@ -1455,8 +1457,6 @@ static void make_request(struct mddev *mddev, struct bio *bio) return; } - md_write_start(mddev, bio); - do { /* From 48da8f817b9db7909e5758257bdc84a6c611d99a Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Wed, 1 Mar 2017 17:33:27 +0100 Subject: [PATCH 134/521] libceph: don't set weight to IN when OSD is destroyed commit b581a5854eee4b7851dedb0f8c2ceb54fb902c06 upstream. Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info, osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted. This changes the result of applying an incremental for clients, not just OSDs. Because CRUSH computations are obviously affected, pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on object placement, resulting in misdirected requests. Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f. Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals") Link: http://tracker.ceph.com/issues/19122 Signed-off-by: Ilya Dryomov Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman --- net/ceph/osdmap.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c index ddc3573894b0..bc95e48d5cfb 100644 --- a/net/ceph/osdmap.c +++ b/net/ceph/osdmap.c @@ -1265,7 +1265,6 @@ static int decode_new_up_state_weight(void **p, void *end, if ((map->osd_state[osd] & CEPH_OSD_EXISTS) && (xorstate & CEPH_OSD_EXISTS)) { pr_info("osd%d does not exist\n", osd); - map->osd_weight[osd] = CEPH_OSD_IN; ret = set_primary_affinity(map, osd, CEPH_OSD_DEFAULT_PRIMARY_AFFINITY); if (ret) From c4cf86f69597d4547a736e3edd5b88ae61b68fa2 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Mon, 5 Dec 2016 12:38:38 +1100 Subject: [PATCH 135/521] xfs: don't allow di_size with high bit set commit ef388e2054feedaeb05399ed654bdb06f385d294 upstream. The on-disk field di_size is used to set i_size, which is a signed integer of loff_t. If the high bit of di_size is set, we'll end up with a negative i_size, which will cause all sorts of problems. Since the VFS won't let us create a file with such length, we should catch them here in the verifier too. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Cc: Nikolay Borisov Signed-off-by: Greg Kroah-Hartman --- fs/xfs/libxfs/xfs_inode_buf.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 1aabfda669b0..7183b7ea065b 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -299,6 +299,14 @@ xfs_dinode_verify( if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC)) return false; + /* don't allow invalid i_size */ + if (be64_to_cpu(dip->di_size) & (1ULL << 63)) + return false; + + /* No zero-length symlinks. */ + if (S_ISLNK(be16_to_cpu(dip->di_mode)) && dip->di_size == 0) + return false; + /* only version 3 or greater inodes are extensively verified here */ if (dip->di_version < 3) return true; From 7922c1becb36b61827a24ee32ffe7c39cf444efb Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Tue, 8 Nov 2016 12:55:18 +1100 Subject: [PATCH 136/521] xfs: fix up xfs_swap_extent_forks inline extent handling commit 4dfce57db6354603641132fac3c887614e3ebe81 upstream. There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr" [exception RIP: xfs_trans_log_inode+0x10] #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs] #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs] #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs] #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs] #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs] #14 [ffff8800a57bbe00] evict at ffffffff811e1b67 #15 [ffff8800a57bbe28] iput at ffffffff811e23a5 #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8 #17 [ffff8800a57bbe88] dput at ffffffff811dd06c #18 [ffff8800a57bbea8] __fput at ffffffff811c823b #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27 #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead. This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun. Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents. Thanks to dchinner for looking at this with me and spotting the root cause. [nborisov: backported to 4.4] Cc: stable@vger.kernel.org Signed-off-by: Eric Sandeen Reviewed-by: Brian Foster Signed-off-by: Dave Chinner Signed-off-by: Nikolay Borisov Signed-off-by: Greg Kroah-Hartman -- fs/xfs/xfs_bmap_util.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) --- fs/xfs/xfs_bmap_util.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index dbae6490a79a..832764ee035a 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1713,6 +1713,7 @@ xfs_swap_extents( xfs_trans_t *tp; xfs_bstat_t *sbp = &sxp->sx_stat; xfs_ifork_t *tempifp, *ifp, *tifp; + xfs_extnum_t nextents; int src_log_flags, target_log_flags; int error = 0; int aforkblks = 0; @@ -1899,7 +1900,8 @@ xfs_swap_extents( * pointer. Otherwise it's already NULL or * pointing to the extent. */ - if (ip->i_d.di_nextents <= XFS_INLINE_EXTS) { + nextents = ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t); + if (nextents <= XFS_INLINE_EXTS) { ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext; } @@ -1918,7 +1920,8 @@ xfs_swap_extents( * pointer. Otherwise it's already NULL or * pointing to the extent. */ - if (tip->i_d.di_nextents <= XFS_INLINE_EXTS) { + nextents = tip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t); + if (nextents <= XFS_INLINE_EXTS) { tifp->if_u1.if_extents = tifp->if_u2.if_inline_ext; } From 74c8dd066cc06da0a7ee1a4da0ba565e3536a53a Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 15 Mar 2017 14:26:04 +0100 Subject: [PATCH 137/521] nl80211: fix dumpit error path RTNL deadlocks commit ea90e0dc8cecba6359b481e24d9c37160f6f524f upstream. Sowmini pointed out Dmitry's RTNL deadlock report to me, and it turns out to be perfectly accurate - there are various error paths that miss unlock of the RTNL. To fix those, change the locking a bit to not be conditional in all those nl80211_prepare_*_dump() functions, but make those require the RTNL to start with, and fix the buggy error paths. This also let me use sparse (by appropriately overriding the rtnl_lock/rtnl_unlock functions) to validate the changes. Reported-by: Sowmini Varadhan Reported-by: Dmitry Vyukov Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman --- net/wireless/nl80211.c | 121 ++++++++++++++++++----------------------- 1 file changed, 53 insertions(+), 68 deletions(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 1f0de6d74daa..9d0953e5734f 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -492,21 +492,17 @@ static int nl80211_prepare_wdev_dump(struct sk_buff *skb, { int err; - rtnl_lock(); - if (!cb->args[0]) { err = nlmsg_parse(cb->nlh, GENL_HDRLEN + nl80211_fam.hdrsize, nl80211_fam.attrbuf, nl80211_fam.maxattr, nl80211_policy); if (err) - goto out_unlock; + return err; *wdev = __cfg80211_wdev_from_attrs(sock_net(skb->sk), nl80211_fam.attrbuf); - if (IS_ERR(*wdev)) { - err = PTR_ERR(*wdev); - goto out_unlock; - } + if (IS_ERR(*wdev)) + return PTR_ERR(*wdev); *rdev = wiphy_to_rdev((*wdev)->wiphy); /* 0 is the first index - add 1 to parse only once */ cb->args[0] = (*rdev)->wiphy_idx + 1; @@ -516,10 +512,8 @@ static int nl80211_prepare_wdev_dump(struct sk_buff *skb, struct wiphy *wiphy = wiphy_idx_to_wiphy(cb->args[0] - 1); struct wireless_dev *tmp; - if (!wiphy) { - err = -ENODEV; - goto out_unlock; - } + if (!wiphy) + return -ENODEV; *rdev = wiphy_to_rdev(wiphy); *wdev = NULL; @@ -530,21 +524,11 @@ static int nl80211_prepare_wdev_dump(struct sk_buff *skb, } } - if (!*wdev) { - err = -ENODEV; - goto out_unlock; - } + if (!*wdev) + return -ENODEV; } return 0; - out_unlock: - rtnl_unlock(); - return err; -} - -static void nl80211_finish_wdev_dump(struct cfg80211_registered_device *rdev) -{ - rtnl_unlock(); } /* IE validation */ @@ -3884,9 +3868,10 @@ static int nl80211_dump_station(struct sk_buff *skb, int sta_idx = cb->args[2]; int err; + rtnl_lock(); err = nl80211_prepare_wdev_dump(skb, cb, &rdev, &wdev); if (err) - return err; + goto out_err; if (!wdev->netdev) { err = -EINVAL; @@ -3922,7 +3907,7 @@ static int nl80211_dump_station(struct sk_buff *skb, cb->args[2] = sta_idx; err = skb->len; out_err: - nl80211_finish_wdev_dump(rdev); + rtnl_unlock(); return err; } @@ -4639,9 +4624,10 @@ static int nl80211_dump_mpath(struct sk_buff *skb, int path_idx = cb->args[2]; int err; + rtnl_lock(); err = nl80211_prepare_wdev_dump(skb, cb, &rdev, &wdev); if (err) - return err; + goto out_err; if (!rdev->ops->dump_mpath) { err = -EOPNOTSUPP; @@ -4675,7 +4661,7 @@ static int nl80211_dump_mpath(struct sk_buff *skb, cb->args[2] = path_idx; err = skb->len; out_err: - nl80211_finish_wdev_dump(rdev); + rtnl_unlock(); return err; } @@ -4835,9 +4821,10 @@ static int nl80211_dump_mpp(struct sk_buff *skb, int path_idx = cb->args[2]; int err; + rtnl_lock(); err = nl80211_prepare_wdev_dump(skb, cb, &rdev, &wdev); if (err) - return err; + goto out_err; if (!rdev->ops->dump_mpp) { err = -EOPNOTSUPP; @@ -4870,7 +4857,7 @@ static int nl80211_dump_mpp(struct sk_buff *skb, cb->args[2] = path_idx; err = skb->len; out_err: - nl80211_finish_wdev_dump(rdev); + rtnl_unlock(); return err; } @@ -6806,9 +6793,12 @@ static int nl80211_dump_scan(struct sk_buff *skb, struct netlink_callback *cb) int start = cb->args[2], idx = 0; int err; + rtnl_lock(); err = nl80211_prepare_wdev_dump(skb, cb, &rdev, &wdev); - if (err) + if (err) { + rtnl_unlock(); return err; + } wdev_lock(wdev); spin_lock_bh(&rdev->bss_lock); @@ -6831,7 +6821,7 @@ static int nl80211_dump_scan(struct sk_buff *skb, struct netlink_callback *cb) wdev_unlock(wdev); cb->args[2] = idx; - nl80211_finish_wdev_dump(rdev); + rtnl_unlock(); return skb->len; } @@ -6915,9 +6905,10 @@ static int nl80211_dump_survey(struct sk_buff *skb, struct netlink_callback *cb) int res; bool radio_stats; + rtnl_lock(); res = nl80211_prepare_wdev_dump(skb, cb, &rdev, &wdev); if (res) - return res; + goto out_err; /* prepare_wdev_dump parsed the attributes */ radio_stats = nl80211_fam.attrbuf[NL80211_ATTR_SURVEY_RADIO_STATS]; @@ -6958,7 +6949,7 @@ static int nl80211_dump_survey(struct sk_buff *skb, struct netlink_callback *cb) cb->args[2] = survey_idx; res = skb->len; out_err: - nl80211_finish_wdev_dump(rdev); + rtnl_unlock(); return res; } @@ -10158,17 +10149,13 @@ static int nl80211_prepare_vendor_dump(struct sk_buff *skb, void *data = NULL; unsigned int data_len = 0; - rtnl_lock(); - if (cb->args[0]) { /* subtract the 1 again here */ struct wiphy *wiphy = wiphy_idx_to_wiphy(cb->args[0] - 1); struct wireless_dev *tmp; - if (!wiphy) { - err = -ENODEV; - goto out_unlock; - } + if (!wiphy) + return -ENODEV; *rdev = wiphy_to_rdev(wiphy); *wdev = NULL; @@ -10189,13 +10176,11 @@ static int nl80211_prepare_vendor_dump(struct sk_buff *skb, nl80211_fam.attrbuf, nl80211_fam.maxattr, nl80211_policy); if (err) - goto out_unlock; + return err; if (!nl80211_fam.attrbuf[NL80211_ATTR_VENDOR_ID] || - !nl80211_fam.attrbuf[NL80211_ATTR_VENDOR_SUBCMD]) { - err = -EINVAL; - goto out_unlock; - } + !nl80211_fam.attrbuf[NL80211_ATTR_VENDOR_SUBCMD]) + return -EINVAL; *wdev = __cfg80211_wdev_from_attrs(sock_net(skb->sk), nl80211_fam.attrbuf); @@ -10204,10 +10189,8 @@ static int nl80211_prepare_vendor_dump(struct sk_buff *skb, *rdev = __cfg80211_rdev_from_attrs(sock_net(skb->sk), nl80211_fam.attrbuf); - if (IS_ERR(*rdev)) { - err = PTR_ERR(*rdev); - goto out_unlock; - } + if (IS_ERR(*rdev)) + return PTR_ERR(*rdev); vid = nla_get_u32(nl80211_fam.attrbuf[NL80211_ATTR_VENDOR_ID]); subcmd = nla_get_u32(nl80211_fam.attrbuf[NL80211_ATTR_VENDOR_SUBCMD]); @@ -10220,19 +10203,15 @@ static int nl80211_prepare_vendor_dump(struct sk_buff *skb, if (vcmd->info.vendor_id != vid || vcmd->info.subcmd != subcmd) continue; - if (!vcmd->dumpit) { - err = -EOPNOTSUPP; - goto out_unlock; - } + if (!vcmd->dumpit) + return -EOPNOTSUPP; vcmd_idx = i; break; } - if (vcmd_idx < 0) { - err = -EOPNOTSUPP; - goto out_unlock; - } + if (vcmd_idx < 0) + return -EOPNOTSUPP; if (nl80211_fam.attrbuf[NL80211_ATTR_VENDOR_DATA]) { data = nla_data(nl80211_fam.attrbuf[NL80211_ATTR_VENDOR_DATA]); @@ -10249,9 +10228,6 @@ static int nl80211_prepare_vendor_dump(struct sk_buff *skb, /* keep rtnl locked in successful case */ return 0; - out_unlock: - rtnl_unlock(); - return err; } static int nl80211_vendor_cmd_dump(struct sk_buff *skb, @@ -10266,9 +10242,10 @@ static int nl80211_vendor_cmd_dump(struct sk_buff *skb, int err; struct nlattr *vendor_data; + rtnl_lock(); err = nl80211_prepare_vendor_dump(skb, cb, &rdev, &wdev); if (err) - return err; + goto out; vcmd_idx = cb->args[2]; data = (void *)cb->args[3]; @@ -10277,18 +10254,26 @@ static int nl80211_vendor_cmd_dump(struct sk_buff *skb, if (vcmd->flags & (WIPHY_VENDOR_CMD_NEED_WDEV | WIPHY_VENDOR_CMD_NEED_NETDEV)) { - if (!wdev) - return -EINVAL; + if (!wdev) { + err = -EINVAL; + goto out; + } if (vcmd->flags & WIPHY_VENDOR_CMD_NEED_NETDEV && - !wdev->netdev) - return -EINVAL; + !wdev->netdev) { + err = -EINVAL; + goto out; + } if (vcmd->flags & WIPHY_VENDOR_CMD_NEED_RUNNING) { if (wdev->netdev && - !netif_running(wdev->netdev)) - return -ENETDOWN; - if (!wdev->netdev && !wdev->p2p_started) - return -ENETDOWN; + !netif_running(wdev->netdev)) { + err = -ENETDOWN; + goto out; + } + if (!wdev->netdev && !wdev->p2p_started) { + err = -ENETDOWN; + goto out; + } } } From f154de03f4167664808b002495a877dbe91dd798 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Tue, 14 Mar 2017 17:55:45 +0100 Subject: [PATCH 138/521] USB: usbtmc: add missing endpoint sanity check commit 687e0687f71ec00e0132a21fef802dee88c2f1ad upstream. USBTMC devices are required to have a bulk-in and a bulk-out endpoint, but the driver failed to verify this, something which could lead to the endpoint addresses being taken from uninitialised memory. Make sure to zero all private data as part of allocation, and add the missing endpoint sanity check. Note that this also addresses a more recently introduced issue, where the interrupt-in-presence flag would also be uninitialised whenever the optional interrupt-in endpoint is not present. This in turn could lead to an interrupt urb being allocated, initialised and submitted based on uninitialised values. Fixes: dbf3e7f654c0 ("Implement an ioctl to support the USMTMC-USB488 READ_STATUS_BYTE operation.") Fixes: 5b775f672cc9 ("USB: add USB test and measurement class driver") Signed-off-by: Johan Hovold [ johan: backport to v4.4 ] Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/usbtmc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/usb/class/usbtmc.c b/drivers/usb/class/usbtmc.c index deaddb950c20..24337ac3323f 100644 --- a/drivers/usb/class/usbtmc.c +++ b/drivers/usb/class/usbtmc.c @@ -1105,7 +1105,7 @@ static int usbtmc_probe(struct usb_interface *intf, dev_dbg(&intf->dev, "%s called\n", __func__); - data = kmalloc(sizeof(*data), GFP_KERNEL); + data = kzalloc(sizeof(*data), GFP_KERNEL); if (!data) return -ENOMEM; @@ -1163,6 +1163,12 @@ static int usbtmc_probe(struct usb_interface *intf, } } + if (!data->bulk_out || !data->bulk_in) { + dev_err(&intf->dev, "bulk endpoints not found\n"); + retcode = -ENODEV; + goto err_put; + } + retcode = get_capabilities(data); if (retcode) dev_err(&intf->dev, "can't read capabilities\n"); @@ -1186,6 +1192,7 @@ static int usbtmc_probe(struct usb_interface *intf, error_register: sysfs_remove_group(&intf->dev.kobj, &capability_attr_grp); sysfs_remove_group(&intf->dev.kobj, &data_attr_grp); +err_put: kref_put(&data->kref, usbtmc_delete); return retcode; } From 6d43e485e0067b682466eb4e3aff8ff9a6822966 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 25 Jan 2017 20:24:57 -0800 Subject: [PATCH 139/521] xfs: clear _XBF_PAGES from buffers when readahead page commit 2aa6ba7b5ad3189cc27f14540aa2f57f0ed8df4b upstream. If we try to allocate memory pages to back an xfs_buf that we're trying to read, it's possible that we'll be so short on memory that the page allocation fails. For a blocking read we'll just wait, but for readahead we simply dump all the pages we've collected so far. Unfortunately, after dumping the pages we neglect to clear the _XBF_PAGES state, which means that the subsequent call to xfs_buf_free thinks that b_pages still points to pages we own. It then double-frees the b_pages pages. This results in screaming about negative page refcounts from the memory manager, which xfs oughtn't be triggering. To reproduce this case, mount a filesystem where the size of the inodes far outweighs the availalble memory (a ~500M inode filesystem on a VM with 300MB memory did the trick here) and run bulkstat in parallel with other memory eating processes to put a huge load on the system. The "check summary" phase of xfs_scrub also works for this purpose. Signed-off-by: Darrick J. Wong Reviewed-by: Eric Sandeen Cc: Ivan Kozik Signed-off-by: Greg Kroah-Hartman --- fs/xfs/xfs_buf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index eb1b8c8acfcb..8146b0cf20ce 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -375,6 +375,7 @@ xfs_buf_allocate_memory( out_free_pages: for (i = 0; i < bp->b_page_count; i++) __free_page(bp->b_pages[i]); + bp->b_flags &= ~_XBF_PAGES; return error; } From ec52364445a497a0045c61145f0d795b606c23bb Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:01 +0530 Subject: [PATCH 140/521] xen: do not re-use pirq number cached in pci device msi msg data From: Dan Streetman [ Upstream commit c74fd80f2f41d05f350bb478151021f88551afe8 ] Revert the main part of commit: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests") That commit introduced reading the pci device's msi message data to see if a pirq was previously configured for the device's msi/msix, and re-use that pirq. At the time, that was the correct behavior. However, a later change to Qemu caused it to call into the Xen hypervisor to unmap all pirqs for a pci device, when the pci device disables its MSI/MSIX vectors; specifically the Qemu commit: c976437c7dba9c7444fb41df45468968aaa326ad ("qemu-xen: free all the pirqs for msi/msix when driver unload") Once Qemu added this pirq unmapping, it was no longer correct for the kernel to re-use the pirq number cached in the pci device msi message data. All Qemu releases since 2.1.0 contain the patch that unmaps the pirqs when the pci device disables its MSI/MSIX vectors. This bug is causing failures to initialize multiple NVMe controllers under Xen, because the NVMe driver sets up a single MSIX vector for each controller (concurrently), and then after using that to talk to the controller for some configuration data, it disables the single MSIX vector and re-configures all the MSIX vectors it needs. So the MSIX setup code tries to re-use the cached pirq from the first vector for each controller, but the hypervisor has already given away that pirq to another controller, and its initialization fails. This is discussed in more detail at: https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests") Signed-off-by: Dan Streetman Reviewed-by: Stefano Stabellini Acked-by: Konrad Rzeszutek Wilk Signed-off-by: Boris Ostrovsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- arch/x86/pci/xen.c | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index c6d6efed392a..7575f0798194 100644 --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -231,23 +231,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) return 1; for_each_pci_msi_entry(msidesc, dev) { - __pci_read_msi_msg(msidesc, &msg); - pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) | - ((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff); - if (msg.data != XEN_PIRQ_MSI_DATA || - xen_irq_from_pirq(pirq) < 0) { - pirq = xen_allocate_pirq_msi(dev, msidesc); - if (pirq < 0) { - irq = -ENODEV; - goto error; - } - xen_msi_compose_msg(dev, pirq, &msg); - __pci_write_msi_msg(msidesc, &msg); - dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq); - } else { - dev_dbg(&dev->dev, - "xen: msi already bound to pirq=%d\n", pirq); + pirq = xen_allocate_pirq_msi(dev, msidesc); + if (pirq < 0) { + irq = -ENODEV; + goto error; } + xen_msi_compose_msg(dev, pirq, &msg); + __pci_write_msi_msg(msidesc, &msg); + dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq); irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq, (type == PCI_CAP_ID_MSI) ? nvec : 1, (type == PCI_CAP_ID_MSIX) ? From 4db313df49466185211ea7d6d675f8c4f6724e23 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:02 +0530 Subject: [PATCH 141/521] igb: Workaround for igb i210 firmware issue From: Chris J Arges [ Upstream commit 4e684f59d760a2c7c716bb60190783546e2d08a1 ] Sometimes firmware may not properly initialize I347AT4_PAGE_SELECT causing the probe of an igb i210 NIC to fail. This patch adds an addition zeroing of this register during igb_get_phy_id to workaround this issue. Thanks for Jochen Henneberg for the idea and original patch. Signed-off-by: Chris J Arges Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/igb/e1000_phy.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/e1000_phy.c b/drivers/net/ethernet/intel/igb/e1000_phy.c index 23ec28f43f6d..13ad20b250bc 100644 --- a/drivers/net/ethernet/intel/igb/e1000_phy.c +++ b/drivers/net/ethernet/intel/igb/e1000_phy.c @@ -77,6 +77,10 @@ s32 igb_get_phy_id(struct e1000_hw *hw) s32 ret_val = 0; u16 phy_id; + /* ensure PHY page selection to fix misconfigured i210 */ + if (hw->mac.type == e1000_i210) + phy->ops.write_reg(hw, I347AT4_PAGE_SELECT, 0); + ret_val = phy->ops.read_reg(hw, PHY_ID1, &phy_id); if (ret_val) goto out; From ca7e3bdc9c7e01d8040422d9eae0b9f07c81419e Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:03 +0530 Subject: [PATCH 142/521] igb: add i211 to i210 PHY workaround From: Todd Fujinaka [ Upstream commit 5bc8c230e2a993b49244f9457499f17283da9ec7 ] i210 and i211 share the same PHY but have different PCI IDs. Don't forget i211 for any i210 workarounds. Signed-off-by: Todd Fujinaka Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/igb/e1000_phy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/igb/e1000_phy.c b/drivers/net/ethernet/intel/igb/e1000_phy.c index 13ad20b250bc..afaa98d1d4e4 100644 --- a/drivers/net/ethernet/intel/igb/e1000_phy.c +++ b/drivers/net/ethernet/intel/igb/e1000_phy.c @@ -78,7 +78,7 @@ s32 igb_get_phy_id(struct e1000_hw *hw) u16 phy_id; /* ensure PHY page selection to fix misconfigured i210 */ - if (hw->mac.type == e1000_i210) + if ((hw->mac.type == e1000_i210) || (hw->mac.type == e1000_i211)) phy->ops.write_reg(hw, I347AT4_PAGE_SELECT, 0); ret_val = phy->ops.read_reg(hw, PHY_ID1, &phy_id); From e4ce31c0265dc6086fb4f13d88deef50d20cdb24 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:04 +0530 Subject: [PATCH 143/521] x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic From: Vitaly Kuznetsov [ Upstream commit 59107e2f48831daedc46973ce4988605ab066de3 ] There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt') which injects NMI to the guest. We may want to crash the guest and do kdump on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to allow the kdump kernel to re-establish VMBus connection so it will see VMBus devices (storage, network,..). To properly unload VMBus making it possible to start over during kdump we need to do the following: - Send an 'unload' message to the hypervisor. This can be done on any CPU so we do this the crashing CPU. - Receive the 'unload finished' reply message. WS2012R2 delivers this message to the CPU which was used to establish VMBus connection during module load and this CPU may differ from the CPU sending 'unload'. Receiving a VMBus message means the following: - There is a per-CPU slot in memory for one message. This slot can in theory be accessed by any CPU. - We get an interrupt on the CPU when a message was placed into the slot. - When we read the message we need to clear the slot and signal the fact to the hypervisor. In case there are more messages to this CPU pending the hypervisor will deliver the next message. The signaling is done by writing to an MSR so this can only be done on the appropriate CPU. To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload() function which checks message slots for all CPUs in a loop waiting for the 'unload finished' messages. However, there is an issue which arises when these conditions are met: - We're crashing on a CPU which is different from the one which was used to initially contact the hypervisor. - The CPU which was used for the initial contact is blocked with interrupts disabled and there is a message pending in the message slot. In this case we won't be able to read the 'unload finished' message on the crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs simultaneously: the first CPU entering panic() will proceed to crash and all other CPUs will stop themselves with interrupts disabled. The suggested solution is to handle unknown NMIs for Hyper-V guests on the first CPU which gets them only. This will allow us to rely on VMBus interrupt handler being able to receive the 'unload finish' message in case it is delivered to a different CPU. The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the boot CPU only, WS2012R2 and earlier Hyper-V versions are affected. Signed-off-by: Vitaly Kuznetsov Acked-by: K. Y. Srinivasan Cc: devel@linuxdriverproject.org Cc: Haiyang Zhang Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com Signed-off-by: Thomas Gleixner Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mshyperv.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index cfc4a966e2b9..83b5f7a323a9 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -30,6 +30,7 @@ #include #include #include +#include struct ms_hyperv_info ms_hyperv; EXPORT_SYMBOL_GPL(ms_hyperv); @@ -157,6 +158,26 @@ static unsigned char hv_get_nmi_reason(void) return 0; } +#ifdef CONFIG_X86_LOCAL_APIC +/* + * Prior to WS2016 Debug-VM sends NMIs to all CPUs which makes + * it dificult to process CHANNELMSG_UNLOAD in case of crash. Handle + * unknown NMI on the first CPU which gets it. + */ +static int hv_nmi_unknown(unsigned int val, struct pt_regs *regs) +{ + static atomic_t nmi_cpu = ATOMIC_INIT(-1); + + if (!unknown_nmi_panic) + return NMI_DONE; + + if (atomic_cmpxchg(&nmi_cpu, -1, raw_smp_processor_id()) != -1) + return NMI_HANDLED; + + return NMI_DONE; +} +#endif + static void __init ms_hyperv_init_platform(void) { /* @@ -182,6 +203,9 @@ static void __init ms_hyperv_init_platform(void) printk(KERN_INFO "HyperV: LAPIC Timer Frequency: %#x\n", lapic_timer_frequency); } + + register_nmi_handler(NMI_UNKNOWN, hv_nmi_unknown, NMI_FLAG_FIRST, + "hv_nmi_unknown"); #endif if (ms_hyperv.features & HV_X64_MSR_TIME_REF_COUNT_AVAILABLE) From a87693ec42f24334ece33fac6ea639956f50bd90 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:05 +0530 Subject: [PATCH 144/521] PCI: Separate VF BAR updates from standard BAR updates From: Bjorn Helgaas [ Upstream commit 6ffa2489c51da77564a0881a73765ea2169f955d ] Previously pci_update_resource() used the same code path for updating standard BARs and VF BARs in SR-IOV capabilities. Split the VF BAR update into a new pci_iov_update_resource() internal interface, which makes it simpler to compute the BAR address (we can get rid of pci_resource_bar() and pci_iov_resource_bar()). This patch: - Renames pci_update_resource() to pci_std_update_resource(), - Adds pci_iov_update_resource(), - Makes pci_update_resource() a wrapper that calls the appropriate one, No functional change intended. Signed-off-by: Bjorn Helgaas Reviewed-by: Gavin Shan Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pci/iov.c | 50 +++++++++++++++++++++++++++++++++++++++++ drivers/pci/pci.h | 1 + drivers/pci/setup-res.c | 13 +++++++++-- 3 files changed, 62 insertions(+), 2 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 31f31d460fc9..a6b100168396 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -572,6 +572,56 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno) 4 * (resno - PCI_IOV_RESOURCES); } +/** + * pci_iov_update_resource - update a VF BAR + * @dev: the PCI device + * @resno: the resource number + * + * Update a VF BAR in the SR-IOV capability of a PF. + */ +void pci_iov_update_resource(struct pci_dev *dev, int resno) +{ + struct pci_sriov *iov = dev->is_physfn ? dev->sriov : NULL; + struct resource *res = dev->resource + resno; + int vf_bar = resno - PCI_IOV_RESOURCES; + struct pci_bus_region region; + u32 new; + int reg; + + /* + * The generic pci_restore_bars() path calls this for all devices, + * including VFs and non-SR-IOV devices. If this is not a PF, we + * have nothing to do. + */ + if (!iov) + return; + + /* + * Ignore unimplemented BARs, unused resource slots for 64-bit + * BARs, and non-movable resources, e.g., those described via + * Enhanced Allocation. + */ + if (!res->flags) + return; + + if (res->flags & IORESOURCE_UNSET) + return; + + if (res->flags & IORESOURCE_PCI_FIXED) + return; + + pcibios_resource_to_bus(dev->bus, ®ion, res); + new = region.start; + new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK; + + reg = iov->pos + PCI_SRIOV_BAR + 4 * vf_bar; + pci_write_config_dword(dev, reg, new); + if (res->flags & IORESOURCE_MEM_64) { + new = region.start >> 16 >> 16; + pci_write_config_dword(dev, reg + 4, new); + } +} + resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev, int resno) { diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index d390fc1475ec..eda77d1baec1 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -277,6 +277,7 @@ static inline void pci_restore_ats_state(struct pci_dev *dev) int pci_iov_init(struct pci_dev *dev); void pci_iov_release(struct pci_dev *dev); int pci_iov_resource_bar(struct pci_dev *dev, int resno); +void pci_iov_update_resource(struct pci_dev *dev, int resno); resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno); void pci_restore_iov_state(struct pci_dev *dev); int pci_iov_bus_range(struct pci_bus *bus); diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c index 604011e047d6..ac58c566fb1a 100644 --- a/drivers/pci/setup-res.c +++ b/drivers/pci/setup-res.c @@ -25,8 +25,7 @@ #include #include "pci.h" - -void pci_update_resource(struct pci_dev *dev, int resno) +static void pci_std_update_resource(struct pci_dev *dev, int resno) { struct pci_bus_region region; bool disable; @@ -110,6 +109,16 @@ void pci_update_resource(struct pci_dev *dev, int resno) pci_write_config_word(dev, PCI_COMMAND, cmd); } +void pci_update_resource(struct pci_dev *dev, int resno) +{ + if (resno <= PCI_ROM_RESOURCE) + pci_std_update_resource(dev, resno); +#ifdef CONFIG_PCI_IOV + else if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END) + pci_iov_update_resource(dev, resno); +#endif +} + int pci_claim_resource(struct pci_dev *dev, int resource) { struct resource *res = &dev->resource[resource]; From cef498a2c75adca3b4e3fc348e47498496eec809 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:06 +0530 Subject: [PATCH 145/521] PCI: Remove pci_resource_bar() and pci_iov_resource_bar() From: Bjorn Helgaas [ Upstream commit 286c2378aaccc7343ebf17ec6cd86567659caf70 ] pci_std_update_resource() only deals with standard BARs, so we don't have to worry about the complications of VF BARs in an SR-IOV capability. Compute the BAR address inline and remove pci_resource_bar(). That makes pci_iov_resource_bar() unused, so remove that as well. Signed-off-by: Bjorn Helgaas Reviewed-by: Gavin Shan Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pci/iov.c | 18 ------------------ drivers/pci/pci.c | 30 ------------------------------ drivers/pci/pci.h | 6 ------ drivers/pci/setup-res.c | 13 +++++++------ 4 files changed, 7 insertions(+), 60 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index a6b100168396..2f8ea6f84c97 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -554,24 +554,6 @@ void pci_iov_release(struct pci_dev *dev) sriov_release(dev); } -/** - * pci_iov_resource_bar - get position of the SR-IOV BAR - * @dev: the PCI device - * @resno: the resource number - * - * Returns position of the BAR encapsulated in the SR-IOV capability. - */ -int pci_iov_resource_bar(struct pci_dev *dev, int resno) -{ - if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCE_END) - return 0; - - BUG_ON(!dev->is_physfn); - - return dev->sriov->pos + PCI_SRIOV_BAR + - 4 * (resno - PCI_IOV_RESOURCES); -} - /** * pci_iov_update_resource - update a VF BAR * @dev: the PCI device diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e311a9bf2c90..a01e6d5fedec 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4472,36 +4472,6 @@ int pci_select_bars(struct pci_dev *dev, unsigned long flags) } EXPORT_SYMBOL(pci_select_bars); -/** - * pci_resource_bar - get position of the BAR associated with a resource - * @dev: the PCI device - * @resno: the resource number - * @type: the BAR type to be filled in - * - * Returns BAR position in config space, or 0 if the BAR is invalid. - */ -int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type) -{ - int reg; - - if (resno < PCI_ROM_RESOURCE) { - *type = pci_bar_unknown; - return PCI_BASE_ADDRESS_0 + 4 * resno; - } else if (resno == PCI_ROM_RESOURCE) { - *type = pci_bar_mem32; - return dev->rom_base_reg; - } else if (resno < PCI_BRIDGE_RESOURCES) { - /* device specific resource */ - *type = pci_bar_unknown; - reg = pci_iov_resource_bar(dev, resno); - if (reg) - return reg; - } - - dev_err(&dev->dev, "BAR %d: invalid resource\n", resno); - return 0; -} - /* Some architectures require additional programming to enable VGA */ static arch_set_vga_state_t arch_set_vga_state; diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index eda77d1baec1..c43e448873ca 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -232,7 +232,6 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl, int pci_setup_device(struct pci_dev *dev); int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, struct resource *res, unsigned int reg); -int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type); void pci_configure_ari(struct pci_dev *dev); void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head); @@ -276,7 +275,6 @@ static inline void pci_restore_ats_state(struct pci_dev *dev) #ifdef CONFIG_PCI_IOV int pci_iov_init(struct pci_dev *dev); void pci_iov_release(struct pci_dev *dev); -int pci_iov_resource_bar(struct pci_dev *dev, int resno); void pci_iov_update_resource(struct pci_dev *dev, int resno); resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno); void pci_restore_iov_state(struct pci_dev *dev); @@ -291,10 +289,6 @@ static inline void pci_iov_release(struct pci_dev *dev) { } -static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno) -{ - return 0; -} static inline void pci_restore_iov_state(struct pci_dev *dev) { } diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c index ac58c566fb1a..674e76c9334e 100644 --- a/drivers/pci/setup-res.c +++ b/drivers/pci/setup-res.c @@ -32,7 +32,6 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno) u16 cmd; u32 new, check, mask; int reg; - enum pci_bar_type type; struct resource *res = dev->resource + resno; if (dev->is_virtfn) { @@ -66,14 +65,16 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno) else mask = (u32)PCI_BASE_ADDRESS_MEM_MASK; - reg = pci_resource_bar(dev, resno, &type); - if (!reg) - return; - if (type != pci_bar_unknown) { + if (resno < PCI_ROM_RESOURCE) { + reg = PCI_BASE_ADDRESS_0 + 4 * resno; + } else if (resno == PCI_ROM_RESOURCE) { if (!(res->flags & IORESOURCE_ROM_ENABLE)) return; + + reg = dev->rom_base_reg; new |= PCI_ROM_ADDRESS_ENABLE; - } + } else + return; /* * We can't update a 64-bit BAR atomically, so when possible, From 1278c9f87f1127b806a7733d3091df3ed2ab31c6 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:07 +0530 Subject: [PATCH 146/521] PCI: Add comments about ROM BAR updating From: Bjorn Helgaas [ Upstream commit 0b457dde3cf8b7c76a60f8e960f21bbd4abdc416 ] pci_update_resource() updates a hardware BAR so its address matches the kernel's struct resource UNLESS it's a disabled ROM BAR. We only update those when we enable the ROM. It's not obvious from the code why ROM BARs should be handled specially. Apparently there are Matrox devices with defective ROM BARs that read as zero when disabled. That means that if pci_enable_rom() reads the disabled BAR, sets PCI_ROM_ADDRESS_ENABLE (without re-inserting the address), and writes it back, it would enable the ROM at address zero. Add comments and references to explain why we can't make the code look more rational. The code changes are from 755528c860b0 ("Ignore disabled ROM resources at setup") and 8085ce084c0f ("[PATCH] Fix PCI ROM mapping"). Link: https://lkml.org/lkml/2005/8/30/138 Signed-off-by: Bjorn Helgaas Reviewed-by: Gavin Shan Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman [sumits: minor fixup in rom.c for 4.4.y] Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pci/rom.c | 5 +++++ drivers/pci/setup-res.c | 6 ++++++ 2 files changed, 11 insertions(+) diff --git a/drivers/pci/rom.c b/drivers/pci/rom.c index eb0ad530dc43..3eea7fc5e1a2 100644 --- a/drivers/pci/rom.c +++ b/drivers/pci/rom.c @@ -31,6 +31,11 @@ int pci_enable_rom(struct pci_dev *pdev) if (!res->flags) return -1; + /* + * Ideally pci_update_resource() would update the ROM BAR address, + * and we would only set the enable bit here. But apparently some + * devices have buggy ROM BARs that read as zero when disabled. + */ pcibios_resource_to_bus(pdev->bus, ®ion, res); pci_read_config_dword(pdev, pdev->rom_base_reg, &rom_addr); rom_addr &= ~PCI_ROM_ADDRESS_MASK; diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c index 674e76c9334e..d1ba5e0067c5 100644 --- a/drivers/pci/setup-res.c +++ b/drivers/pci/setup-res.c @@ -68,6 +68,12 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno) if (resno < PCI_ROM_RESOURCE) { reg = PCI_BASE_ADDRESS_0 + 4 * resno; } else if (resno == PCI_ROM_RESOURCE) { + + /* + * Apparently some Matrox devices have ROM BARs that read + * as zero when disabled, so don't update ROM BARs unless + * they're enabled. See https://lkml.org/lkml/2005/8/30/138. + */ if (!(res->flags & IORESOURCE_ROM_ENABLE)) return; From 40a85d68185f9d9e7d370919f8a3532b0d259266 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:08 +0530 Subject: [PATCH 147/521] PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE From: Bjorn Helgaas [ Upstream commit 7a6d312b50e63f598f5b5914c4fd21878ac2b595 ] Remove the assumption that IORESOURCE_ROM_ENABLE == PCI_ROM_ADDRESS_ENABLE. PCI_ROM_ADDRESS_ENABLE is the ROM enable bit defined by the PCI spec, so if we're reading or writing a BAR register value, that's what we should use. IORESOURCE_ROM_ENABLE is a corresponding bit in struct resource flags. Signed-off-by: Bjorn Helgaas Reviewed-by: Gavin Shan Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pci/probe.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 71d9a6d1bd56..b83df942794f 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -226,7 +226,8 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, mask64 = (u32)PCI_BASE_ADDRESS_MEM_MASK; } } else { - res->flags |= (l & IORESOURCE_ROM_ENABLE); + if (l & PCI_ROM_ADDRESS_ENABLE) + res->flags |= IORESOURCE_ROM_ENABLE; l64 = l & PCI_ROM_ADDRESS_MASK; sz64 = sz & PCI_ROM_ADDRESS_MASK; mask64 = (u32)PCI_ROM_ADDRESS_MASK; From 131f7969048b8ede0be57f64930e9ef8fee0c53b Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:09 +0530 Subject: [PATCH 148/521] PCI: Don't update VF BARs while VF memory space is enabled From: Bjorn Helgaas [ Upstream commit 546ba9f8f22f71b0202b6ba8967be5cc6dae4e21 ] If we update a VF BAR while it's enabled, there are two potential problems: 1) Any driver that's using the VF has a cached BAR value that is stale after the update, and 2) We can't update 64-bit BARs atomically, so the intermediate state (new lower dword with old upper dword) may conflict with another device, and an access by a driver unrelated to the VF may cause a bus error. Warn about attempts to update VF BARs while they are enabled. This is a programming error, so use dev_WARN() to get a backtrace. Signed-off-by: Bjorn Helgaas Reviewed-by: Gavin Shan Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pci/iov.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 2f8ea6f84c97..47c46d07f110 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -567,6 +567,7 @@ void pci_iov_update_resource(struct pci_dev *dev, int resno) struct resource *res = dev->resource + resno; int vf_bar = resno - PCI_IOV_RESOURCES; struct pci_bus_region region; + u16 cmd; u32 new; int reg; @@ -578,6 +579,13 @@ void pci_iov_update_resource(struct pci_dev *dev, int resno) if (!iov) return; + pci_read_config_word(dev, iov->pos + PCI_SRIOV_CTRL, &cmd); + if ((cmd & PCI_SRIOV_CTRL_VFE) && (cmd & PCI_SRIOV_CTRL_MSE)) { + dev_WARN(&dev->dev, "can't update enabled VF BAR%d %pR\n", + vf_bar, res); + return; + } + /* * Ignore unimplemented BARs, unused resource slots for 64-bit * BARs, and non-movable resources, e.g., those described via From d4f09ea7e35c02a765f58e900fbc159ff00c70e2 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:10 +0530 Subject: [PATCH 149/521] PCI: Update BARs using property bits appropriate for type From: Bjorn Helgaas [ Upstream commit 45d004f4afefdd8d79916ee6d97a9ecd94bb1ffe ] The BAR property bits (0-3 for memory BARs, 0-1 for I/O BARs) are supposed to be read-only, but we do save them in res->flags and include them when updating the BAR. Mask the I/O property bits with ~PCI_BASE_ADDRESS_IO_MASK (0x3) instead of PCI_REGION_FLAG_MASK (0xf) to make it obvious that we can't corrupt bits 2-3 of I/O addresses. Use PCI_ROM_ADDRESS_MASK for ROM BARs. This means we'll only check the top 21 bits (instead of the 28 bits we used to check) of a ROM BAR to see if the update was successful. Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pci/setup-res.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c index d1ba5e0067c5..032a6b1ea512 100644 --- a/drivers/pci/setup-res.c +++ b/drivers/pci/setup-res.c @@ -58,12 +58,17 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno) return; pcibios_resource_to_bus(dev->bus, ®ion, res); + new = region.start; - new = region.start | (res->flags & PCI_REGION_FLAG_MASK); - if (res->flags & IORESOURCE_IO) + if (res->flags & IORESOURCE_IO) { mask = (u32)PCI_BASE_ADDRESS_IO_MASK; - else + new |= res->flags & ~PCI_BASE_ADDRESS_IO_MASK; + } else if (resno == PCI_ROM_RESOURCE) { + mask = (u32)PCI_ROM_ADDRESS_MASK; + } else { mask = (u32)PCI_BASE_ADDRESS_MEM_MASK; + new |= res->flags & ~PCI_BASE_ADDRESS_MEM_MASK; + } if (resno < PCI_ROM_RESOURCE) { reg = PCI_BASE_ADDRESS_0 + 4 * resno; From bcbdcf48469b062b6ee00b560b44de28f387d2e0 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:11 +0530 Subject: [PATCH 150/521] PCI: Ignore BAR updates on virtual functions From: Bjorn Helgaas [ Upstream commit 63880b230a4af502c56dde3d4588634c70c66006 ] VF BARs are read-only zero, so updating VF BARs will not have any effect. See the SR-IOV spec r1.1, sec 3.4.1.11. We already ignore these updates because of 70675e0b6a1a ("PCI: Don't try to restore VF BARs"); this merely restructures it slightly to make it easier to split updates for standard and SR-IOV BARs. Signed-off-by: Bjorn Helgaas Reviewed-by: Gavin Shan Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pci/pci.c | 4 ---- drivers/pci/setup-res.c | 5 ++--- 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index a01e6d5fedec..0e53488f8ec1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -519,10 +519,6 @@ static void pci_restore_bars(struct pci_dev *dev) { int i; - /* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */ - if (dev->is_virtfn) - return; - for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) pci_update_resource(dev, i); } diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c index 032a6b1ea512..25062966cbfa 100644 --- a/drivers/pci/setup-res.c +++ b/drivers/pci/setup-res.c @@ -34,10 +34,9 @@ static void pci_std_update_resource(struct pci_dev *dev, int resno) int reg; struct resource *res = dev->resource + resno; - if (dev->is_virtfn) { - dev_warn(&dev->dev, "can't update VF BAR%d\n", resno); + /* Per SR-IOV spec 3.4.1.11, VF BARs are RO zero */ + if (dev->is_virtfn) return; - } /* * Ignore resources for unimplemented BARs and unused resource slots From 4110080574acd69677c869ba49207150c09c9c0f Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:12 +0530 Subject: [PATCH 151/521] PCI: Do any VF BAR updates before enabling the BARs From: Gavin Shan [ Upstream commit f40ec3c748c6912f6266c56a7f7992de61b255ed ] Previously we enabled VFs and enable their memory space before calling pcibios_sriov_enable(). But pcibios_sriov_enable() may update the VF BARs: for example, on PPC PowerNV we may change them to manage the association of VFs to PEs. Because 64-bit BARs cannot be updated atomically, it's unsafe to update them while they're enabled. The half-updated state may conflict with other devices in the system. Call pcibios_sriov_enable() before enabling the VFs so any BAR updates happen while the VF BARs are disabled. [bhelgaas: changelog] Tested-by: Carol Soto Signed-off-by: Gavin Shan Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/pci/iov.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 47c46d07f110..357527712539 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -303,13 +303,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) return rc; } - pci_iov_set_numvfs(dev, nr_virtfn); - iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE; - pci_cfg_access_lock(dev); - pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); - msleep(100); - pci_cfg_access_unlock(dev); - iov->initial_VFs = initial; if (nr_virtfn < initial) initial = nr_virtfn; @@ -320,6 +313,13 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) goto err_pcibios; } + pci_iov_set_numvfs(dev, nr_virtfn); + iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE; + pci_cfg_access_lock(dev); + pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); + msleep(100); + pci_cfg_access_unlock(dev); + for (i = 0; i < initial; i++) { rc = virtfn_add(dev, i, 0); if (rc) From 9fd9e1436380419a9a74f7ad90d85e09b1ed8058 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:13 +0530 Subject: [PATCH 152/521] vfio/spapr: Postpone allocation of userspace version of TCE table From: Alexey Kardashevskiy [ Upstream commit 39701e56f5f16ea0cf8fc9e8472e645f8de91d23 ] The iommu_table struct manages a hardware TCE table and a vmalloc'd table with corresponding userspace addresses. Both are allocated when the default DMA window is created and this happens when the very first group is attached to a container. As we are going to allow the userspace to configure container in one memory context and pas container fd to another, we have to postpones such allocations till a container fd is passed to the destination user process so we would account locked memory limit against the actual container user constrainsts. This postpones the it_userspace array allocation till it is used first time for mapping. The unmapping patch already checks if the array is allocated. Signed-off-by: Alexey Kardashevskiy Reviewed-by: David Gibson Acked-by: Alex Williamson Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/vfio/vfio_iommu_spapr_tce.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index 0582b72ef377..1a9f18b40be6 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -511,6 +511,12 @@ static long tce_iommu_build_v2(struct tce_container *container, unsigned long hpa; enum dma_data_direction dirtmp; + if (!tbl->it_userspace) { + ret = tce_iommu_userspace_view_alloc(tbl); + if (ret) + return ret; + } + for (i = 0; i < pages; ++i) { struct mm_iommu_table_group_mem_t *mem = NULL; unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, @@ -584,15 +590,6 @@ static long tce_iommu_create_table(struct tce_container *container, WARN_ON(!ret && !(*ptbl)->it_ops->free); WARN_ON(!ret && ((*ptbl)->it_allocated_size != table_size)); - if (!ret && container->v2) { - ret = tce_iommu_userspace_view_alloc(*ptbl); - if (ret) - (*ptbl)->it_ops->free(*ptbl); - } - - if (ret) - decrement_locked_vm(table_size >> PAGE_SHIFT); - return ret; } @@ -1064,10 +1061,7 @@ static int tce_iommu_take_ownership(struct tce_container *container, if (!tbl || !tbl->it_map) continue; - rc = tce_iommu_userspace_view_alloc(tbl); - if (!rc) - rc = iommu_take_ownership(tbl); - + rc = iommu_take_ownership(tbl); if (rc) { for (j = 0; j < i; ++j) iommu_release_ownership( From 7023f502c8355717ad4e400144b0833dee105602 Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:14 +0530 Subject: [PATCH 153/521] block: allow WRITE_SAME commands with the SG_IO ioctl From: Mauricio Faria de Oliveira [ Upstream commit 25cdb64510644f3e854d502d69c73f21c6df88a9 ] The WRITE_SAME commands are not present in the blk_default_cmd_filter write_ok list, and thus are failed with -EPERM when the SG_IO ioctl() is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users). [ sg_io() -> blk_fill_sghdr_rq() > blk_verify_command() -> -EPERM ] The problem can be reproduced with the sg_write_same command # sg_write_same --num 1 --xferlen 512 /dev/sda # # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_same --num 1 --xferlen 512 /dev/sda' Write same: pass through os error: Operation not permitted # For comparison, the WRITE_VERIFY command does not observe this problem, since it is in that list: # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_verify --num 1 --ilen 512 --lba 0 /dev/sda' # So, this patch adds the WRITE_SAME commands to the list, in order for the SG_IO ioctl to finish successfully: # capsh --drop=cap_sys_rawio -- -c \ 'sg_write_same --num 1 --xferlen 512 /dev/sda' # That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices (qemu "-device scsi-block" [1], libvirt "" [2]), which employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu). In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls, which are translated to write-same calls in the guest kernel, and then into SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest: [...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [...] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current] [...] sd 0:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated [...] sd 0:0:0:0: [sda] tag#0 CDB: Write Same(10) 41 00 01 04 e0 78 00 00 08 00 [...] blk_update_request: I/O error, dev sda, sector 17096824 Links: [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52 [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device') Signed-off-by: Mauricio Faria de Oliveira Signed-off-by: Brahadambal Srinivasan Reported-by: Manjunatha H R Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- block/scsi_ioctl.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 0774799942e0..c6fee7437be4 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -182,6 +182,9 @@ static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter) __set_bit(WRITE_16, filter->write_ok); __set_bit(WRITE_LONG, filter->write_ok); __set_bit(WRITE_LONG_2, filter->write_ok); + __set_bit(WRITE_SAME, filter->write_ok); + __set_bit(WRITE_SAME_16, filter->write_ok); + __set_bit(WRITE_SAME_32, filter->write_ok); __set_bit(ERASE, filter->write_ok); __set_bit(GPCMD_MODE_SELECT_10, filter->write_ok); __set_bit(MODE_SELECT, filter->write_ok); From ce5494107946450f79ffce4538c243c37b08d85f Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:15 +0530 Subject: [PATCH 154/521] s390/zcrypt: Introduce CEX6 toleration From: Harald Freudenberger [ Upstream commit b3e8652bcbfa04807e44708d4d0c8cdad39c9215 ] Signed-off-by: Harald Freudenberger Signed-off-by: Martin Schwidefsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/s390/crypto/ap_bus.c | 3 +++ drivers/s390/crypto/ap_bus.h | 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c index 24ec282e15d8..7c3b8d3516e3 100644 --- a/drivers/s390/crypto/ap_bus.c +++ b/drivers/s390/crypto/ap_bus.c @@ -1651,6 +1651,9 @@ static void ap_scan_bus(struct work_struct *unused) ap_dev->queue_depth = queue_depth; ap_dev->raw_hwtype = device_type; ap_dev->device_type = device_type; + /* CEX6 toleration: map to CEX5 */ + if (device_type == AP_DEVICE_TYPE_CEX6) + ap_dev->device_type = AP_DEVICE_TYPE_CEX5; ap_dev->functions = device_functions; spin_lock_init(&ap_dev->lock); INIT_LIST_HEAD(&ap_dev->pendingq); diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h index 6adcbdf225d1..cc741e948170 100644 --- a/drivers/s390/crypto/ap_bus.h +++ b/drivers/s390/crypto/ap_bus.h @@ -105,6 +105,7 @@ static inline int ap_test_bit(unsigned int *ptr, unsigned int nr) #define AP_DEVICE_TYPE_CEX3C 9 #define AP_DEVICE_TYPE_CEX4 10 #define AP_DEVICE_TYPE_CEX5 11 +#define AP_DEVICE_TYPE_CEX6 12 /* * Known function facilities From 4e2c66bb6658f6f4583c8920adeecb7bcc90bd9f Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:16 +0530 Subject: [PATCH 155/521] uvcvideo: uvc_scan_fallback() for webcams with broken chain From: Henrik Ingo [ Upstream commit e950267ab802c8558f1100eafd4087fd039ad634 ] Some devices have invalid baSourceID references, causing uvc_scan_chain() to fail, but if we just take the entities we can find and put them together in the most sensible chain we can think of, turns out they do work anyway. Note: This heuristic assumes there is a single chain. At the time of writing, devices known to have such a broken chain are - Acer Integrated Camera (5986:055a) - Realtek rtl157a7 (0bda:57a7) Signed-off-by: Henrik Ingo Signed-off-by: Laurent Pinchart Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/media/usb/uvc/uvc_driver.c | 118 +++++++++++++++++++++++++++-- 1 file changed, 112 insertions(+), 6 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index 5cefca95734e..885f689ac870 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1595,6 +1595,114 @@ static const char *uvc_print_chain(struct uvc_video_chain *chain) return buffer; } +static struct uvc_video_chain *uvc_alloc_chain(struct uvc_device *dev) +{ + struct uvc_video_chain *chain; + + chain = kzalloc(sizeof(*chain), GFP_KERNEL); + if (chain == NULL) + return NULL; + + INIT_LIST_HEAD(&chain->entities); + mutex_init(&chain->ctrl_mutex); + chain->dev = dev; + v4l2_prio_init(&chain->prio); + + return chain; +} + +/* + * Fallback heuristic for devices that don't connect units and terminals in a + * valid chain. + * + * Some devices have invalid baSourceID references, causing uvc_scan_chain() + * to fail, but if we just take the entities we can find and put them together + * in the most sensible chain we can think of, turns out they do work anyway. + * Note: This heuristic assumes there is a single chain. + * + * At the time of writing, devices known to have such a broken chain are + * - Acer Integrated Camera (5986:055a) + * - Realtek rtl157a7 (0bda:57a7) + */ +static int uvc_scan_fallback(struct uvc_device *dev) +{ + struct uvc_video_chain *chain; + struct uvc_entity *iterm = NULL; + struct uvc_entity *oterm = NULL; + struct uvc_entity *entity; + struct uvc_entity *prev; + + /* + * Start by locating the input and output terminals. We only support + * devices with exactly one of each for now. + */ + list_for_each_entry(entity, &dev->entities, list) { + if (UVC_ENTITY_IS_ITERM(entity)) { + if (iterm) + return -EINVAL; + iterm = entity; + } + + if (UVC_ENTITY_IS_OTERM(entity)) { + if (oterm) + return -EINVAL; + oterm = entity; + } + } + + if (iterm == NULL || oterm == NULL) + return -EINVAL; + + /* Allocate the chain and fill it. */ + chain = uvc_alloc_chain(dev); + if (chain == NULL) + return -ENOMEM; + + if (uvc_scan_chain_entity(chain, oterm) < 0) + goto error; + + prev = oterm; + + /* + * Add all Processing and Extension Units with two pads. The order + * doesn't matter much, use reverse list traversal to connect units in + * UVC descriptor order as we build the chain from output to input. This + * leads to units appearing in the order meant by the manufacturer for + * the cameras known to require this heuristic. + */ + list_for_each_entry_reverse(entity, &dev->entities, list) { + if (entity->type != UVC_VC_PROCESSING_UNIT && + entity->type != UVC_VC_EXTENSION_UNIT) + continue; + + if (entity->num_pads != 2) + continue; + + if (uvc_scan_chain_entity(chain, entity) < 0) + goto error; + + prev->baSourceID[0] = entity->id; + prev = entity; + } + + if (uvc_scan_chain_entity(chain, iterm) < 0) + goto error; + + prev->baSourceID[0] = iterm->id; + + list_add_tail(&chain->list, &dev->chains); + + uvc_trace(UVC_TRACE_PROBE, + "Found a video chain by fallback heuristic (%s).\n", + uvc_print_chain(chain)); + + return 0; + +error: + kfree(chain); + return -EINVAL; +} + /* * Scan the device for video chains and register video devices. * @@ -1617,15 +1725,10 @@ static int uvc_scan_device(struct uvc_device *dev) if (term->chain.next || term->chain.prev) continue; - chain = kzalloc(sizeof(*chain), GFP_KERNEL); + chain = uvc_alloc_chain(dev); if (chain == NULL) return -ENOMEM; - INIT_LIST_HEAD(&chain->entities); - mutex_init(&chain->ctrl_mutex); - chain->dev = dev; - v4l2_prio_init(&chain->prio); - term->flags |= UVC_ENTITY_FLAG_DEFAULT; if (uvc_scan_chain(chain, term) < 0) { @@ -1639,6 +1742,9 @@ static int uvc_scan_device(struct uvc_device *dev) list_add_tail(&chain->list, &dev->chains); } + if (list_empty(&dev->chains)) + uvc_scan_fallback(dev); + if (list_empty(&dev->chains)) { uvc_printk(KERN_INFO, "No valid video chain found.\n"); return -1; From d3607fc2976e34f6b067508b608fefaa66fbecee Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:17 +0530 Subject: [PATCH 156/521] ACPI / blacklist: add _REV quirks for Dell Precision 5520 and 3520 From: Alex Hung [ Upstream commit 9523b9bf6dceef6b0215e90b2348cd646597f796 ] Precision 5520 and 3520 either hang at login and during suspend or reboot. It turns out that that adding them to acpi_rev_dmi_table[] helps to work around those issues. Signed-off-by: Alex Hung [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/blacklist.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c index 96809cd99ace..b2e9395e095c 100644 --- a/drivers/acpi/blacklist.c +++ b/drivers/acpi/blacklist.c @@ -346,6 +346,22 @@ static struct dmi_system_id acpi_osi_dmi_table[] __initdata = { DMI_MATCH(DMI_PRODUCT_NAME, "XPS 13 9343"), }, }, + { + .callback = dmi_enable_rev_override, + .ident = "DELL Precision 5520", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Precision 5520"), + }, + }, + { + .callback = dmi_enable_rev_override, + .ident = "DELL Precision 3520", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Precision 3520"), + }, + }, #endif {} }; From b8687d83b34cf372b943c5639d8960703aeb2b8e Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:18 +0530 Subject: [PATCH 157/521] ACPI / blacklist: Make Dell Latitude 3350 ethernet work From: Michael Pobega [ Upstream commit 708f5dcc21ae9b35f395865fc154b0105baf4de4 ] The Dell Latitude 3350's ethernet card attempts to use a reserved IRQ (18), resulting in ACPI being unable to enable the ethernet. Adding it to acpi_rev_dmi_table[] helps to work around this problem. Signed-off-by: Michael Pobega [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/blacklist.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c index b2e9395e095c..2f24b578bcaf 100644 --- a/drivers/acpi/blacklist.c +++ b/drivers/acpi/blacklist.c @@ -362,6 +362,18 @@ static struct dmi_system_id acpi_osi_dmi_table[] __initdata = { DMI_MATCH(DMI_PRODUCT_NAME, "Precision 3520"), }, }, + /* + * Resolves a quirk with the Dell Latitude 3350 that + * causes the ethernet adapter to not function. + */ + { + .callback = dmi_enable_rev_override, + .ident = "DELL Latitude 3350", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Latitude 3350"), + }, + }, #endif {} }; From ac601978a2aad7fbb617f0187268011b577a127f Mon Sep 17 00:00:00 2001 From: Sumit Semwal Date: Sat, 25 Mar 2017 21:48:19 +0530 Subject: [PATCH 158/521] serial: 8250_pci: Detach low-level driver during PCI error recovery From: Gabriel Krisman Bertazi [ Upstream commit f209fa03fc9d131b3108c2e4936181eabab87416 ] During a PCI error recovery, like the ones provoked by EEH in the ppc64 platform, all IO to the device must be blocked while the recovery is completed. Current 8250_pci implementation only suspends the port instead of detaching it, which doesn't prevent incoming accesses like TIOCMGET and TIOCMSET calls from reaching the device. Those end up racing with the EEH recovery, crashing it. Similar races were also observed when opening the device and when shutting it down during recovery. This patch implements a more robust IO blockage for the 8250_pci recovery by unregistering the port at the beginning of the procedure and re-adding it afterwards. Since the port is detached from the uart layer, we can be sure that no request will make through to the device during recovery. This is similar to the solution used by the JSM serial driver. I thank Peter Hurley for valuable input on this one over one year ago. Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_pci.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c index 5b24ffd93649..83ff1724ec79 100644 --- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -57,6 +57,7 @@ struct serial_private { unsigned int nr; void __iomem *remapped_bar[PCI_NUM_BAR_RESOURCES]; struct pci_serial_quirk *quirk; + const struct pciserial_board *board; int line[0]; }; @@ -4058,6 +4059,7 @@ pciserial_init_ports(struct pci_dev *dev, const struct pciserial_board *board) } } priv->nr = i; + priv->board = board; return priv; err_deinit: @@ -4068,7 +4070,7 @@ pciserial_init_ports(struct pci_dev *dev, const struct pciserial_board *board) } EXPORT_SYMBOL_GPL(pciserial_init_ports); -void pciserial_remove_ports(struct serial_private *priv) +void pciserial_detach_ports(struct serial_private *priv) { struct pci_serial_quirk *quirk; int i; @@ -4088,7 +4090,11 @@ void pciserial_remove_ports(struct serial_private *priv) quirk = find_quirk(priv->dev); if (quirk->exit) quirk->exit(priv->dev); +} +void pciserial_remove_ports(struct serial_private *priv) +{ + pciserial_detach_ports(priv); kfree(priv); } EXPORT_SYMBOL_GPL(pciserial_remove_ports); @@ -5819,7 +5825,7 @@ static pci_ers_result_t serial8250_io_error_detected(struct pci_dev *dev, return PCI_ERS_RESULT_DISCONNECT; if (priv) - pciserial_suspend_ports(priv); + pciserial_detach_ports(priv); pci_disable_device(dev); @@ -5844,9 +5850,18 @@ static pci_ers_result_t serial8250_io_slot_reset(struct pci_dev *dev) static void serial8250_io_resume(struct pci_dev *dev) { struct serial_private *priv = pci_get_drvdata(dev); + const struct pciserial_board *board; - if (priv) - pciserial_resume_ports(priv); + if (!priv) + return; + + board = priv->board; + kfree(priv); + priv = pciserial_init_ports(dev, board); + + if (!IS_ERR(priv)) { + pci_set_drvdata(dev, priv); + } } static const struct pci_error_handlers serial8250_err_handler = { From 540d6d756ff82a23eb5bb73aa8149bab15eb407a Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 11 Jan 2017 17:09:50 +0100 Subject: [PATCH 159/521] fbcon: Fix vc attr at deinit commit 8aac7f34369726d1a158788ae8aff3002d5eb528 upstream. fbcon can deal with vc_hi_font_mask (the upper 256 chars) and adjust the vc attrs dynamically when vc_hi_font_mask is changed at fbcon_init(). When the vc_hi_font_mask is set, it remaps the attrs in the existing console buffer with one bit shift up (for 9 bits), while it remaps with one bit shift down (for 8 bits) when the value is cleared. It works fine as long as the font gets updated after fbcon was initialized. However, we hit a bizarre problem when the console is switched to another fb driver (typically from vesafb or efifb to drmfb). At switching to the new fb driver, we temporarily rebind the console to the dummy console, then rebind to the new driver. During the switching, we leave the modified attrs as is. Thus, the new fbcon takes over the old buffer as if it were to contain 8 bits chars (although the attrs are still shifted for 9 bits), and effectively this results in the yellow color texts instead of the original white color, as found in the bugzilla entry below. An easy fix for this is to re-adjust the attrs before leaving the fbcon at con_deinit callback. Since the code to adjust the attrs is already present in the current fbcon code, in this patch, we simply factor out the relevant code, and call it from fbcon_deinit(). Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000619 Signed-off-by: Takashi Iwai Signed-off-by: Bartlomiej Zolnierkiewicz Cc: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- drivers/video/console/fbcon.c | 67 +++++++++++++++++++++-------------- 1 file changed, 40 insertions(+), 27 deletions(-) diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c index 6e92917ba77a..4e3c78d88832 100644 --- a/drivers/video/console/fbcon.c +++ b/drivers/video/console/fbcon.c @@ -1168,6 +1168,8 @@ static void fbcon_free_font(struct display *p, bool freefont) p->userfont = 0; } +static void set_vc_hi_font(struct vc_data *vc, bool set); + static void fbcon_deinit(struct vc_data *vc) { struct display *p = &fb_display[vc->vc_num]; @@ -1203,6 +1205,9 @@ static void fbcon_deinit(struct vc_data *vc) if (free_font) vc->vc_font.data = NULL; + if (vc->vc_hi_font_mask) + set_vc_hi_font(vc, false); + if (!con_is_bound(&fb_con)) fbcon_exit(); @@ -2439,32 +2444,10 @@ static int fbcon_get_font(struct vc_data *vc, struct console_font *font) return 0; } -static int fbcon_do_set_font(struct vc_data *vc, int w, int h, - const u8 * data, int userfont) +/* set/clear vc_hi_font_mask and update vc attrs accordingly */ +static void set_vc_hi_font(struct vc_data *vc, bool set) { - struct fb_info *info = registered_fb[con2fb_map[vc->vc_num]]; - struct fbcon_ops *ops = info->fbcon_par; - struct display *p = &fb_display[vc->vc_num]; - int resize; - int cnt; - char *old_data = NULL; - - if (CON_IS_VISIBLE(vc) && softback_lines) - fbcon_set_origin(vc); - - resize = (w != vc->vc_font.width) || (h != vc->vc_font.height); - if (p->userfont) - old_data = vc->vc_font.data; - if (userfont) - cnt = FNTCHARCNT(data); - else - cnt = 256; - vc->vc_font.data = (void *)(p->fontdata = data); - if ((p->userfont = userfont)) - REFCOUNT(data)++; - vc->vc_font.width = w; - vc->vc_font.height = h; - if (vc->vc_hi_font_mask && cnt == 256) { + if (!set) { vc->vc_hi_font_mask = 0; if (vc->vc_can_do_color) { vc->vc_complement_mask >>= 1; @@ -2487,7 +2470,7 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, ((c & 0xfe00) >> 1) | (c & 0xff); vc->vc_attr >>= 1; } - } else if (!vc->vc_hi_font_mask && cnt == 512) { + } else { vc->vc_hi_font_mask = 0x100; if (vc->vc_can_do_color) { vc->vc_complement_mask <<= 1; @@ -2519,8 +2502,38 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, } else vc->vc_video_erase_char = c & ~0x100; } - } +} + +static int fbcon_do_set_font(struct vc_data *vc, int w, int h, + const u8 * data, int userfont) +{ + struct fb_info *info = registered_fb[con2fb_map[vc->vc_num]]; + struct fbcon_ops *ops = info->fbcon_par; + struct display *p = &fb_display[vc->vc_num]; + int resize; + int cnt; + char *old_data = NULL; + + if (CON_IS_VISIBLE(vc) && softback_lines) + fbcon_set_origin(vc); + + resize = (w != vc->vc_font.width) || (h != vc->vc_font.height); + if (p->userfont) + old_data = vc->vc_font.data; + if (userfont) + cnt = FNTCHARCNT(data); + else + cnt = 256; + vc->vc_font.data = (void *)(p->fontdata = data); + if ((p->userfont = userfont)) + REFCOUNT(data)++; + vc->vc_font.width = w; + vc->vc_font.height = h; + if (vc->vc_hi_font_mask && cnt == 256) + set_vc_hi_font(vc, false); + else if (!vc->vc_hi_font_mask && cnt == 512) + set_vc_hi_font(vc, true); if (resize) { int cols, rows; From f8a62dbc790239d9cb8bb8757f43a9d2e09f747c Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 15 Dec 2016 14:31:01 +0100 Subject: [PATCH 160/521] crypto: algif_hash - avoid zero-sized array commit 6207119444595d287b1e9e83a2066c17209698f3 upstream. With this reproducer: struct sockaddr_alg alg = { .salg_family = 0x26, .salg_type = "hash", .salg_feat = 0xf, .salg_mask = 0x5, .salg_name = "digest_null", }; int sock, sock2; sock = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(sock, (struct sockaddr *)&alg, sizeof(alg)); sock2 = accept(sock, NULL, NULL); setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2); accept(sock2, NULL, NULL); ==== 8< ======== 8< ======== 8< ======== 8< ==== one can immediatelly see an UBSAN warning: UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7 variable length array bound value 0 <= 0 CPU: 0 PID: 15949 Comm: syz-executor Tainted: G E 4.4.30-0-default #1 ... Call Trace: ... [] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188 [] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc [] ? hash_accept+0x5bd/0x7d0 [algif_hash] [] ? hash_accept_nokey+0x3f/0x51 [algif_hash] [] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash] [] ? SyS_accept+0x2b/0x40 It is a correct warning, as hash state is propagated to accept as zero, but creating a zero-length variable array is not allowed in C. Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or similar happens in the code there, so we just allocate one byte even though we do not use the array. Signed-off-by: Jiri Slaby Cc: Herbert Xu Cc: "David S. Miller" (maintainer:CRYPTO API) Reported-by: Sasha Levin Signed-off-by: Herbert Xu Cc: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- crypto/algif_hash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index 68a5ceaa04c8..8d8b3eeba725 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -184,7 +184,7 @@ static int hash_accept(struct socket *sock, struct socket *newsock, int flags) struct alg_sock *ask = alg_sk(sk); struct hash_ctx *ctx = ask->private; struct ahash_request *req = &ctx->req; - char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req))]; + char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req)) ? : 1]; struct sock *sk2; struct alg_sock *ask2; struct hash_ctx *ctx2; From 0a5766a6a73b1eb6a0dfa74adc40272e555ac2f0 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 30 Mar 2017 09:36:33 +0200 Subject: [PATCH 161/521] Linux 4.4.58 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 841675e63a38..3efe2ea99e2d 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 57 +SUBLEVEL = 58 EXTRAVERSION = NAME = Blurry Fish Butt From a9a76a3e318ea559365210916378380109199121 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 8 Feb 2017 11:52:29 +0100 Subject: [PATCH 162/521] xfrm: policy: init locks early commit c282222a45cb9503cbfbebfdb60491f06ae84b49 upstream. Dmitry reports following splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1 [..] spin_lock_bh include/linux/spinlock.h:304 [inline] xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963 xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041 xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091 ops_init+0x10a/0x530 net/core/net_namespace.c:115 setup_net+0x2ed/0x690 net/core/net_namespace.c:291 copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205 SYSC_unshare kernel/fork.c:2281 [inline] Problem is that when we get error during xfrm_net_init we will call xfrm_policy_fini which will acquire xfrm_policy_lock before it was initialized. Just move it around so locks get set up first. Reported-by: Dmitry Vyukov Fixes: 283bc9f35bbbcb0e9 ("xfrm: Namespacify xfrm state/policy locks") Signed-off-by: Florian Westphal Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman --- net/xfrm/xfrm_policy.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index b5e665b3cfb0..36a50ef9295d 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -3030,6 +3030,11 @@ static int __net_init xfrm_net_init(struct net *net) { int rv; + /* Initialize the per-net locks here */ + spin_lock_init(&net->xfrm.xfrm_state_lock); + rwlock_init(&net->xfrm.xfrm_policy_lock); + mutex_init(&net->xfrm.xfrm_cfg_mutex); + rv = xfrm_statistics_init(net); if (rv < 0) goto out_statistics; @@ -3046,11 +3051,6 @@ static int __net_init xfrm_net_init(struct net *net) if (rv < 0) goto out; - /* Initialize the per-net locks here */ - spin_lock_init(&net->xfrm.xfrm_state_lock); - rwlock_init(&net->xfrm.xfrm_policy_lock); - mutex_init(&net->xfrm.xfrm_cfg_mutex); - return 0; out: From cce7e56dd73f75fef0a7f594fb129285a660fec0 Mon Sep 17 00:00:00 2001 From: Andy Whitcroft Date: Wed, 22 Mar 2017 07:29:31 +0000 Subject: [PATCH 163/521] xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window commit 677e806da4d916052585301785d847c3b3e6186a upstream. When a new xfrm state is created during an XFRM_MSG_NEWSA call we validate the user supplied replay_esn to ensure that the size is valid and to ensure that the replay_window size is within the allocated buffer. However later it is possible to update this replay_esn via a XFRM_MSG_NEWAE call. There we again validate the size of the supplied buffer matches the existing state and if so inject the contents. We do not at this point check that the replay_window is within the allocated memory. This leads to out-of-bounds reads and writes triggered by netlink packets. This leads to memory corruption and the potential for priviledge escalation. We already attempt to validate the incoming replay information in xfrm_new_ae() via xfrm_replay_verify_len(). This confirms that the user is not trying to change the size of the replay state buffer which includes the replay_esn. It however does not check the replay_window remains within that buffer. Add validation of the contained replay_window. CVE-2017-7184 Signed-off-by: Andy Whitcroft Acked-by: Steffen Klassert Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- net/xfrm/xfrm_user.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 805681a7d356..0e1f833bc77d 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -415,6 +415,9 @@ static inline int xfrm_replay_verify_len(struct xfrm_replay_state_esn *replay_es if (nla_len(rp) < ulen || xfrm_replay_state_esn_len(replay_esn) != ulen) return -EINVAL; + if (up->replay_window > up->bmp_len * sizeof(__u32) * 8) + return -EINVAL; + return 0; } From 22c9e7c092f63335ba7a7301e0e0b4c4ebed53a8 Mon Sep 17 00:00:00 2001 From: Andy Whitcroft Date: Thu, 23 Mar 2017 07:45:44 +0000 Subject: [PATCH 164/521] xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder commit f843ee6dd019bcece3e74e76ad9df0155655d0df upstream. Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to wrapping issues. To ensure we are correctly ensuring that the two ESN structures are the same size compare both the overall size as reported by xfrm_replay_state_esn_len() and the internal length are the same. CVE-2017-7184 Signed-off-by: Andy Whitcroft Acked-by: Steffen Klassert Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- net/xfrm/xfrm_user.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 0e1f833bc77d..7a5a64e70b4d 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -412,7 +412,11 @@ static inline int xfrm_replay_verify_len(struct xfrm_replay_state_esn *replay_es up = nla_data(rp); ulen = xfrm_replay_state_esn_len(up); - if (nla_len(rp) < ulen || xfrm_replay_state_esn_len(replay_esn) != ulen) + /* Check the overall length and the internal bitmap length to avoid + * potential overflow. */ + if (nla_len(rp) < ulen || + xfrm_replay_state_esn_len(replay_esn) != ulen || + replay_esn->bmp_len != up->bmp_len) return -EINVAL; if (up->replay_window > up->bmp_len * sizeof(__u32) * 8) From 927d04793f8a587532a5c26057bcdcb33bc8f5ba Mon Sep 17 00:00:00 2001 From: Ladi Prosek Date: Thu, 23 Mar 2017 08:04:18 +0100 Subject: [PATCH 165/521] virtio_balloon: init 1st buffer in stats vq commit fc8653228c8588a120f6b5dad6983b7b61ff669e upstream. When init_vqs runs, virtio_balloon.stats is either uninitialized or contains stale values. The host updates its state with garbage data because it has no way of knowing that this is just a marker buffer used for signaling. This patch updates the stats before pushing the initial buffer. Alternative fixes: * Push an empty buffer in init_vqs. Not easily done with the current virtio implementation and violates the spec "Driver MUST supply the same subset of statistics in all buffers submitted to the statsq". * Push a buffer with invalid tags in init_vqs. Violates the same spec clause, plus "invalid tag" is not really defined. Note: the spec says: When using the legacy interface, the device SHOULD ignore all values in the first buffer in the statsq supplied by the driver after device initialization. Note: Historically, drivers supplied an uninitialized buffer in the first buffer. Unfortunately QEMU does not seem to implement the recommendation even for the legacy interface. Signed-off-by: Ladi Prosek Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/virtio/virtio_balloon.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 56f7e2521202..01d15dca940e 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -416,6 +416,8 @@ static int init_vqs(struct virtio_balloon *vb) * Prime this virtqueue with one buffer so the hypervisor can * use it to signal us later (it can't be broken yet!). */ + update_balloon_stats(vb); + sg_init_one(&sg, vb->stats, sizeof vb->stats); if (virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb, GFP_KERNEL) < 0) From 800791e7e0fd9835be2f55c55147c379888b7442 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Tue, 14 Mar 2017 08:23:26 -0700 Subject: [PATCH 166/521] pinctrl: qcom: Don't clear status bit on irq_unmask commit a6566710adaa4a7dd5e0d99820ff9c9c30ee5951 upstream. Clearing the status bit on irq_unmask will discard any pending interrupt that did arrive after the irq_ack, i.e. while the IRQ handler function was executing. Fixes: f365be092572 ("pinctrl: Add Qualcomm TLMM driver") Cc: Stephen Boyd Reported-by: Timur Tabi Signed-off-by: Bjorn Andersson Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/qcom/pinctrl-msm.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c index 146264a41ec8..9736f9be5447 100644 --- a/drivers/pinctrl/qcom/pinctrl-msm.c +++ b/drivers/pinctrl/qcom/pinctrl-msm.c @@ -597,10 +597,6 @@ static void msm_gpio_irq_unmask(struct irq_data *d) spin_lock_irqsave(&pctrl->lock, flags); - val = readl(pctrl->regs + g->intr_status_reg); - val &= ~BIT(g->intr_status_bit); - writel(val, pctrl->regs + g->intr_status_reg); - val = readl(pctrl->regs + g->intr_cfg_reg); val |= BIT(g->intr_enable_bit); writel(val, pctrl->regs + g->intr_cfg_reg); From 6e174bbd0631865acc193804fa4043852f3198c5 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 27 Mar 2017 15:10:53 +0100 Subject: [PATCH 167/521] c6x/ptrace: Remove useless PTRACE_SETREGSET implementation commit fb411b837b587a32046dc4f369acb93a10b1def8 upstream. gpr_set won't work correctly and can never have been tested, and the correct behaviour is not clear due to the endianness-dependent task layout. So, just remove it. The core code will now return -EOPNOTSUPPORT when trying to set NT_PRSTATUS on this architecture until/unless a correct implementation is supplied. Signed-off-by: Dave Martin Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/c6x/kernel/ptrace.c | 41 ---------------------------------------- 1 file changed, 41 deletions(-) diff --git a/arch/c6x/kernel/ptrace.c b/arch/c6x/kernel/ptrace.c index 3c494e84444d..a511ac16a8e3 100644 --- a/arch/c6x/kernel/ptrace.c +++ b/arch/c6x/kernel/ptrace.c @@ -69,46 +69,6 @@ static int gpr_get(struct task_struct *target, 0, sizeof(*regs)); } -static int gpr_set(struct task_struct *target, - const struct user_regset *regset, - unsigned int pos, unsigned int count, - const void *kbuf, const void __user *ubuf) -{ - int ret; - struct pt_regs *regs = task_pt_regs(target); - - /* Don't copyin TSR or CSR */ - ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, - ®s, - 0, PT_TSR * sizeof(long)); - if (ret) - return ret; - - ret = user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf, - PT_TSR * sizeof(long), - (PT_TSR + 1) * sizeof(long)); - if (ret) - return ret; - - ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, - ®s, - (PT_TSR + 1) * sizeof(long), - PT_CSR * sizeof(long)); - if (ret) - return ret; - - ret = user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf, - PT_CSR * sizeof(long), - (PT_CSR + 1) * sizeof(long)); - if (ret) - return ret; - - ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, - ®s, - (PT_CSR + 1) * sizeof(long), -1); - return ret; -} - enum c6x_regset { REGSET_GPR, }; @@ -120,7 +80,6 @@ static const struct user_regset c6x_regsets[] = { .size = sizeof(u32), .align = sizeof(u32), .get = gpr_get, - .set = gpr_set }, }; From e1dc8904b33b8c01f22d904fed4cb5f2060f5da3 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 27 Mar 2017 15:10:54 +0100 Subject: [PATCH 168/521] h8300/ptrace: Fix incorrect register transfer count commit 502585c7555083d4a949c08350306b9ec196779e upstream. regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun if CONFIG_CPU_H8S is set, since this adds an extra entry to register_offset[] but not to user_regs_struct. So, iterate over user_regs_struct based on its actual size, not based on the length of register_offset[]. Signed-off-by: Dave Martin Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/h8300/kernel/ptrace.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/h8300/kernel/ptrace.c b/arch/h8300/kernel/ptrace.c index 92075544a19a..0dc1c8f622bc 100644 --- a/arch/h8300/kernel/ptrace.c +++ b/arch/h8300/kernel/ptrace.c @@ -95,7 +95,8 @@ static int regs_get(struct task_struct *target, long *reg = (long *)®s; /* build user regs in buffer */ - for (r = 0; r < ARRAY_SIZE(register_offset); r++) + BUILD_BUG_ON(sizeof(regs) % sizeof(long) != 0); + for (r = 0; r < sizeof(regs) / sizeof(long); r++) *reg++ = h8300_get_reg(target, r); return user_regset_copyout(&pos, &count, &kbuf, &ubuf, @@ -113,7 +114,8 @@ static int regs_set(struct task_struct *target, long *reg; /* build user regs in buffer */ - for (reg = (long *)®s, r = 0; r < ARRAY_SIZE(register_offset); r++) + BUILD_BUG_ON(sizeof(regs) % sizeof(long) != 0); + for (reg = (long *)®s, r = 0; r < sizeof(regs) / sizeof(long); r++) *reg++ = h8300_get_reg(target, r); ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, @@ -122,7 +124,7 @@ static int regs_set(struct task_struct *target, return ret; /* write back to pt_regs */ - for (reg = (long *)®s, r = 0; r < ARRAY_SIZE(register_offset); r++) + for (reg = (long *)®s, r = 0; r < sizeof(regs) / sizeof(long); r++) h8300_put_reg(target, r, *reg++); return 0; } From c8693666856c0db4a1e07235d98ce0b3bde98d9e Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 27 Mar 2017 15:10:58 +0100 Subject: [PATCH 169/521] mips/ptrace: Preserve previous registers for short regset write commit d614fd58a2834cfe4efa472c33c8f3ce2338b09b upstream. Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. Signed-off-by: Dave Martin Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/ptrace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c index 74d581569778..c95bf18260f8 100644 --- a/arch/mips/kernel/ptrace.c +++ b/arch/mips/kernel/ptrace.c @@ -485,7 +485,8 @@ static int fpr_set(struct task_struct *target, &target->thread.fpu, 0, sizeof(elf_fpregset_t)); - for (i = 0; i < NUM_FPU_REGS; i++) { + BUILD_BUG_ON(sizeof(fpr_val) != sizeof(elf_fpreg_t)); + for (i = 0; i < NUM_FPU_REGS && count >= sizeof(elf_fpreg_t); i++) { err = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &fpr_val, i * sizeof(elf_fpreg_t), (i + 1) * sizeof(elf_fpreg_t)); From 962b95a88574359b081e24815fae6aba92fff98d Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 27 Mar 2017 15:10:59 +0100 Subject: [PATCH 170/521] sparc/ptrace: Preserve previous registers for short regset write commit d3805c546b275c8cc7d40f759d029ae92c7175f2 upstream. Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. Signed-off-by: Dave Martin Acked-by: David S. Miller Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel/ptrace_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c index 9ddc4928a089..c1566170964f 100644 --- a/arch/sparc/kernel/ptrace_64.c +++ b/arch/sparc/kernel/ptrace_64.c @@ -311,7 +311,7 @@ static int genregs64_set(struct task_struct *target, } if (!ret) { - unsigned long y; + unsigned long y = regs->y; ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &y, From 2d9bc3695012f1ef7465f56302c1e60c48dccde8 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 27 Mar 2017 15:10:55 +0100 Subject: [PATCH 171/521] metag/ptrace: Preserve previous registers for short regset write commit a78ce80d2c9178351b34d78fec805140c29c193e upstream. Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. Signed-off-by: Dave Martin Acked-by: James Hogan Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/metag/kernel/ptrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/metag/kernel/ptrace.c b/arch/metag/kernel/ptrace.c index 7563628822bd..ae659ba61948 100644 --- a/arch/metag/kernel/ptrace.c +++ b/arch/metag/kernel/ptrace.c @@ -303,7 +303,7 @@ static int metag_tls_set(struct task_struct *target, const void *kbuf, const void __user *ubuf) { int ret; - void __user *tls; + void __user *tls = target->thread.tls_ptr; ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &tls, 0, -1); if (ret) From e441102d8c074d63d44329a59f3278573cdc1477 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 27 Mar 2017 15:10:56 +0100 Subject: [PATCH 172/521] metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS commit 5fe81fe98123ce41265c65e95d34418d30d005d1 upstream. Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill TXSTATUS, a well-defined default value is used, based on the task's current value. Suggested-by: James Hogan Signed-off-by: Dave Martin Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/metag/kernel/ptrace.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/metag/kernel/ptrace.c b/arch/metag/kernel/ptrace.c index ae659ba61948..2e4dfc15abd3 100644 --- a/arch/metag/kernel/ptrace.c +++ b/arch/metag/kernel/ptrace.c @@ -24,6 +24,16 @@ * user_regset definitions. */ +static unsigned long user_txstatus(const struct pt_regs *regs) +{ + unsigned long data = (unsigned long)regs->ctx.Flags; + + if (regs->ctx.SaveMask & TBICTX_CBUF_BIT) + data |= USER_GP_REGS_STATUS_CATCH_BIT; + + return data; +} + int metag_gp_regs_copyout(const struct pt_regs *regs, unsigned int pos, unsigned int count, void *kbuf, void __user *ubuf) @@ -62,9 +72,7 @@ int metag_gp_regs_copyout(const struct pt_regs *regs, if (ret) goto out; /* TXSTATUS */ - data = (unsigned long)regs->ctx.Flags; - if (regs->ctx.SaveMask & TBICTX_CBUF_BIT) - data |= USER_GP_REGS_STATUS_CATCH_BIT; + data = user_txstatus(regs); ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &data, 4*25, 4*26); if (ret) @@ -119,6 +127,7 @@ int metag_gp_regs_copyin(struct pt_regs *regs, if (ret) goto out; /* TXSTATUS */ + data = user_txstatus(regs); ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &data, 4*25, 4*26); if (ret) From 573341eba9c44b0b2198373cb453bbbb5b3f066a Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 27 Mar 2017 15:10:57 +0100 Subject: [PATCH 173/521] metag/ptrace: Reject partial NT_METAG_RPIPE writes commit 7195ee3120d878259e8d94a5d9f808116f34d5ea upstream. It's not clear what behaviour is sensible when doing partial write of NT_METAG_RPIPE, so just don't bother. This patch assumes that userspace will never rely on a partial SETREGSET in this case, since it's not clear what should happen anyway. Signed-off-by: Dave Martin Acked-by: James Hogan Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/metag/kernel/ptrace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/metag/kernel/ptrace.c b/arch/metag/kernel/ptrace.c index 2e4dfc15abd3..5e2dc7defd2c 100644 --- a/arch/metag/kernel/ptrace.c +++ b/arch/metag/kernel/ptrace.c @@ -253,6 +253,8 @@ int metag_rp_state_copyin(struct pt_regs *regs, unsigned long long *ptr; int ret, i; + if (count < 4*13) + return -EINVAL; /* Read the entire pipeline before making any changes */ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &rp, 0, 4*13); From 7a5202190810dde1467718235c1f650fcf57592a Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 21 Feb 2017 15:07:11 -0800 Subject: [PATCH 174/521] fscrypt: remove broken support for detecting keyring key revocation commit 1b53cf9815bb4744958d41f3795d5d5a1d365e2d upstream. Filesystem encryption ostensibly supported revoking a keyring key that had been used to "unlock" encrypted files, causing those files to become "locked" again. This was, however, buggy for several reasons, the most severe of which was that when key revocation happened to be detected for an inode, its fscrypt_info was immediately freed, even while other threads could be using it for encryption or decryption concurrently. This could be exploited to crash the kernel or worse. This patch fixes the use-after-free by removing the code which detects the keyring key having been revoked, invalidated, or expired. Instead, an encrypted inode that is "unlocked" now simply remains unlocked until it is evicted from memory. Note that this is no worse than the case for block device-level encryption, e.g. dm-crypt, and it still remains possible for a privileged user to evict unused pages, inodes, and dentries by running 'sync; echo 3 > /proc/sys/vm/drop_caches', or by simply unmounting the filesystem. In fact, one of those actions was already needed anyway for key revocation to work even somewhat sanely. This change is not expected to break any applications. In the future I'd like to implement a real API for fscrypt key revocation that interacts sanely with ongoing filesystem operations --- waiting for existing operations to complete and blocking new operations, and invalidating and sanitizing key material and plaintext from the VFS caches. But this is a hard problem, and for now this bug must be fixed. This bug affected almost all versions of ext4, f2fs, and ubifs encryption, and it was potentially reachable in any kernel configured with encryption support (CONFIG_EXT4_ENCRYPTION=y, CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or CONFIG_UBIFS_FS_ENCRYPTION=y). Note that older kernels did not use the shared fs/crypto/ code, but due to the potential security implications of this bug, it may still be worthwhile to backport this fix to them. Fixes: b7236e21d55f ("ext4 crypto: reorganize how we store keys in the inode") Signed-off-by: Eric Biggers Signed-off-by: Theodore Ts'o Acked-by: Michael Halcrow Signed-off-by: Greg Kroah-Hartman --- fs/ext4/crypto_key.c | 28 +++++++--------------------- fs/ext4/ext4.h | 14 +------------- fs/ext4/ext4_crypto.h | 1 - fs/f2fs/crypto_key.c | 28 +++++++--------------------- fs/f2fs/f2fs.h | 14 +------------- fs/f2fs/f2fs_crypto.h | 1 - 6 files changed, 16 insertions(+), 70 deletions(-) diff --git a/fs/ext4/crypto_key.c b/fs/ext4/crypto_key.c index 9a16d1e75a49..505f8afde57c 100644 --- a/fs/ext4/crypto_key.c +++ b/fs/ext4/crypto_key.c @@ -88,8 +88,6 @@ void ext4_free_crypt_info(struct ext4_crypt_info *ci) if (!ci) return; - if (ci->ci_keyring_key) - key_put(ci->ci_keyring_key); crypto_free_ablkcipher(ci->ci_ctfm); kmem_cache_free(ext4_crypt_info_cachep, ci); } @@ -111,7 +109,7 @@ void ext4_free_encryption_info(struct inode *inode, ext4_free_crypt_info(ci); } -int _ext4_get_encryption_info(struct inode *inode) +int ext4_get_encryption_info(struct inode *inode) { struct ext4_inode_info *ei = EXT4_I(inode); struct ext4_crypt_info *crypt_info; @@ -128,22 +126,15 @@ int _ext4_get_encryption_info(struct inode *inode) char mode; int res; + if (ei->i_crypt_info) + return 0; + if (!ext4_read_workqueue) { res = ext4_init_crypto(); if (res) return res; } -retry: - crypt_info = ACCESS_ONCE(ei->i_crypt_info); - if (crypt_info) { - if (!crypt_info->ci_keyring_key || - key_validate(crypt_info->ci_keyring_key) == 0) - return 0; - ext4_free_encryption_info(inode, crypt_info); - goto retry; - } - res = ext4_xattr_get(inode, EXT4_XATTR_INDEX_ENCRYPTION, EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, &ctx, sizeof(ctx)); @@ -166,7 +157,6 @@ int _ext4_get_encryption_info(struct inode *inode) crypt_info->ci_data_mode = ctx.contents_encryption_mode; crypt_info->ci_filename_mode = ctx.filenames_encryption_mode; crypt_info->ci_ctfm = NULL; - crypt_info->ci_keyring_key = NULL; memcpy(crypt_info->ci_master_key, ctx.master_key_descriptor, sizeof(crypt_info->ci_master_key)); if (S_ISREG(inode->i_mode)) @@ -206,7 +196,6 @@ int _ext4_get_encryption_info(struct inode *inode) keyring_key = NULL; goto out; } - crypt_info->ci_keyring_key = keyring_key; if (keyring_key->type != &key_type_logon) { printk_once(KERN_WARNING "ext4: key type must be logon\n"); @@ -253,16 +242,13 @@ int _ext4_get_encryption_info(struct inode *inode) ext4_encryption_key_size(mode)); if (res) goto out; - memzero_explicit(raw_key, sizeof(raw_key)); - if (cmpxchg(&ei->i_crypt_info, NULL, crypt_info) != NULL) { - ext4_free_crypt_info(crypt_info); - goto retry; - } - return 0; + if (cmpxchg(&ei->i_crypt_info, NULL, crypt_info) == NULL) + crypt_info = NULL; out: if (res == -ENOKEY) res = 0; + key_put(keyring_key); ext4_free_crypt_info(crypt_info); memzero_explicit(raw_key, sizeof(raw_key)); return res; diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index cd5914495ad7..362d59b24f1d 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2330,23 +2330,11 @@ static inline void ext4_fname_free_filename(struct ext4_filename *fname) { } /* crypto_key.c */ void ext4_free_crypt_info(struct ext4_crypt_info *ci); void ext4_free_encryption_info(struct inode *inode, struct ext4_crypt_info *ci); -int _ext4_get_encryption_info(struct inode *inode); #ifdef CONFIG_EXT4_FS_ENCRYPTION int ext4_has_encryption_key(struct inode *inode); -static inline int ext4_get_encryption_info(struct inode *inode) -{ - struct ext4_crypt_info *ci = EXT4_I(inode)->i_crypt_info; - - if (!ci || - (ci->ci_keyring_key && - (ci->ci_keyring_key->flags & ((1 << KEY_FLAG_INVALIDATED) | - (1 << KEY_FLAG_REVOKED) | - (1 << KEY_FLAG_DEAD))))) - return _ext4_get_encryption_info(inode); - return 0; -} +int ext4_get_encryption_info(struct inode *inode); static inline struct ext4_crypt_info *ext4_encryption_info(struct inode *inode) { diff --git a/fs/ext4/ext4_crypto.h b/fs/ext4/ext4_crypto.h index ac7d4e813796..1b17b05b9f4d 100644 --- a/fs/ext4/ext4_crypto.h +++ b/fs/ext4/ext4_crypto.h @@ -78,7 +78,6 @@ struct ext4_crypt_info { char ci_filename_mode; char ci_flags; struct crypto_ablkcipher *ci_ctfm; - struct key *ci_keyring_key; char ci_master_key[EXT4_KEY_DESCRIPTOR_SIZE]; }; diff --git a/fs/f2fs/crypto_key.c b/fs/f2fs/crypto_key.c index 5de2d866a25c..18595d7a0efc 100644 --- a/fs/f2fs/crypto_key.c +++ b/fs/f2fs/crypto_key.c @@ -92,7 +92,6 @@ static void f2fs_free_crypt_info(struct f2fs_crypt_info *ci) if (!ci) return; - key_put(ci->ci_keyring_key); crypto_free_ablkcipher(ci->ci_ctfm); kmem_cache_free(f2fs_crypt_info_cachep, ci); } @@ -113,7 +112,7 @@ void f2fs_free_encryption_info(struct inode *inode, struct f2fs_crypt_info *ci) f2fs_free_crypt_info(ci); } -int _f2fs_get_encryption_info(struct inode *inode) +int f2fs_get_encryption_info(struct inode *inode) { struct f2fs_inode_info *fi = F2FS_I(inode); struct f2fs_crypt_info *crypt_info; @@ -129,18 +128,12 @@ int _f2fs_get_encryption_info(struct inode *inode) char mode; int res; + if (fi->i_crypt_info) + return 0; + res = f2fs_crypto_initialize(); if (res) return res; -retry: - crypt_info = ACCESS_ONCE(fi->i_crypt_info); - if (crypt_info) { - if (!crypt_info->ci_keyring_key || - key_validate(crypt_info->ci_keyring_key) == 0) - return 0; - f2fs_free_encryption_info(inode, crypt_info); - goto retry; - } res = f2fs_getxattr(inode, F2FS_XATTR_INDEX_ENCRYPTION, F2FS_XATTR_NAME_ENCRYPTION_CONTEXT, @@ -159,7 +152,6 @@ int _f2fs_get_encryption_info(struct inode *inode) crypt_info->ci_data_mode = ctx.contents_encryption_mode; crypt_info->ci_filename_mode = ctx.filenames_encryption_mode; crypt_info->ci_ctfm = NULL; - crypt_info->ci_keyring_key = NULL; memcpy(crypt_info->ci_master_key, ctx.master_key_descriptor, sizeof(crypt_info->ci_master_key)); if (S_ISREG(inode->i_mode)) @@ -197,7 +189,6 @@ int _f2fs_get_encryption_info(struct inode *inode) keyring_key = NULL; goto out; } - crypt_info->ci_keyring_key = keyring_key; BUG_ON(keyring_key->type != &key_type_logon); ukp = user_key_payload(keyring_key); if (ukp->datalen != sizeof(struct f2fs_encryption_key)) { @@ -230,17 +221,12 @@ int _f2fs_get_encryption_info(struct inode *inode) if (res) goto out; - memzero_explicit(raw_key, sizeof(raw_key)); - if (cmpxchg(&fi->i_crypt_info, NULL, crypt_info) != NULL) { - f2fs_free_crypt_info(crypt_info); - goto retry; - } - return 0; - + if (cmpxchg(&fi->i_crypt_info, NULL, crypt_info) == NULL) + crypt_info = NULL; out: if (res == -ENOKEY && !S_ISREG(inode->i_mode)) res = 0; - + key_put(keyring_key); f2fs_free_crypt_info(crypt_info); memzero_explicit(raw_key, sizeof(raw_key)); return res; diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 9db5500d63d9..b1aeca83f4be 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2149,7 +2149,6 @@ void f2fs_end_io_crypto_work(struct f2fs_crypto_ctx *, struct bio *); /* crypto_key.c */ void f2fs_free_encryption_info(struct inode *, struct f2fs_crypt_info *); -int _f2fs_get_encryption_info(struct inode *inode); /* crypto_fname.c */ bool f2fs_valid_filenames_enc_mode(uint32_t); @@ -2170,18 +2169,7 @@ void f2fs_exit_crypto(void); int f2fs_has_encryption_key(struct inode *); -static inline int f2fs_get_encryption_info(struct inode *inode) -{ - struct f2fs_crypt_info *ci = F2FS_I(inode)->i_crypt_info; - - if (!ci || - (ci->ci_keyring_key && - (ci->ci_keyring_key->flags & ((1 << KEY_FLAG_INVALIDATED) | - (1 << KEY_FLAG_REVOKED) | - (1 << KEY_FLAG_DEAD))))) - return _f2fs_get_encryption_info(inode); - return 0; -} +int f2fs_get_encryption_info(struct inode *inode); void f2fs_fname_crypto_free_buffer(struct f2fs_str *); int f2fs_fname_setup_filename(struct inode *, const struct qstr *, diff --git a/fs/f2fs/f2fs_crypto.h b/fs/f2fs/f2fs_crypto.h index c2c1c2b63b25..f113f1a1e8c1 100644 --- a/fs/f2fs/f2fs_crypto.h +++ b/fs/f2fs/f2fs_crypto.h @@ -79,7 +79,6 @@ struct f2fs_crypt_info { char ci_filename_mode; char ci_flags; struct crypto_ablkcipher *ci_ctfm; - struct key *ci_keyring_key; char ci_master_key[F2FS_KEY_DESCRIPTOR_SIZE]; }; From 2bed5987692cb6dc3bf3ce15d8abeb79fdf4ab2a Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Tue, 24 Jan 2017 15:40:06 +0100 Subject: [PATCH 175/521] sched/rt: Add a missing rescheduling point commit 619bd4a71874a8fd78eb6ccf9f272c5e98bcc7b7 upstream. Since the change in commit: fd7a4bed1835 ("sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks") ... we don't reschedule a task under certain circumstances: Lets say task-A, SCHED_OTHER, is running on CPU0 (and it may run only on CPU0) and holds a PI lock. This task is removed from the CPU because it used up its time slice and another SCHED_OTHER task is running. Task-B on CPU1 runs at RT priority and asks for the lock owned by task-A. This results in a priority boost for task-A. Task-B goes to sleep until the lock has been made available. Task-A is already runnable (but not active), so it receives no wake up. The reality now is that task-A gets on the CPU once the scheduler decides to remove the current task despite the fact that a high priority task is enqueued and waiting. This may take a long time. The desired behaviour is that CPU0 immediately reschedules after the priority boost which made task-A the task with the lowest priority. Suggested-by: Peter Zijlstra Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Mike Galbraith Cc: Thomas Gleixner Fixes: fd7a4bed1835 ("sched, rt: Convert switched_{from, to}_rt() prio_changed_rt() to balance callbacks") Link: http://lkml.kernel.org/r/20170124144006.29821-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- kernel/sched/deadline.c | 3 +-- kernel/sched/rt.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 8b0a15e285f9..e984f059e5fc 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1771,12 +1771,11 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p) #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->dl.overloaded) queue_push_tasks(rq); -#else +#endif if (dl_task(rq->curr)) check_preempt_curr_dl(rq, p, 0); else resched_curr(rq); -#endif } } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 8ec86abe0ea1..78ae5c1d9412 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2136,10 +2136,9 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p) #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) queue_push_tasks(rq); -#else +#endif /* CONFIG_SMP */ if (p->prio < rq->curr->prio) resched_curr(rq); -#endif /* CONFIG_SMP */ } } From 61a4577c9a4419b99e647744923517d47255da35 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 31 Mar 2017 10:17:09 +0200 Subject: [PATCH 176/521] Linux 4.4.59 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 3efe2ea99e2d..083724c6ca4d 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 58 +SUBLEVEL = 59 EXTRAVERSION = NAME = Blurry Fish Butt From b041dac75629fd37b9cbf9993234b0fd8c77c638 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Thu, 8 Sep 2016 09:57:07 +0200 Subject: [PATCH 177/521] fs/proc/kcore.c: Make bounce buffer global for read Next patch adds bounce buffer for ktext area, so it's convenient to have single bounce buffer for both vmalloc/module and ktext cases. Suggested-by: Linus Torvalds Signed-off-by: Jiri Olsa Acked-by: Kees Cook Signed-off-by: Linus Torvalds (cherry picked from commit f5beeb1851ea6f8cfcf2657f26cb24c0582b4945) Signed-off-by: Alex Shi --- fs/proc/kcore.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index d1e5a66a33ba..21f198aa0961 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -430,6 +430,7 @@ static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff) static ssize_t read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) { + char *buf = file->private_data; ssize_t acc = 0; size_t size, tsz; size_t elf_buflen; @@ -500,18 +501,10 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) if (clear_user(buffer, tsz)) return -EFAULT; } else if (is_vmalloc_or_module_addr((void *)start)) { - char * elf_buf; - - elf_buf = kzalloc(tsz, GFP_KERNEL); - if (!elf_buf) - return -ENOMEM; - vread(elf_buf, (char *)start, tsz); + vread(buf, (char *)start, tsz); /* we have to zero-fill user buffer even if no read */ - if (copy_to_user(buffer, elf_buf, tsz)) { - kfree(elf_buf); + if (copy_to_user(buffer, buf, tsz)) return -EFAULT; - } - kfree(elf_buf); } else { if (kern_addr_valid(start)) { unsigned long n; @@ -554,6 +547,11 @@ static int open_kcore(struct inode *inode, struct file *filp) { if (!capable(CAP_SYS_RAWIO)) return -EPERM; + + filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL); + if (!filp->private_data) + return -ENOMEM; + if (kcore_need_update) kcore_update_ram(); if (i_size_read(inode) != proc_root_kcore->size) { @@ -564,10 +562,16 @@ static int open_kcore(struct inode *inode, struct file *filp) return 0; } +static int release_kcore(struct inode *inode, struct file *file) +{ + kfree(file->private_data); + return 0; +} static const struct file_operations proc_kcore_operations = { .read = read_kcore, .open = open_kcore, + .release = release_kcore, .llseek = default_llseek, }; From ba46d8fab00a8e1538df241681d9161c8ec85778 Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Tue, 21 Mar 2017 13:44:28 +0100 Subject: [PATCH 178/521] libceph: force GFP_NOIO for socket allocations commit 633ee407b9d15a75ac9740ba9d3338815e1fcb95 upstream. sock_alloc_inode() allocates socket+inode and socket_wq with GFP_KERNEL, which is not allowed on the writeback path: Workqueue: ceph-msgr con_work [libceph] ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000 0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00 ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148 Call Trace: [] schedule+0x29/0x70 [] schedule_timeout+0x1bd/0x200 [] ? ttwu_do_wakeup+0x2c/0x120 [] ? ttwu_do_activate.constprop.135+0x66/0x70 [] wait_for_completion+0xbf/0x180 [] ? try_to_wake_up+0x390/0x390 [] flush_work+0x165/0x250 [] ? worker_detach_from_pool+0xd0/0xd0 [] xlog_cil_force_lsn+0x81/0x200 [xfs] [] ? __slab_free+0xee/0x234 [] _xfs_log_force_lsn+0x4d/0x2c0 [xfs] [] ? lookup_page_cgroup_used+0xe/0x30 [] ? xfs_reclaim_inode+0xa3/0x330 [xfs] [] xfs_log_force_lsn+0x3f/0xf0 [xfs] [] ? xfs_reclaim_inode+0xa3/0x330 [xfs] [] xfs_iunpin_wait+0xc6/0x1a0 [xfs] [] ? wake_atomic_t_function+0x40/0x40 [] xfs_reclaim_inode+0xa3/0x330 [xfs] [] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs] [] xfs_reclaim_inodes_nr+0x33/0x40 [xfs] [] xfs_fs_free_cached_objects+0x15/0x20 [xfs] [] super_cache_scan+0x178/0x180 [] shrink_slab_node+0x14e/0x340 [] ? mem_cgroup_iter+0x16b/0x450 [] shrink_slab+0x100/0x140 [] do_try_to_free_pages+0x335/0x490 [] try_to_free_pages+0xb9/0x1f0 [] ? __alloc_pages_direct_compact+0x69/0x1be [] __alloc_pages_nodemask+0x69a/0xb40 [] alloc_pages_current+0x9e/0x110 [] new_slab+0x2c5/0x390 [] __slab_alloc+0x33b/0x459 [] ? sock_alloc_inode+0x2d/0xd0 [] ? inet_sendmsg+0x71/0xc0 [] ? sock_alloc_inode+0x2d/0xd0 [] kmem_cache_alloc+0x1a2/0x1b0 [] sock_alloc_inode+0x2d/0xd0 [] alloc_inode+0x26/0xa0 [] new_inode_pseudo+0x1a/0x70 [] sock_alloc+0x1e/0x80 [] __sock_create+0x95/0x220 [] sock_create_kern+0x24/0x30 [] con_work+0xef9/0x2050 [libceph] [] ? rbd_img_request_submit+0x4c/0x60 [rbd] [] process_one_work+0x159/0x4f0 [] worker_thread+0x11b/0x530 [] ? create_worker+0x1d0/0x1d0 [] kthread+0xc9/0xe0 [] ? flush_kthread_worker+0x90/0x90 [] ret_from_fork+0x58/0x90 [] ? flush_kthread_worker+0x90/0x90 Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here. Link: http://tracker.ceph.com/issues/19309 Reported-by: Sergey Jerusalimov Signed-off-by: Ilya Dryomov Reviewed-by: Jeff Layton Signed-off-by: Greg Kroah-Hartman --- net/ceph/messenger.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index b8d927c56494..a6b2f2138c9d 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -478,11 +479,16 @@ static int ceph_tcp_connect(struct ceph_connection *con) { struct sockaddr_storage *paddr = &con->peer_addr.in_addr; struct socket *sock; + unsigned int noio_flag; int ret; BUG_ON(con->sock); + + /* sock_create_kern() allocates with GFP_KERNEL */ + noio_flag = memalloc_noio_save(); ret = sock_create_kern(read_pnet(&con->msgr->net), paddr->ss_family, SOCK_STREAM, IPPROTO_TCP, &sock); + memalloc_noio_restore(noio_flag); if (ret) return ret; sock->sk->sk_allocation = GFP_NOFS; From 1eed198ce16b6e05c05ee381e5d90fac35ea67a7 Mon Sep 17 00:00:00 2001 From: Ross Lagerwall Date: Mon, 12 Dec 2016 14:35:13 +0000 Subject: [PATCH 179/521] xen/setup: Don't relocate p2m over existing one commit 7ecec8503af37de6be4f96b53828d640a968705f upstream. When relocating the p2m, take special care not to relocate it so that is overlaps with the current location of the p2m/initrd. This is needed since the full extent of the current location is not marked as a reserved region in the e820. This was seen to happen to a dom0 with a large initial p2m and a small reserved region in the middle of the initial p2m. Signed-off-by: Ross Lagerwall Reviewed-by: Juergen Gross Signed-off-by: Juergen Gross Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/setup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index e345891450c3..df8844a1853a 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -713,10 +713,9 @@ static void __init xen_reserve_xen_mfnlist(void) size = PFN_PHYS(xen_start_info->nr_p2m_frames); } - if (!xen_is_e820_reserved(start, size)) { - memblock_reserve(start, size); + memblock_reserve(start, size); + if (!xen_is_e820_reserved(start, size)) return; - } #ifdef CONFIG_X86_32 /* @@ -727,6 +726,7 @@ static void __init xen_reserve_xen_mfnlist(void) BUG(); #else xen_relocate_p2m(); + memblock_free(start, size); #endif } From 18639c4bad72218954e728e9ca65c33b13ba673a Mon Sep 17 00:00:00 2001 From: James Bottomley Date: Sun, 1 Jan 2017 09:39:24 -0800 Subject: [PATCH 180/521] scsi: mpt3sas: fix hang on ata passthrough commands commit ffb58456589443ca572221fabbdef3db8483a779 upstream. mpt3sas has a firmware failure where it can only handle one pass through ATA command at a time. If another comes in, contrary to the SAT standard, it will hang until the first one completes (causing long commands like secure erase to timeout). The original fix was to block the device when an ATA command came in, but this caused a regression with commit 669f044170d8933c3d66d231b69ea97cb8447338 Author: Bart Van Assche Date: Tue Nov 22 16:17:13 2016 -0800 scsi: srp_transport: Move queuecommand() wait code to SCSI core So fix the original fix of the secure erase timeout by properly returning SAM_STAT_BUSY like the SAT recommends. The original patch also had a concurrency problem since scsih_qcmd is lockless at that point (this is fixed by using atomic bitops to set and test the flag). [mkp: addressed feedback wrt. test_bit and fixed whitespace] Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination) Signed-off-by: James Bottomley Acked-by: Sreekanth Reddy Reviewed-by: Christoph Hellwig Reported-by: Ingo Molnar Tested-by: Ingo Molnar Signed-off-by: Martin K. Petersen Cc: Joe Korty Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/mpt3sas/mpt3sas_base.h | 12 +++++++++ drivers/scsi/mpt3sas/mpt3sas_scsih.c | 40 ++++++++++++++++++---------- 2 files changed, 38 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index 92648a5ea2d2..63f5965acc89 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -390,6 +390,7 @@ struct MPT3SAS_TARGET { * @eedp_enable: eedp support enable bit * @eedp_type: 0(type_1), 1(type_2), 2(type_3) * @eedp_block_length: block size + * @ata_command_pending: SATL passthrough outstanding for device */ struct MPT3SAS_DEVICE { struct MPT3SAS_TARGET *sas_target; @@ -398,6 +399,17 @@ struct MPT3SAS_DEVICE { u8 configured_lun; u8 block; u8 tlr_snoop_check; + /* + * Bug workaround for SATL handling: the mpt2/3sas firmware + * doesn't return BUSY or TASK_SET_FULL for subsequent + * commands while a SATL pass through is in operation as the + * spec requires, it simply does nothing with them until the + * pass through completes, causing them possibly to timeout if + * the passthrough is a long executing command (like format or + * secure erase). This variable allows us to do the right + * thing while a SATL command is pending. + */ + unsigned long ata_command_pending; }; #define MPT3_CMD_NOT_USED 0x8000 /* free */ diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index f6a8e9958e75..8a5fbdb45cfd 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3707,9 +3707,18 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc, } } -static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd) +static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending) { - return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16); + struct MPT3SAS_DEVICE *priv = scmd->device->hostdata; + + if (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16) + return 0; + + if (pending) + return test_and_set_bit(0, &priv->ata_command_pending); + + clear_bit(0, &priv->ata_command_pending); + return 0; } /** @@ -3733,9 +3742,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) if (!scmd) continue; count++; - if (ata_12_16_cmd(scmd)) - scsi_internal_device_unblock(scmd->device, - SDEV_RUNNING); + _scsih_set_satl_pending(scmd, false); mpt3sas_base_free_smid(ioc, smid); scsi_dma_unmap(scmd); if (ioc->pci_error_recovery) @@ -3866,13 +3873,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) if (ioc->logging_level & MPT_DEBUG_SCSI) scsi_print_command(scmd); - /* - * Lock the device for any subsequent command until command is - * done. - */ - if (ata_12_16_cmd(scmd)) - scsi_internal_device_block(scmd->device); - sas_device_priv_data = scmd->device->hostdata; if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { scmd->result = DID_NO_CONNECT << 16; @@ -3886,6 +3886,19 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) return 0; } + /* + * Bug work around for firmware SATL handling. The loop + * is based on atomic operations and ensures consistency + * since we're lockless at this point + */ + do { + if (test_bit(0, &sas_device_priv_data->ata_command_pending)) { + scmd->result = SAM_STAT_BUSY; + scmd->scsi_done(scmd); + return 0; + } + } while (_scsih_set_satl_pending(scmd, true)); + sas_target_priv_data = sas_device_priv_data->sas_target; /* invalid device handle */ @@ -4445,8 +4458,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) if (scmd == NULL) return 1; - if (ata_12_16_cmd(scmd)) - scsi_internal_device_unblock(scmd->device, SDEV_RUNNING); + _scsih_set_satl_pending(scmd, false); mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); From a92f411914cad6532e82e4607bc4075a5ffaa366 Mon Sep 17 00:00:00 2001 From: peter chang Date: Wed, 15 Feb 2017 14:11:54 -0800 Subject: [PATCH 181/521] scsi: sg: check length passed to SG_NEXT_CMD_LEN commit bf33f87dd04c371ea33feb821b60d63d754e3124 upstream. The user can control the size of the next command passed along, but the value passed to the ioctl isn't checked against the usable max command size. Signed-off-by: Peter Chang Acked-by: Douglas Gilbert Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sg.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index dedcff9cabb5..6514636431ab 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1008,6 +1008,8 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) result = get_user(val, ip); if (result) return result; + if (val > SG_MAX_CDB_SIZE) + return -ENOMEM; sfp->next_cmd_len = (val > 0) ? val : 0; return 0; case SG_GET_VERSION_NUM: From 75a03869c93a443ae068eae9aca0c0df8b33dff5 Mon Sep 17 00:00:00 2001 From: John Garry Date: Thu, 16 Mar 2017 23:07:28 +0800 Subject: [PATCH 182/521] scsi: libsas: fix ata xfer length commit 9702c67c6066f583b629cf037d2056245bb7a8e6 upstream. The total ata xfer length may not be calculated properly, in that we do not use the proper method to get an sg element dma length. According to the code comment, sg_dma_len() should be used after dma_map_sg() is called. This issue was found by turning on the SMMUv3 in front of the hisi_sas controller in hip07. Multiple sg elements were being combined into a single element, but the original first element length was being use as the total xfer length. Fixes: ff2aeb1eb64c8a4770a6 ("libata: convert to chained sg") Signed-off-by: John Garry Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/libsas/sas_ata.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index 9c706d8c1441..6f5e2720ffad 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -218,7 +218,7 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc) task->num_scatter = qc->n_elem; } else { for_each_sg(qc->sg, sg, qc->n_elem, si) - xfer += sg->length; + xfer += sg_dma_len(sg); task->total_xfer_len = xfer; task->num_scatter = si; From a90d7447e4a154ad26e3b9e09a0878680be49339 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Fri, 24 Mar 2017 17:07:57 +0100 Subject: [PATCH 183/521] ALSA: seq: Fix race during FIFO resize commit 2d7d54002e396c180db0c800c1046f0a3c471597 upstream. When a new event is queued while processing to resize the FIFO in snd_seq_fifo_clear(), it may lead to a use-after-free, as the old pool that is being queued gets removed. For avoiding this race, we need to close the pool to be deleted and sync its usage before actually deleting it. The issue was spotted by syzkaller. Reported-by: Dmitry Vyukov Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/seq/seq_fifo.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sound/core/seq/seq_fifo.c b/sound/core/seq/seq_fifo.c index 3f4efcb85df5..3490d21ab9e7 100644 --- a/sound/core/seq/seq_fifo.c +++ b/sound/core/seq/seq_fifo.c @@ -265,6 +265,10 @@ int snd_seq_fifo_resize(struct snd_seq_fifo *f, int poolsize) /* NOTE: overflow flag is not cleared */ spin_unlock_irqrestore(&f->lock, flags); + /* close the old pool and wait until all users are gone */ + snd_seq_pool_mark_closing(oldpool); + snd_use_lock_sync(&f->use_lock); + /* release cells in old pool */ for (cell = oldhead; cell; cell = next) { next = cell->next; From ce3dcfdbff04bab023806ef7a342c657ec08915d Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Fri, 31 Mar 2017 10:31:40 +0800 Subject: [PATCH 184/521] ALSA: hda - fix a problem for lineout on a Dell AIO machine commit 2f726aec19a9d2c63bec9a8a53a3910ffdcd09f8 upstream. On this Dell AIO machine, the lineout jack does not work. We found the pin 0x1a is assigned to lineout on this machine, and in the past, we applied ALC298_FIXUP_DELL1_MIC_NO_PRESENCE to fix the heaset-set mic problem for this machine, this fixup will redefine the pin 0x1a to headphone-mic, as a result the lineout doesn't work anymore. After consulting with Dell, they told us this machine doesn't support microphone via headset jack, so we add a new fixup which only defines the pin 0x18 as the headset-mic. [rearranged the fixup insertion position by tiwai in order to make the merge with other branches easier -- tiwai] Fixes: 59ec4b57bcae ("ALSA: hda - Fix headset mic detection problem for two dell machines") Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 1d4f34379f56..46a34039ecdc 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -4831,6 +4831,7 @@ enum { ALC292_FIXUP_DISABLE_AAMIX, ALC293_FIXUP_DISABLE_AAMIX_MULTIJACK, ALC298_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC298_FIXUP_DELL_AIO_MIC_NO_PRESENCE, ALC275_FIXUP_DELL_XPS, ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE, ALC293_FIXUP_LENOVO_SPK_NOISE, @@ -5429,6 +5430,15 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_HEADSET_MODE }, + [ALC298_FIXUP_DELL_AIO_MIC_NO_PRESENCE] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x18, 0x01a1913c }, /* use as headset mic, without its own jack detect */ + { } + }, + .chained = true, + .chain_id = ALC269_FIXUP_HEADSET_MODE + }, [ALC275_FIXUP_DELL_XPS] = { .type = HDA_FIXUP_VERBS, .v.verbs = (const struct hda_verb[]) { @@ -5501,7 +5511,7 @@ static const struct hda_fixup alc269_fixups[] = { .type = HDA_FIXUP_FUNC, .v.func = alc298_fixup_speaker_volume, .chained = true, - .chain_id = ALC298_FIXUP_DELL1_MIC_NO_PRESENCE, + .chain_id = ALC298_FIXUP_DELL_AIO_MIC_NO_PRESENCE, }, [ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER] = { .type = HDA_FIXUP_PINS, From ab48ab614b8c83f3a3b0f83f7882b1d2766962d3 Mon Sep 17 00:00:00 2001 From: Songjun Wu Date: Fri, 24 Feb 2017 15:10:43 +0800 Subject: [PATCH 185/521] ASoC: atmel-classd: fix audio clock rate commit cd3ac9affc43b44f49d7af70d275f0bd426ba643 upstream. Fix the audio clock rate according to the datasheet. Reported-by: Dushara Jayasinghe Signed-off-by: Songjun Wu Acked-by: Nicolas Ferre Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/atmel/atmel-classd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/soc/atmel/atmel-classd.c b/sound/soc/atmel/atmel-classd.c index 8276675730ef..78a985629607 100644 --- a/sound/soc/atmel/atmel-classd.c +++ b/sound/soc/atmel/atmel-classd.c @@ -343,7 +343,7 @@ static int atmel_classd_codec_dai_digital_mute(struct snd_soc_dai *codec_dai, } #define CLASSD_ACLK_RATE_11M2896_MPY_8 (112896 * 100 * 8) -#define CLASSD_ACLK_RATE_12M288_MPY_8 (12228 * 1000 * 8) +#define CLASSD_ACLK_RATE_12M288_MPY_8 (12288 * 1000 * 8) static struct { int rate; From 3342857ac074768e14e361392ac09fbbd70d840e Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Thu, 16 Mar 2017 08:56:28 -0500 Subject: [PATCH 186/521] ACPI: Fix incompatibility with mcount-based function graph tracing commit 61b79e16c68d703dde58c25d3935d67210b7d71b upstream. Paul Menzel reported a warning: WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0 Bad frame pointer: expected f6919d98, received f6919db0 from func acpi_pm_device_sleep_wake return to c43b6f9d The warning means that function graph tracing is broken for the acpi_pm_device_sleep_wake() function. That's because the ACPI Makefile unconditionally sets the '-Os' gcc flag to optimize for size. That's an issue because mcount-based function graph tracing is incompatible with '-Os' on x86, thanks to the following gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109 I have another patch pending which will ensure that mcount-based function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on x86. But this patch is needed in addition to that one because the ACPI Makefile overrides that config option for no apparent reason. It has had this flag since the beginning of git history, and there's no related comment, so I don't know why it's there. As far as I can tell, there's no reason for it to be there. The appropriate behavior is for it to honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the kernel. Reported-by: Paul Menzel Signed-off-by: Josh Poimboeuf Acked-by: Steven Rostedt (VMware) Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/Makefile | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 675eaf337178..b9cebca376f9 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -2,7 +2,6 @@ # Makefile for the Linux ACPI interpreter # -ccflags-y := -Os ccflags-$(CONFIG_ACPI_DEBUG) += -DACPI_DEBUG_OUTPUT # From 566a8711a7dd11960fa0bf3a4fd89c742eb359f3 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 22 Mar 2017 18:33:25 +0100 Subject: [PATCH 187/521] ACPI: Do not create a platform_device for IOAPIC/IOxAPIC commit 08f63d97749185fab942a3a47ed80f5bd89b8b7d upstream. No platform-device is required for IO(x)APICs, so don't even create them. [ rjw: This fixes a problem with leaking platform device objects after IOAPIC/IOxAPIC hot-removal events.] Signed-off-by: Joerg Roedel Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/acpi_platform.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/acpi_platform.c b/drivers/acpi/acpi_platform.c index 296b7a14893a..5365ff6e69c1 100644 --- a/drivers/acpi/acpi_platform.c +++ b/drivers/acpi/acpi_platform.c @@ -24,9 +24,11 @@ ACPI_MODULE_NAME("platform"); static const struct acpi_device_id forbidden_id_list[] = { - {"PNP0000", 0}, /* PIC */ - {"PNP0100", 0}, /* Timer */ - {"PNP0200", 0}, /* AT DMA Controller */ + {"PNP0000", 0}, /* PIC */ + {"PNP0100", 0}, /* Timer */ + {"PNP0200", 0}, /* AT DMA Controller */ + {"ACPI0009", 0}, /* IOxAPIC */ + {"ACPI000A", 0}, /* IOAPIC */ {"", 0}, }; From 74b8fc017d7689d1a60c9e234b2cfe3550b7f414 Mon Sep 17 00:00:00 2001 From: Richard Genoud Date: Mon, 20 Mar 2017 11:52:41 +0100 Subject: [PATCH 188/521] tty/serial: atmel: fix race condition (TX+DMA) commit 31ca2c63fdc0aee725cbd4f207c1256f5deaabde upstream. If uart_flush_buffer() is called between atmel_tx_dma() and atmel_complete_tx_dma(), the circular buffer has been cleared, but not atmel_port->tx_len. That leads to a circular buffer overflow (dumping (UART_XMIT_SIZE - atmel_port->tx_len) bytes). Tested-by: Nicolas Ferre Signed-off-by: Richard Genoud Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/atmel_serial.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c index a0f911641b04..156a262b6b65 100644 --- a/drivers/tty/serial/atmel_serial.c +++ b/drivers/tty/serial/atmel_serial.c @@ -1987,6 +1987,11 @@ static void atmel_flush_buffer(struct uart_port *port) atmel_uart_writel(port, ATMEL_PDC_TCR, 0); atmel_port->pdc_tx.ofs = 0; } + /* + * in uart_flush_buffer(), the xmit circular buffer has just + * been cleared, so we have to reset tx_len accordingly. + */ + atmel_port->tx_len = 0; } /* From 0a1757cfa5ba3b46f6ee7a74ddb7a5c0bd5d7c2f Mon Sep 17 00:00:00 2001 From: Nicolas Ferre Date: Mon, 20 Mar 2017 16:38:57 +0100 Subject: [PATCH 189/521] tty/serial: atmel: fix TX path in atmel_console_write() commit 497e1e16f45c70574dc9922c7f75c642c2162119 upstream. A side effect of 89d8232411a8 ("tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx") is that the console can be called with TX path disabled. Then the system would hang trying to push charecters out in atmel_console_putchar(). Signed-off-by: Nicolas Ferre Fixes: 89d8232411a8 ("tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx") Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/atmel_serial.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c index 156a262b6b65..a15070a7fcd6 100644 --- a/drivers/tty/serial/atmel_serial.c +++ b/drivers/tty/serial/atmel_serial.c @@ -2504,6 +2504,9 @@ static void atmel_console_write(struct console *co, const char *s, u_int count) pdc_tx = atmel_uart_readl(port, ATMEL_PDC_PTSR) & ATMEL_PDC_TXTEN; atmel_uart_writel(port, ATMEL_PDC_PTCR, ATMEL_PDC_TXTDIS); + /* Make sure that tx path is actually able to send characters */ + atmel_uart_writel(port, ATMEL_US_CR, ATMEL_US_TXEN); + uart_console_write(port, s, count, atmel_console_putchar); /* From eac3ab3e69151c21a0a71ec8711600022cc12fa3 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Fri, 24 Mar 2017 13:38:28 -0400 Subject: [PATCH 190/521] USB: fix linked-list corruption in rh_call_control() commit 1633682053a7ee8058e10c76722b9b28e97fb73f upstream. Using KASAN, Dmitry found a bug in the rh_call_control() routine: If buffer allocation fails, the routine returns immediately without unlinking its URB from the control endpoint, eventually leading to linked-list corruption. This patch fixes the problem by jumping to the end of the routine (where the URB is unlinked) when an allocation failure occurs. Signed-off-by: Alan Stern Reported-and-tested-by: Dmitry Vyukov Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hcd.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index 5724d7c41e29..ca2cbdb3aa67 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -499,8 +499,10 @@ static int rh_call_control (struct usb_hcd *hcd, struct urb *urb) */ tbuf_size = max_t(u16, sizeof(struct usb_hub_descriptor), wLength); tbuf = kzalloc(tbuf_size, GFP_KERNEL); - if (!tbuf) - return -ENOMEM; + if (!tbuf) { + status = -ENOMEM; + goto err_alloc; + } bufp = tbuf; @@ -705,6 +707,7 @@ static int rh_call_control (struct usb_hcd *hcd, struct urb *urb) } kfree(tbuf); + err_alloc: /* any errors get returned through the urb completion */ spin_lock_irq(&hcd_root_hub_lock); From 3eb392056aeb4a0beca5fcead9ad3d6b6ff0816e Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Wed, 15 Mar 2017 16:01:17 +0800 Subject: [PATCH 191/521] KVM: x86: clear bus pointer when destroyed MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream. When releasing the bus, let's clear the bus pointers to mark it out. If any further device unregister happens on this bus, we know that we're done if we found the bus being released already. Signed-off-by: Peter Xu Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman --- virt/kvm/kvm_main.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 336ed267c407..1ac5b7be7282 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -654,8 +654,10 @@ static void kvm_destroy_vm(struct kvm *kvm) list_del(&kvm->vm_list); spin_unlock(&kvm_lock); kvm_free_irq_routing(kvm); - for (i = 0; i < KVM_NR_BUSES; i++) + for (i = 0; i < KVM_NR_BUSES; i++) { kvm_io_bus_destroy(kvm->buses[i]); + kvm->buses[i] = NULL; + } kvm_coalesced_mmio_free(kvm); #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm); @@ -3376,6 +3378,14 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, struct kvm_io_bus *new_bus, *bus; bus = kvm->buses[bus_idx]; + + /* + * It's possible the bus being released before hand. If so, + * we're done here. + */ + if (!bus) + return 0; + r = -ENOENT; for (i = 0; i < bus->dev_count; i++) if (bus->range[i].dev == dev) { From ef55c3df5dbd60eb3daab7797feac850bd1e6fe3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Fri, 24 Mar 2017 19:01:09 +0900 Subject: [PATCH 192/521] drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit ce4b4f228e51219b0b79588caf73225b08b5b779 upstream. We were accidentally only overriding the first VRAM placement. For BOs with the RADEON_GEM_NO_CPU_ACCESS flag set, radeon_ttm_placement_from_domain creates a second VRAM placment with fpfn == 0. If VRAM is almost full, the first VRAM placement with fpfn > 0 may not work, but the second one with fpfn == 0 always will (the BO's current location trivially satisfies it). Because "moving" the BO to its current location puts it back on the LRU list, this results in an infinite loop. Fixes: 2a85aedd117c ("drm/radeon: Try evicting from CPU accessible to inaccessible VRAM first") Reported-by: Zachary Michaels Reported-and-Tested-by: Julien Isorce Reviewed-by: Christian König Reviewed-by: Alex Deucher Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 35310336dd0a..d684e2b79d2b 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -213,8 +213,8 @@ static void radeon_evict_flags(struct ttm_buffer_object *bo, rbo->placement.num_busy_placement = 0; for (i = 0; i < rbo->placement.num_placement; i++) { if (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) { - if (rbo->placements[0].fpfn < fpfn) - rbo->placements[0].fpfn = fpfn; + if (rbo->placements[i].fpfn < fpfn) + rbo->placements[i].fpfn = fpfn; } else { rbo->placement.busy_placement = &rbo->placements[i]; From 47e2fe17d14d1a7da3a405b1f364c094c271d35c Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Fri, 31 Mar 2017 15:11:55 -0700 Subject: [PATCH 193/521] mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() commit c9d398fa237882ea07167e23bcfc5e6847066518 upstream. I found the race condition which triggers the following bug when move_pages() and soft offline are called on a single hugetlb page concurrently. Soft offlining page 0x119400 at 0x700000000000 BUG: unable to handle kernel paging request at ffffea0011943820 IP: follow_huge_pmd+0x143/0x190 PGD 7ffd2067 PUD 7ffd1067 PMD 0 [61163.582052] Oops: 0000 [#1] SMP Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check] CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P OE 4.11.0-rc2-mm1+ #2 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:follow_huge_pmd+0x143/0x190 RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202 RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000 RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80 RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000 R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800 R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000 FS: 00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0 Call Trace: follow_page_mask+0x270/0x550 SYSC_move_pages+0x4ea/0x8f0 SyS_move_pages+0xe/0x10 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fc976e03949 RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949 RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827 RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004 R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650 R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000 Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0 CR2: ffffea0011943820 ---[ end trace e4f81353a2d23232 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This bug is triggered when pmd_present() returns true for non-present hugetlb, so fixing the present check in follow_huge_pmd() prevents it. Using pmd_present() to determine present/non-present for hugetlb is not correct, because pmd_present() checks multiple bits (not only _PAGE_PRESENT) for historical reason and it can misjudge hugetlb state. Fixes: e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Hugh Dickins Cc: Michal Hocko Cc: "Kirill A. Shutemov" Cc: Mike Kravetz Cc: Christian Borntraeger Cc: Gerald Schaefer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/hugetlb.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ea11123a9249..7294301d8495 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4362,6 +4362,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, { struct page *page = NULL; spinlock_t *ptl; + pte_t pte; retry: ptl = pmd_lockptr(mm, pmd); spin_lock(ptl); @@ -4371,12 +4372,13 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address, */ if (!pmd_huge(*pmd)) goto out; - if (pmd_present(*pmd)) { + pte = huge_ptep_get((pte_t *)pmd); + if (pte_present(pte)) { page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT); if (flags & FOLL_GET) get_page(page); } else { - if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pmd))) { + if (is_hugetlb_entry_migration(pte)) { spin_unlock(ptl); __migration_entry_wait(mm, (pte_t *)pmd, ptl); goto retry; From 6280ac931a23d3fa40cd26057576abcf90a4f22d Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Thu, 19 Jan 2017 12:28:22 +0100 Subject: [PATCH 194/521] MIPS: Lantiq: Fix cascaded IRQ setup commit 6c356eda225e3ee134ed4176b9ae3a76f793f4dd upstream. With the IRQ stack changes integrated, the XRX200 devices started emitting a constant stream of kernel messages like this: [ 565.415310] Spurious IRQ: CAUSE=0x1100c300 This is caused by IP0 getting handled by plat_irq_dispatch() rather than its vectored interrupt handler, which is fixed by commit de856416e714 ("MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch"). Fix plat_irq_dispatch() to handle non-vectored IPI interrupts correctly by setting up IP2-6 as proper chained IRQ handlers and calling do_IRQ for all MIPS CPU interrupts. Signed-off-by: Felix Fietkau Acked-by: John Crispin Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15077/ [james.hogan@imgtec.com: tweaked commit message] Signed-off-by: James Hogan Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/mips/lantiq/irq.c | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/arch/mips/lantiq/irq.c b/arch/mips/lantiq/irq.c index 2e7f60c9fc5d..51cdc46a87e2 100644 --- a/arch/mips/lantiq/irq.c +++ b/arch/mips/lantiq/irq.c @@ -269,6 +269,11 @@ static void ltq_hw5_irqdispatch(void) DEFINE_HWx_IRQDISPATCH(5) #endif +static void ltq_hw_irq_handler(struct irq_desc *desc) +{ + ltq_hw_irqdispatch(irq_desc_get_irq(desc) - 2); +} + #ifdef CONFIG_MIPS_MT_SMP void __init arch_init_ipiirq(int irq, struct irqaction *action) { @@ -313,23 +318,19 @@ static struct irqaction irq_call = { asmlinkage void plat_irq_dispatch(void) { unsigned int pending = read_c0_status() & read_c0_cause() & ST0_IM; - unsigned int i; + int irq; - if ((MIPS_CPU_TIMER_IRQ == 7) && (pending & CAUSEF_IP7)) { - do_IRQ(MIPS_CPU_TIMER_IRQ); - goto out; - } else { - for (i = 0; i < MAX_IM; i++) { - if (pending & (CAUSEF_IP2 << i)) { - ltq_hw_irqdispatch(i); - goto out; - } - } + if (!pending) { + spurious_interrupt(); + return; } - pr_alert("Spurious IRQ: CAUSE=0x%08x\n", read_c0_status()); -out: - return; + pending >>= CAUSEB_IP; + while (pending) { + irq = fls(pending) - 1; + do_IRQ(MIPS_CPU_IRQ_BASE + irq); + pending &= ~BIT(irq); + } } static int icu_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hw) @@ -354,11 +355,6 @@ static const struct irq_domain_ops irq_domain_ops = { .map = icu_map, }; -static struct irqaction cascade = { - .handler = no_action, - .name = "cascade", -}; - int __init icu_of_init(struct device_node *node, struct device_node *parent) { struct device_node *eiu_node; @@ -390,7 +386,7 @@ int __init icu_of_init(struct device_node *node, struct device_node *parent) mips_cpu_irq_init(); for (i = 0; i < MAX_IM; i++) - setup_irq(i + 2, &cascade); + irq_set_chained_handler(i + 2, ltq_hw_irq_handler); if (cpu_has_vint) { pr_info("Setting up vectored interrupts\n"); From b3ed3864912e8809e228ddea259e8e0fa1deadf5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Sat, 2 Jul 2016 17:28:08 +0200 Subject: [PATCH 195/521] rtc: s35390a: fix reading out alarm MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit f87e904ddd8f0ef120e46045b0addeb1cc88354e upstream. There are several issues fixed in this patch: - When alarm isn't enabled, set .enabled to zero instead of returning -EINVAL. - Ignore how IRQ1 is configured when determining if IRQ2 is on. - The three alarm registers have an enable flag which must be evaluated. - The chip always triggers when the seconds register gets 0. Note that the rtc framework however doesn't handle the result correctly because it doesn't check wday being initialized and so interprets an alarm being set for 10:00 AM in three days as 10:00 AM tomorrow (or today if that's not over yet). Signed-off-by: Uwe Kleine-König Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-s35390a.c | 40 ++++++++++++++++++++++++++++++--------- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/drivers/rtc/rtc-s35390a.c b/drivers/rtc/rtc-s35390a.c index f40afdd0e5f5..6507a01cf9ad 100644 --- a/drivers/rtc/rtc-s35390a.c +++ b/drivers/rtc/rtc-s35390a.c @@ -242,6 +242,8 @@ static int s35390a_set_alarm(struct i2c_client *client, struct rtc_wkalrm *alm) if (alm->time.tm_wday != -1) buf[S35390A_ALRM_BYTE_WDAY] = bin2bcd(alm->time.tm_wday) | 0x80; + else + buf[S35390A_ALRM_BYTE_WDAY] = 0; buf[S35390A_ALRM_BYTE_HOURS] = s35390a_hr2reg(s35390a, alm->time.tm_hour) | 0x80; @@ -269,23 +271,43 @@ static int s35390a_read_alarm(struct i2c_client *client, struct rtc_wkalrm *alm) if (err < 0) return err; - if (bitrev8(sts) != S35390A_INT2_MODE_ALARM) - return -EINVAL; + if ((bitrev8(sts) & S35390A_INT2_MODE_MASK) != S35390A_INT2_MODE_ALARM) { + /* + * When the alarm isn't enabled, the register to configure + * the alarm time isn't accessible. + */ + alm->enabled = 0; + return 0; + } else { + alm->enabled = 1; + } err = s35390a_get_reg(s35390a, S35390A_CMD_INT2_REG1, buf, sizeof(buf)); if (err < 0) return err; /* This chip returns the bits of each byte in reverse order */ - for (i = 0; i < 3; ++i) { + for (i = 0; i < 3; ++i) buf[i] = bitrev8(buf[i]); - buf[i] &= ~0x80; - } - alm->time.tm_wday = bcd2bin(buf[S35390A_ALRM_BYTE_WDAY]); - alm->time.tm_hour = s35390a_reg2hr(s35390a, - buf[S35390A_ALRM_BYTE_HOURS]); - alm->time.tm_min = bcd2bin(buf[S35390A_ALRM_BYTE_MINS]); + /* + * B0 of the three matching registers is an enable flag. Iff it is set + * the configured value is used for matching. + */ + if (buf[S35390A_ALRM_BYTE_WDAY] & 0x80) + alm->time.tm_wday = + bcd2bin(buf[S35390A_ALRM_BYTE_WDAY] & ~0x80); + + if (buf[S35390A_ALRM_BYTE_HOURS] & 0x80) + alm->time.tm_hour = + s35390a_reg2hr(s35390a, + buf[S35390A_ALRM_BYTE_HOURS] & ~0x80); + + if (buf[S35390A_ALRM_BYTE_MINS] & 0x80) + alm->time.tm_min = bcd2bin(buf[S35390A_ALRM_BYTE_MINS] & ~0x80); + + /* alarm triggers always at s=0 */ + alm->time.tm_sec = 0; dev_dbg(&client->dev, "%s: alm is mins=%d, hours=%d, wday=%d\n", __func__, alm->time.tm_min, alm->time.tm_hour, From fdd4bc9313e59a1757cfc8ac5836cff55ec03eeb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Mon, 3 Apr 2017 23:32:38 +0200 Subject: [PATCH 196/521] rtc: s35390a: make sure all members in the output are set MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The rtc core calls the .read_alarm with all fields initialized to 0. As the s35390a driver doesn't touch some fields the returned date is interpreted as a date in January 1900. So make sure all fields are set to -1; some of them are then overwritten with the right data depending on the hardware state. In mainline this is done by commit d68778b80dd7 ("rtc: initialize output parameter for read alarm to "uninitialized"") in the core. This is considered to dangerous for stable as it might have side effects for other rtc drivers that might for example rely on alarm->time.tm_sec being initialized to 0. Signed-off-by: Uwe Kleine-König Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-s35390a.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/rtc/rtc-s35390a.c b/drivers/rtc/rtc-s35390a.c index 6507a01cf9ad..47b88bbe4ce7 100644 --- a/drivers/rtc/rtc-s35390a.c +++ b/drivers/rtc/rtc-s35390a.c @@ -267,6 +267,20 @@ static int s35390a_read_alarm(struct i2c_client *client, struct rtc_wkalrm *alm) char buf[3], sts; int i, err; + /* + * initialize all members to -1 to signal the core that they are not + * defined by the hardware. + */ + alm->time.tm_sec = -1; + alm->time.tm_min = -1; + alm->time.tm_hour = -1; + alm->time.tm_mday = -1; + alm->time.tm_mon = -1; + alm->time.tm_year = -1; + alm->time.tm_wday = -1; + alm->time.tm_yday = -1; + alm->time.tm_isdst = -1; + err = s35390a_get_reg(s35390a, S35390A_CMD_STATUS2, &sts, sizeof(sts)); if (err < 0) return err; From a55ae9d1937b0bf4004e5416cfa15750cd6d2b22 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Sat, 2 Jul 2016 17:28:09 +0200 Subject: [PATCH 197/521] rtc: s35390a: implement reset routine as suggested by the reference MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 8e6583f1b5d1f5f129b873f1428b7e414263d847 upstream. There were two deviations from the reference manual: you have to wait half a second when POC is active and you might have to repeat initialization when POC or BLD are still set after the sequence. Note however that as POC and BLD are cleared by read the driver might not be able to detect that a reset is necessary. I don't have a good idea how to fix this. Additionally report the value read from STATUS1 to the caller. This prepares the next patch. Signed-off-by: Uwe Kleine-König Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-s35390a.c | 65 +++++++++++++++++++++++++++++++++------ 1 file changed, 55 insertions(+), 10 deletions(-) diff --git a/drivers/rtc/rtc-s35390a.c b/drivers/rtc/rtc-s35390a.c index 47b88bbe4ce7..c7c1fce69635 100644 --- a/drivers/rtc/rtc-s35390a.c +++ b/drivers/rtc/rtc-s35390a.c @@ -15,6 +15,7 @@ #include #include #include +#include #define S35390A_CMD_STATUS1 0 #define S35390A_CMD_STATUS2 1 @@ -94,19 +95,63 @@ static int s35390a_get_reg(struct s35390a *s35390a, int reg, char *buf, int len) return 0; } -static int s35390a_reset(struct s35390a *s35390a) +/* + * Returns <0 on error, 0 if rtc is setup fine and 1 if the chip was reset. + * To keep the information if an irq is pending, pass the value read from + * STATUS1 to the caller. + */ +static int s35390a_reset(struct s35390a *s35390a, char *status1) { - char buf[1]; + char buf; + int ret; + unsigned initcount = 0; - if (s35390a_get_reg(s35390a, S35390A_CMD_STATUS1, buf, sizeof(buf)) < 0) - return -EIO; + ret = s35390a_get_reg(s35390a, S35390A_CMD_STATUS1, status1, 1); + if (ret < 0) + return ret; - if (!(buf[0] & (S35390A_FLAG_POC | S35390A_FLAG_BLD))) + if (*status1 & S35390A_FLAG_POC) + /* + * Do not communicate for 0.5 seconds since the power-on + * detection circuit is in operation. + */ + msleep(500); + else if (!(*status1 & S35390A_FLAG_BLD)) + /* + * If both POC and BLD are unset everything is fine. + */ return 0; - buf[0] |= (S35390A_FLAG_RESET | S35390A_FLAG_24H); - buf[0] &= 0xf0; - return s35390a_set_reg(s35390a, S35390A_CMD_STATUS1, buf, sizeof(buf)); + /* + * At least one of POC and BLD are set, so reinitialise chip. Keeping + * this information in the hardware to know later that the time isn't + * valid is unfortunately not possible because POC and BLD are cleared + * on read. So the reset is best done now. + * + * The 24H bit is kept over reset, so set it already here. + */ +initialize: + *status1 = S35390A_FLAG_24H; + buf = S35390A_FLAG_RESET | S35390A_FLAG_24H; + ret = s35390a_set_reg(s35390a, S35390A_CMD_STATUS1, &buf, 1); + + if (ret < 0) + return ret; + + ret = s35390a_get_reg(s35390a, S35390A_CMD_STATUS1, &buf, 1); + if (ret < 0) + return ret; + + if (buf & (S35390A_FLAG_POC | S35390A_FLAG_BLD)) { + /* Try up to five times to reset the chip */ + if (initcount < 5) { + ++initcount; + goto initialize; + } else + return -EIO; + } + + return 1; } static int s35390a_disable_test_mode(struct s35390a *s35390a) @@ -367,7 +412,7 @@ static int s35390a_probe(struct i2c_client *client, unsigned int i; struct s35390a *s35390a; struct rtc_time tm; - char buf[1]; + char buf[1], status1; if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) { err = -ENODEV; @@ -396,7 +441,7 @@ static int s35390a_probe(struct i2c_client *client, } } - err = s35390a_reset(s35390a); + err = s35390a_reset(s35390a, &status1); if (err < 0) { dev_err(&client->dev, "error resetting chip\n"); goto exit_dummy; From 3a1246b46df5210164ee43d4c5c560d0dc9ed2ce Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Sat, 2 Jul 2016 17:28:10 +0200 Subject: [PATCH 198/521] rtc: s35390a: improve irq handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 3bd32722c827d00eafe8e6d5b83e9f3148ea7c7e upstream. On some QNAP NAS devices the rtc can wake the machine. Several people noticed that once the machine was woken this way it fails to shut down. That's because the driver fails to acknowledge the interrupt and so it keeps active and restarts the machine immediatly after shutdown. See https://bugs.debian.org/794266 for a bug report. Doing this correctly requires to interpret the INT2 flag of the first read of the STATUS1 register because this bit is cleared by read. Note this is not maximally robust though because a pending irq isn't detected when the STATUS1 register was already read (and so INT2 is not set) but the irq was not disabled. But that is a hardware imposed problem that cannot easily be fixed by software. Signed-off-by: Uwe Kleine-König Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-s35390a.c | 48 +++++++++++++++++++++++++-------------- 1 file changed, 31 insertions(+), 17 deletions(-) diff --git a/drivers/rtc/rtc-s35390a.c b/drivers/rtc/rtc-s35390a.c index c7c1fce69635..00662dd28d66 100644 --- a/drivers/rtc/rtc-s35390a.c +++ b/drivers/rtc/rtc-s35390a.c @@ -35,10 +35,14 @@ #define S35390A_ALRM_BYTE_HOURS 1 #define S35390A_ALRM_BYTE_MINS 2 +/* flags for STATUS1 */ #define S35390A_FLAG_POC 0x01 #define S35390A_FLAG_BLD 0x02 +#define S35390A_FLAG_INT2 0x04 #define S35390A_FLAG_24H 0x40 #define S35390A_FLAG_RESET 0x80 + +/* flag for STATUS2 */ #define S35390A_FLAG_TEST 0x01 #define S35390A_INT2_MODE_MASK 0xF0 @@ -408,11 +412,11 @@ static struct i2c_driver s35390a_driver; static int s35390a_probe(struct i2c_client *client, const struct i2c_device_id *id) { - int err; + int err, err_reset; unsigned int i; struct s35390a *s35390a; struct rtc_time tm; - char buf[1], status1; + char buf, status1; if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) { err = -ENODEV; @@ -441,29 +445,35 @@ static int s35390a_probe(struct i2c_client *client, } } - err = s35390a_reset(s35390a, &status1); - if (err < 0) { + err_reset = s35390a_reset(s35390a, &status1); + if (err_reset < 0) { + err = err_reset; dev_err(&client->dev, "error resetting chip\n"); goto exit_dummy; } - err = s35390a_disable_test_mode(s35390a); - if (err < 0) { - dev_err(&client->dev, "error disabling test mode\n"); - goto exit_dummy; - } - - err = s35390a_get_reg(s35390a, S35390A_CMD_STATUS1, buf, sizeof(buf)); - if (err < 0) { - dev_err(&client->dev, "error checking 12/24 hour mode\n"); - goto exit_dummy; - } - if (buf[0] & S35390A_FLAG_24H) + if (status1 & S35390A_FLAG_24H) s35390a->twentyfourhour = 1; else s35390a->twentyfourhour = 0; - if (s35390a_get_datetime(client, &tm) < 0) + if (status1 & S35390A_FLAG_INT2) { + /* disable alarm (and maybe test mode) */ + buf = 0; + err = s35390a_set_reg(s35390a, S35390A_CMD_STATUS2, &buf, 1); + if (err < 0) { + dev_err(&client->dev, "error disabling alarm"); + goto exit_dummy; + } + } else { + err = s35390a_disable_test_mode(s35390a); + if (err < 0) { + dev_err(&client->dev, "error disabling test mode\n"); + goto exit_dummy; + } + } + + if (err_reset > 0 || s35390a_get_datetime(client, &tm) < 0) dev_warn(&client->dev, "clock needs to be set\n"); device_set_wakeup_capable(&client->dev, 1); @@ -476,6 +486,10 @@ static int s35390a_probe(struct i2c_client *client, err = PTR_ERR(s35390a->rtc); goto exit_dummy; } + + if (status1 & S35390A_FLAG_INT2) + rtc_update_irq(s35390a->rtc, 1, RTC_AF); + return 0; exit_dummy: From 42462d23e60b89a3c2f7d8d63f5f4e464ba77727 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 23 Mar 2017 18:24:19 +0100 Subject: [PATCH 199/521] KVM: kvm_io_bus_unregister_dev() should never fail commit 90db10434b163e46da413d34db8d0e77404cc645 upstream. No caller currently checks the return value of kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on freeing their device. A stale reference will remain in the io_bus, getting at least used again, when the iobus gets teared down on kvm_destroy_vm() - leading to use after free errors. There is nothing the callers could do, except retrying over and over again. So let's simply remove the bus altogether, print an error and make sure no one can access this broken bus again (returning -ENOMEM on any attempt to access it). Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU") Reported-by: Dmitry Vyukov Reviewed-by: Cornelia Huck Signed-off-by: David Hildenbrand Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- include/linux/kvm_host.h | 4 ++-- virt/kvm/eventfd.c | 3 ++- virt/kvm/kvm_main.c | 40 +++++++++++++++++++++++----------------- 3 files changed, 27 insertions(+), 20 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c923350ca20a..d7ce4e3280db 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -182,8 +182,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr, int len, void *val); int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, int len, struct kvm_io_device *dev); -int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, - struct kvm_io_device *dev); +void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, + struct kvm_io_device *dev); #ifdef CONFIG_KVM_ASYNC_PF struct kvm_async_pf { diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 46dbc0a7dfc1..49001fa84ead 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -868,7 +868,8 @@ kvm_deassign_ioeventfd_idx(struct kvm *kvm, enum kvm_bus bus_idx, continue; kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev); - kvm->buses[bus_idx]->ioeventfd_count--; + if (kvm->buses[bus_idx]) + kvm->buses[bus_idx]->ioeventfd_count--; ioeventfd_release(p); ret = 0; break; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1ac5b7be7282..cb092bd9965b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -655,7 +655,8 @@ static void kvm_destroy_vm(struct kvm *kvm) spin_unlock(&kvm_lock); kvm_free_irq_routing(kvm); for (i = 0; i < KVM_NR_BUSES; i++) { - kvm_io_bus_destroy(kvm->buses[i]); + if (kvm->buses[i]) + kvm_io_bus_destroy(kvm->buses[i]); kvm->buses[i] = NULL; } kvm_coalesced_mmio_free(kvm); @@ -3273,6 +3274,8 @@ int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr, }; bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu); + if (!bus) + return -ENOMEM; r = __kvm_io_bus_write(vcpu, bus, &range, val); return r < 0 ? r : 0; } @@ -3290,6 +3293,8 @@ int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, }; bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu); + if (!bus) + return -ENOMEM; /* First try the device referenced by cookie. */ if ((cookie >= 0) && (cookie < bus->dev_count) && @@ -3340,6 +3345,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr, }; bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu); + if (!bus) + return -ENOMEM; r = __kvm_io_bus_read(vcpu, bus, &range, val); return r < 0 ? r : 0; } @@ -3352,6 +3359,9 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, struct kvm_io_bus *new_bus, *bus; bus = kvm->buses[bus_idx]; + if (!bus) + return -ENOMEM; + /* exclude ioeventfd which is limited by maximum fd */ if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1) return -ENOSPC; @@ -3371,45 +3381,41 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, } /* Caller must hold slots_lock. */ -int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, - struct kvm_io_device *dev) +void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, + struct kvm_io_device *dev) { - int i, r; + int i; struct kvm_io_bus *new_bus, *bus; bus = kvm->buses[bus_idx]; - - /* - * It's possible the bus being released before hand. If so, - * we're done here. - */ if (!bus) - return 0; + return; - r = -ENOENT; for (i = 0; i < bus->dev_count; i++) if (bus->range[i].dev == dev) { - r = 0; break; } - if (r) - return r; + if (i == bus->dev_count) + return; new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) * sizeof(struct kvm_io_range)), GFP_KERNEL); - if (!new_bus) - return -ENOMEM; + if (!new_bus) { + pr_err("kvm: failed to shrink bus, removing it completely\n"); + goto broken; + } memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range)); new_bus->dev_count--; memcpy(new_bus->range + i, bus->range + i + 1, (new_bus->dev_count - i) * sizeof(struct kvm_io_range)); +broken: rcu_assign_pointer(kvm->buses[bus_idx], new_bus); synchronize_srcu_expedited(&kvm->srcu); kfree(bus); - return r; + return; } static struct notifier_block kvm_cpu_notifier = { From 063d30f187f5c492aa4a6cca88b8afa08f5a170c Mon Sep 17 00:00:00 2001 From: Alexandre Belloni Date: Tue, 25 Oct 2016 11:37:59 +0200 Subject: [PATCH 200/521] power: reset: at91-poweroff: timely shutdown LPDDR memories commit 0b0408745e7ff24757cbfd571d69026c0ddb803c upstream. LPDDR memories can only handle up to 400 uncontrolled power off. Ensure the proper power off sequence is used before shutting down the platform. Signed-off-by: Alexandre Belloni Signed-off-by: Sebastian Reichel Signed-off-by: Greg Kroah-Hartman --- drivers/power/reset/at91-poweroff.c | 54 ++++++++++++++++++++++++++++- 1 file changed, 53 insertions(+), 1 deletion(-) diff --git a/drivers/power/reset/at91-poweroff.c b/drivers/power/reset/at91-poweroff.c index e9e24df35f26..2579f025b90b 100644 --- a/drivers/power/reset/at91-poweroff.c +++ b/drivers/power/reset/at91-poweroff.c @@ -14,9 +14,12 @@ #include #include #include +#include #include #include +#include + #define AT91_SHDW_CR 0x00 /* Shut Down Control Register */ #define AT91_SHDW_SHDW BIT(0) /* Shut Down command */ #define AT91_SHDW_KEY (0xa5 << 24) /* KEY Password */ @@ -50,6 +53,7 @@ static const char *shdwc_wakeup_modes[] = { static void __iomem *at91_shdwc_base; static struct clk *sclk; +static void __iomem *mpddrc_base; static void __init at91_wakeup_status(void) { @@ -73,6 +77,29 @@ static void at91_poweroff(void) writel(AT91_SHDW_KEY | AT91_SHDW_SHDW, at91_shdwc_base + AT91_SHDW_CR); } +static void at91_lpddr_poweroff(void) +{ + asm volatile( + /* Align to cache lines */ + ".balign 32\n\t" + + /* Ensure AT91_SHDW_CR is in the TLB by reading it */ + " ldr r6, [%2, #" __stringify(AT91_SHDW_CR) "]\n\t" + + /* Power down SDRAM0 */ + " str %1, [%0, #" __stringify(AT91_DDRSDRC_LPR) "]\n\t" + /* Shutdown CPU */ + " str %3, [%2, #" __stringify(AT91_SHDW_CR) "]\n\t" + + " b .\n\t" + : + : "r" (mpddrc_base), + "r" cpu_to_le32(AT91_DDRSDRC_LPDDR2_PWOFF), + "r" (at91_shdwc_base), + "r" cpu_to_le32(AT91_SHDW_KEY | AT91_SHDW_SHDW) + : "r0"); +} + static int at91_poweroff_get_wakeup_mode(struct device_node *np) { const char *pm; @@ -124,6 +151,8 @@ static void at91_poweroff_dt_set_wakeup_mode(struct platform_device *pdev) static int __init at91_poweroff_probe(struct platform_device *pdev) { struct resource *res; + struct device_node *np; + u32 ddr_type; int ret; res = platform_get_resource(pdev, IORESOURCE_MEM, 0); @@ -150,12 +179,30 @@ static int __init at91_poweroff_probe(struct platform_device *pdev) pm_power_off = at91_poweroff; + np = of_find_compatible_node(NULL, NULL, "atmel,sama5d3-ddramc"); + if (!np) + return 0; + + mpddrc_base = of_iomap(np, 0); + of_node_put(np); + + if (!mpddrc_base) + return 0; + + ddr_type = readl(mpddrc_base + AT91_DDRSDRC_MDR) & AT91_DDRSDRC_MD; + if ((ddr_type == AT91_DDRSDRC_MD_LPDDR2) || + (ddr_type == AT91_DDRSDRC_MD_LPDDR3)) + pm_power_off = at91_lpddr_poweroff; + else + iounmap(mpddrc_base); + return 0; } static int __exit at91_poweroff_remove(struct platform_device *pdev) { - if (pm_power_off == at91_poweroff) + if (pm_power_off == at91_poweroff || + pm_power_off == at91_lpddr_poweroff) pm_power_off = NULL; clk_disable_unprepare(sclk); @@ -163,6 +210,11 @@ static int __exit at91_poweroff_remove(struct platform_device *pdev) return 0; } +static const struct of_device_id at91_ramc_of_match[] = { + { .compatible = "atmel,sama5d3-ddramc", }, + { /* sentinel */ } +}; + static const struct of_device_id at91_poweroff_of_match[] = { { .compatible = "atmel,at91sam9260-shdwc", }, { .compatible = "atmel,at91sam9rl-shdwc", }, From 2cbd78f4239bd28b86c6ff8e3b7867db72762f1a Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 8 Mar 2017 07:38:05 +1100 Subject: [PATCH 201/521] blk: improve order of bio handling in generic_make_request() commit 79bd99596b7305ab08109a8bf44a6a4511dbf1cd upstream. To avoid recursion on the kernel stack when stacked block devices are in use, generic_make_request() will, when called recursively, queue new requests for later handling. They will be handled when the make_request_fn for the current bio completes. If any bios are submitted by a make_request_fn, these will ultimately be handled seqeuntially. If the handling of one of those generates further requests, they will be added to the end of the queue. This strict first-in-first-out behaviour can lead to deadlocks in various ways, normally because a request might need to wait for a previous request to the same device to complete. This can happen when they share a mempool, and can happen due to interdependencies particular to the device. Both md and dm have examples where this happens. These deadlocks can be erradicated by more selective ordering of bios. Specifically by handling them in depth-first order. That is: when the handling of one bio generates one or more further bios, they are handled immediately after the parent, before any siblings of the parent. That way, when generic_make_request() calls make_request_fn for some particular device, we can be certain that all previously submited requests for that device have been completely handled and are not waiting for anything in the queue of requests maintained in generic_make_request(). An easy way to achieve this would be to use a last-in-first-out stack instead of a queue. However this will change the order of consecutive bios submitted by a make_request_fn, which could have unexpected consequences. Instead we take a slightly more complex approach. A fresh queue is created for each call to a make_request_fn. After it completes, any bios for a different device are placed on the front of the main queue, followed by any bios for the same device, followed by all bios that were already on the queue before the make_request_fn was called. This provides the depth-first approach without reordering bios on the same level. This, by itself, it not enough to remove all deadlocks. It just makes it possible for drivers to take the extra step required themselves. To avoid deadlocks, drivers must never risk waiting for a request after submitting one to generic_make_request. This includes never allocing from a mempool twice in the one call to a make_request_fn. A common pattern in drivers is to call bio_split() in a loop, handling the first part and then looping around to possibly split the next part. Instead, a driver that finds it needs to split a bio should queue (with generic_make_request) the second part, handle the first part, and then return. The new code in generic_make_request will ensure the requests to underlying bios are processed first, then the second bio that was split off. If it splits again, the same process happens. In each case one bio will be completely handled before the next one is attempted. With this is place, it should be possible to disable the punt_bios_to_recover() recovery thread for many block devices, and eventually it may be possible to remove it completely. Ref: http://www.spinics.net/lists/raid/msg54680.html Tested-by: Jinpu Wang Inspired-by: Lars Ellenberg Signed-off-by: NeilBrown Signed-off-by: Jens Axboe [jwang: backport to 4.4] Signed-off-by: Jack Wang Signed-off-by: Greg Kroah-Hartman --- block/blk-core.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 4fab5d610805..7a58a22d1595 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2063,18 +2063,33 @@ blk_qc_t generic_make_request(struct bio *bio) struct request_queue *q = bdev_get_queue(bio->bi_bdev); if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) { + struct bio_list lower, same, hold; + + /* Create a fresh bio_list for all subordinate requests */ + hold = bio_list_on_stack; + bio_list_init(&bio_list_on_stack); ret = q->make_request_fn(q, bio); blk_queue_exit(q); - - bio = bio_list_pop(current->bio_list); + /* sort new bios into those for a lower level + * and those for the same level + */ + bio_list_init(&lower); + bio_list_init(&same); + while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL) + if (q == bdev_get_queue(bio->bi_bdev)) + bio_list_add(&same, bio); + else + bio_list_add(&lower, bio); + /* now assemble so we handle the lowest level first */ + bio_list_merge(&bio_list_on_stack, &lower); + bio_list_merge(&bio_list_on_stack, &same); + bio_list_merge(&bio_list_on_stack, &hold); } else { - struct bio *bio_next = bio_list_pop(current->bio_list); - bio_io_error(bio); - bio = bio_next; } + bio = bio_list_pop(current->bio_list); } while (bio); current->bio_list = NULL; /* deactivate */ From 5cca175b6cda16b68b18967210872327b1cadf4f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 10 Mar 2017 17:00:47 +1100 Subject: [PATCH 202/521] blk: Ensure users for current->bio_list can see the full list. commit f5fe1b51905df7cfe4fdfd85c5fb7bc5b71a094f upstream. Commit 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()") changed current->bio_list so that it did not contain *all* of the queued bios, but only those submitted by the currently running make_request_fn. There are two places which walk the list and requeue selected bios, and others that check if the list is empty. These are no longer correct. So redefine current->bio_list to point to an array of two lists, which contain all queued bios, and adjust various code to test or walk both lists. Signed-off-by: NeilBrown Fixes: 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()") Signed-off-by: Jens Axboe [jwang: backport to 4.4] Signed-off-by: Jack Wang Signed-off-by: Greg Kroah-Hartman [bwh: Restore changes in device-mapper from upstream version] Signed-off-by: Ben Hutchings --- block/bio.c | 12 +++++++++--- block/blk-core.c | 31 +++++++++++++++++++------------ drivers/md/dm.c | 27 +++++++++++++++------------ drivers/md/raid1.c | 3 ++- drivers/md/raid10.c | 3 ++- 5 files changed, 47 insertions(+), 29 deletions(-) diff --git a/block/bio.c b/block/bio.c index 46e2cc1d4016..14263fab94d3 100644 --- a/block/bio.c +++ b/block/bio.c @@ -373,10 +373,14 @@ static void punt_bios_to_rescuer(struct bio_set *bs) bio_list_init(&punt); bio_list_init(&nopunt); - while ((bio = bio_list_pop(current->bio_list))) + while ((bio = bio_list_pop(¤t->bio_list[0]))) bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio); + current->bio_list[0] = nopunt; - *current->bio_list = nopunt; + bio_list_init(&nopunt); + while ((bio = bio_list_pop(¤t->bio_list[1]))) + bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio); + current->bio_list[1] = nopunt; spin_lock(&bs->rescue_lock); bio_list_merge(&bs->rescue_list, &punt); @@ -464,7 +468,9 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs) * we retry with the original gfp_flags. */ - if (current->bio_list && !bio_list_empty(current->bio_list)) + if (current->bio_list && + (!bio_list_empty(¤t->bio_list[0]) || + !bio_list_empty(¤t->bio_list[1]))) gfp_mask &= ~__GFP_DIRECT_RECLAIM; p = mempool_alloc(bs->bio_pool, gfp_mask); diff --git a/block/blk-core.c b/block/blk-core.c index 7a58a22d1595..ef083e7a37c5 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2021,7 +2021,14 @@ generic_make_request_checks(struct bio *bio) */ blk_qc_t generic_make_request(struct bio *bio) { - struct bio_list bio_list_on_stack; + /* + * bio_list_on_stack[0] contains bios submitted by the current + * make_request_fn. + * bio_list_on_stack[1] contains bios that were submitted before + * the current make_request_fn, but that haven't been processed + * yet. + */ + struct bio_list bio_list_on_stack[2]; blk_qc_t ret = BLK_QC_T_NONE; if (!generic_make_request_checks(bio)) @@ -2038,7 +2045,7 @@ blk_qc_t generic_make_request(struct bio *bio) * should be added at the tail */ if (current->bio_list) { - bio_list_add(current->bio_list, bio); + bio_list_add(¤t->bio_list[0], bio); goto out; } @@ -2057,17 +2064,17 @@ blk_qc_t generic_make_request(struct bio *bio) * bio_list, and call into ->make_request() again. */ BUG_ON(bio->bi_next); - bio_list_init(&bio_list_on_stack); - current->bio_list = &bio_list_on_stack; + bio_list_init(&bio_list_on_stack[0]); + current->bio_list = bio_list_on_stack; do { struct request_queue *q = bdev_get_queue(bio->bi_bdev); if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) { - struct bio_list lower, same, hold; + struct bio_list lower, same; /* Create a fresh bio_list for all subordinate requests */ - hold = bio_list_on_stack; - bio_list_init(&bio_list_on_stack); + bio_list_on_stack[1] = bio_list_on_stack[0]; + bio_list_init(&bio_list_on_stack[0]); ret = q->make_request_fn(q, bio); @@ -2077,19 +2084,19 @@ blk_qc_t generic_make_request(struct bio *bio) */ bio_list_init(&lower); bio_list_init(&same); - while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL) + while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL) if (q == bdev_get_queue(bio->bi_bdev)) bio_list_add(&same, bio); else bio_list_add(&lower, bio); /* now assemble so we handle the lowest level first */ - bio_list_merge(&bio_list_on_stack, &lower); - bio_list_merge(&bio_list_on_stack, &same); - bio_list_merge(&bio_list_on_stack, &hold); + bio_list_merge(&bio_list_on_stack[0], &lower); + bio_list_merge(&bio_list_on_stack[0], &same); + bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]); } else { bio_io_error(bio); } - bio = bio_list_pop(current->bio_list); + bio = bio_list_pop(&bio_list_on_stack[0]); } while (bio); current->bio_list = NULL; /* deactivate */ diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 397f0454100b..320eb3c4bb6b 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1481,26 +1481,29 @@ static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule) struct dm_offload *o = container_of(cb, struct dm_offload, cb); struct bio_list list; struct bio *bio; + int i; INIT_LIST_HEAD(&o->cb.list); if (unlikely(!current->bio_list)) return; - list = *current->bio_list; - bio_list_init(current->bio_list); + for (i = 0; i < 2; i++) { + list = current->bio_list[i]; + bio_list_init(¤t->bio_list[i]); - while ((bio = bio_list_pop(&list))) { - struct bio_set *bs = bio->bi_pool; - if (unlikely(!bs) || bs == fs_bio_set) { - bio_list_add(current->bio_list, bio); - continue; + while ((bio = bio_list_pop(&list))) { + struct bio_set *bs = bio->bi_pool; + if (unlikely(!bs) || bs == fs_bio_set) { + bio_list_add(¤t->bio_list[i], bio); + continue; + } + + spin_lock(&bs->rescue_lock); + bio_list_add(&bs->rescue_list, bio); + queue_work(bs->rescue_workqueue, &bs->rescue_work); + spin_unlock(&bs->rescue_lock); } - - spin_lock(&bs->rescue_lock); - bio_list_add(&bs->rescue_list, bio); - queue_work(bs->rescue_workqueue, &bs->rescue_work); - spin_unlock(&bs->rescue_lock); } } diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 515554c7365b..9be39988bf06 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -877,7 +877,8 @@ static sector_t wait_barrier(struct r1conf *conf, struct bio *bio) ((conf->start_next_window < conf->next_resync + RESYNC_SECTORS) && current->bio_list && - !bio_list_empty(current->bio_list))), + (!bio_list_empty(¤t->bio_list[0]) || + !bio_list_empty(¤t->bio_list[1])))), conf->resync_lock); conf->nr_waiting--; } diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index a92979e704e3..e5ee4e9e0ea5 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -946,7 +946,8 @@ static void wait_barrier(struct r10conf *conf) !conf->barrier || (conf->nr_pending && current->bio_list && - !bio_list_empty(current->bio_list)), + (!bio_list_empty(¤t->bio_list[0]) || + !bio_list_empty(¤t->bio_list[1]))), conf->resync_lock); conf->nr_waiting--; } From 84bd21a708b83a24d26cd0010ea94106c96557de Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 23 Mar 2017 12:24:43 +0100 Subject: [PATCH 203/521] padata: avoid race in reordering commit de5540d088fe97ad583cc7d396586437b32149a5 upstream. Under extremely heavy uses of padata, crashes occur, and with list debugging turned on, this happens instead: [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33 __list_add+0xae/0x130 [87487.301868] list_add corruption. prev->next should be next (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00). [87487.339011] [] dump_stack+0x68/0xa3 [87487.342198] [] ? console_unlock+0x281/0x6d0 [87487.345364] [] __warn+0xff/0x140 [87487.348513] [] warn_slowpath_fmt+0x4a/0x50 [87487.351659] [] __list_add+0xae/0x130 [87487.354772] [] ? _raw_spin_lock+0x64/0x70 [87487.357915] [] padata_reorder+0x1e6/0x420 [87487.361084] [] padata_do_serial+0xa5/0x120 padata_reorder calls list_add_tail with the list to which its adding locked, which seems correct: spin_lock(&squeue->serial.lock); list_add_tail(&padata->list, &squeue->serial.list); spin_unlock(&squeue->serial.lock); This therefore leaves only place where such inconsistency could occur: if padata->list is added at the same time on two different threads. This pdata pointer comes from the function call to padata_get_next(pd), which has in it the following block: next_queue = per_cpu_ptr(pd->pqueue, cpu); padata = NULL; reorder = &next_queue->reorder; if (!list_empty(&reorder->list)) { padata = list_entry(reorder->list.next, struct padata_priv, list); spin_lock(&reorder->lock); list_del_init(&padata->list); atomic_dec(&pd->reorder_objects); spin_unlock(&reorder->lock); pd->processed++; goto out; } out: return padata; I strongly suspect that the problem here is that two threads can race on reorder list. Even though the deletion is locked, call to list_entry is not locked, which means it's feasible that two threads pick up the same padata object and subsequently call list_add_tail on them at the same time. The fix is thus be hoist that lock outside of that block. Signed-off-by: Jason A. Donenfeld Acked-by: Steffen Klassert Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- kernel/padata.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/padata.c b/kernel/padata.c index b38bea9c466a..401227e3967c 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -189,19 +189,20 @@ static struct padata_priv *padata_get_next(struct parallel_data *pd) reorder = &next_queue->reorder; + spin_lock(&reorder->lock); if (!list_empty(&reorder->list)) { padata = list_entry(reorder->list.next, struct padata_priv, list); - spin_lock(&reorder->lock); list_del_init(&padata->list); atomic_dec(&pd->reorder_objects); - spin_unlock(&reorder->lock); pd->processed++; + spin_unlock(&reorder->lock); goto out; } + spin_unlock(&reorder->lock); if (__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index) { padata = ERR_PTR(-ENODATA); From 8f8ee9706b0a64a3506b9d9789ace7c44f3d817d Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sat, 8 Apr 2017 09:53:53 +0200 Subject: [PATCH 204/521] Linux 4.4.60 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 083724c6ca4d..fb7c2b40753d 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 59 +SUBLEVEL = 60 EXTRAVERSION = NAME = Blurry Fish Butt From 48e38a939ec533ccc28849785c9c729e0e80fbfe Mon Sep 17 00:00:00 2001 From: Pratyush Anand Date: Mon, 14 Nov 2016 19:32:42 +0530 Subject: [PATCH 205/521] BACKPORT: hw_breakpoint: Allow watchpoint of length 3,5,6 and 7 (cherry picked from commit 651be3cb085341a21847e47c694c249c3e1e4e5b) We only support breakpoint/watchpoint of length 1, 2, 4 and 8. If we can support other length as well, then user may watch more data with less number of watchpoints (provided hardware supports it). For example: if we have to watch only 4th, 5th and 6th byte from a 64 bit aligned address, we will have to use two slots to implement it currently. One slot will watch a half word at offset 4 and other a byte at offset 6. If we can have a watchpoint of length 3 then we can watch it with single slot as well. ARM64 hardware does support such functionality, therefore adding these new definitions in generic layer. Signed-off-by: Pratyush Anand Signed-off-by: Will Deacon Signed-off-by: Pavel Labath [pavel: tools/include/uapi/linux/hw_breakpoint.h is not present in this branch] Change-Id: Ie17ed89ca526e4fddf591bb4e556fdfb55fc2eac Bug: 30919905 --- include/uapi/linux/hw_breakpoint.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/hw_breakpoint.h b/include/uapi/linux/hw_breakpoint.h index b04000a2296a..2b65efd19a46 100644 --- a/include/uapi/linux/hw_breakpoint.h +++ b/include/uapi/linux/hw_breakpoint.h @@ -4,7 +4,11 @@ enum { HW_BREAKPOINT_LEN_1 = 1, HW_BREAKPOINT_LEN_2 = 2, + HW_BREAKPOINT_LEN_3 = 3, HW_BREAKPOINT_LEN_4 = 4, + HW_BREAKPOINT_LEN_5 = 5, + HW_BREAKPOINT_LEN_6 = 6, + HW_BREAKPOINT_LEN_7 = 7, HW_BREAKPOINT_LEN_8 = 8, }; From 29dbff68c9df0610f77968f2511c1928e0c80f07 Mon Sep 17 00:00:00 2001 From: Pratyush Anand Date: Mon, 14 Nov 2016 19:32:43 +0530 Subject: [PATCH 206/521] UPSTREAM: arm64: Allow hw watchpoint at varied offset from base address ARM64 hardware supports watchpoint at any double word aligned address. However, it can select any consecutive bytes from offset 0 to 7 from that base address. For example, if base address is programmed as 0x420030 and byte select is 0x1C, then access of 0x420032,0x420033 and 0x420034 will generate a watchpoint exception. Currently, we do not have such modularity. We can only program byte, halfword, word and double word access exception from any base address. This patch adds support to overcome above limitations. Signed-off-by: Pratyush Anand Signed-off-by: Will Deacon Signed-off-by: Pavel Labath Change-Id: I28b1ca63f63182c10c3d6b6b3bacf6c56887ddbe Bug: 30919905 --- arch/arm64/include/asm/hw_breakpoint.h | 2 +- arch/arm64/kernel/hw_breakpoint.c | 47 +++++++++++++------------- arch/arm64/kernel/ptrace.c | 7 ++-- 3 files changed, 28 insertions(+), 28 deletions(-) diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 9732908bfc8a..8acfd989a4e4 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -110,7 +110,7 @@ struct perf_event; struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, - int *gen_len, int *gen_type); + int *gen_len, int *gen_type, int *offset); extern int arch_check_bp_in_kernelspace(struct perf_event *bp); extern int arch_validate_hwbkpt_settings(struct perf_event *bp); extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 367a954f9937..1f4fc5364a47 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -349,7 +349,7 @@ int arch_check_bp_in_kernelspace(struct perf_event *bp) * to generic breakpoint descriptions. */ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, - int *gen_len, int *gen_type) + int *gen_len, int *gen_type, int *offset) { /* Type */ switch (ctrl.type) { @@ -369,8 +369,12 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, return -EINVAL; } + if (!ctrl.len) + return -EINVAL; + *offset = __ffs(ctrl.len); + /* Len */ - switch (ctrl.len) { + switch (ctrl.len >> *offset) { case ARM_BREAKPOINT_LEN_1: *gen_len = HW_BREAKPOINT_LEN_1; break; @@ -517,18 +521,17 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) default: return -EINVAL; } - - info->address &= ~alignment_mask; - info->ctrl.len <<= offset; } else { if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE) alignment_mask = 0x3; else alignment_mask = 0x7; - if (info->address & alignment_mask) - return -EINVAL; + offset = info->address & alignment_mask; } + info->address &= ~alignment_mask; + info->ctrl.len <<= offset; + /* * Disallow per-task kernel breakpoints since these would * complicate the stepping code. @@ -665,8 +668,8 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, struct pt_regs *regs) { int i, step = 0, *kernel_step, access; - u32 ctrl_reg; - u64 val, alignment_mask; + u32 ctrl_reg, lens, lene; + u64 val; struct perf_event *wp, **slots; struct debug_info *debug_info; struct arch_hw_breakpoint *info; @@ -684,25 +687,21 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, goto unlock; info = counter_arch_bp(wp); - /* AArch32 watchpoints are either 4 or 8 bytes aligned. */ - if (is_compat_task()) { - if (info->ctrl.len == ARM_BREAKPOINT_LEN_8) - alignment_mask = 0x7; - else - alignment_mask = 0x3; - } else { - alignment_mask = 0x7; - } - /* Check if the watchpoint value matches. */ + /* Check if the watchpoint value and byte select match. */ val = read_wb_reg(AARCH64_DBG_REG_WVR, i); - if (val != (addr & ~alignment_mask)) - goto unlock; - - /* Possible match, check the byte address select to confirm. */ ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i); decode_ctrl_reg(ctrl_reg, &ctrl); - if (!((1 << (addr & alignment_mask)) & ctrl.len)) + lens = ffs(ctrl.len) - 1; + lene = fls(ctrl.len) - 1; + /* + * FIXME: reported address can be anywhere between "the + * lowest address accessed by the memory access that + * triggered the watchpoint" and "the highest watchpointed + * address accessed by the memory access". So, it may not + * lie in the interval of watchpoint address range. + */ + if (addr < val + lens || addr > val + lene) goto unlock; /* diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index c5ef05959813..6204b7600d1b 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -327,13 +327,13 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type, struct arch_hw_breakpoint_ctrl ctrl, struct perf_event_attr *attr) { - int err, len, type, disabled = !ctrl.enabled; + int err, len, type, offset, disabled = !ctrl.enabled; attr->disabled = disabled; if (disabled) return 0; - err = arch_bp_generic_fields(ctrl, &len, &type); + err = arch_bp_generic_fields(ctrl, &len, &type, &offset); if (err) return err; @@ -352,6 +352,7 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type, attr->bp_len = len; attr->bp_type = type; + attr->bp_addr += offset; return 0; } @@ -404,7 +405,7 @@ static int ptrace_hbp_get_addr(unsigned int note_type, if (IS_ERR(bp)) return PTR_ERR(bp); - *addr = bp ? bp->attr.bp_addr : 0; + *addr = bp ? counter_arch_bp(bp)->address : 0; return 0; } From 98912f6825a5223fd6960f20f598d699e19ff299 Mon Sep 17 00:00:00 2001 From: Pavel Labath Date: Mon, 14 Nov 2016 19:32:44 +0530 Subject: [PATCH 207/521] BACKPORT: arm64: hw_breakpoint: Handle inexact watchpoint addresses (cherry picked from commit fdfeff0f9e3d9be2b68fa02566017ffc581ae17b) Arm64 hardware does not always report a watchpoint hit address that matches one of the watchpoints set. It can also report an address "near" the watchpoint if a single instruction access both watched and unwatched addresses. There is no straight-forward way, short of disassembling the offending instruction, to map that address back to the watchpoint. Previously, when the hardware reported a watchpoint hit on an address that did not match our watchpoint (this happens in case of instructions which access large chunks of memory such as "stp") the process would enter a loop where we would be continually resuming it (because we did not recognise that watchpoint hit) and it would keep hitting the watchpoint again and again. The tracing process would never get notified of the watchpoint hit. This commit fixes the problem by looking at the watchpoints near the address reported by the hardware. If the address does not exactly match one of the watchpoints we have set, it attributes the hit to the nearest watchpoint we have. This heuristic is a bit dodgy, but I don't think we can do much more, given the hardware limitations. Signed-off-by: Pavel Labath [panand: reworked to rebase on his patches] Signed-off-by: Pratyush Anand [will: use __ffs instead of ffs - 1] Signed-off-by: Will Deacon Signed-off-by: Pavel Labath [pavel: trivial fixup in hw_breakpoint.c:watchpoint_handler] Change-Id: I714dfaa3947d89d89a9e9a1ea84914d44ba0faa3 Bug: 30919905 --- arch/arm64/kernel/hw_breakpoint.c | 98 ++++++++++++++++++++++--------- 1 file changed, 70 insertions(+), 28 deletions(-) diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 1f4fc5364a47..3730b1fb3344 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -664,11 +664,46 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr, } NOKPROBE_SYMBOL(breakpoint_handler); +/* + * Arm64 hardware does not always report a watchpoint hit address that matches + * one of the watchpoints set. It can also report an address "near" the + * watchpoint if a single instruction access both watched and unwatched + * addresses. There is no straight-forward way, short of disassembling the + * offending instruction, to map that address back to the watchpoint. This + * function computes the distance of the memory access from the watchpoint as a + * heuristic for the likelyhood that a given access triggered the watchpoint. + * + * See Section D2.10.5 "Determining the memory location that caused a Watchpoint + * exception" of ARMv8 Architecture Reference Manual for details. + * + * The function returns the distance of the address from the bytes watched by + * the watchpoint. In case of an exact match, it returns 0. + */ +static u64 get_distance_from_watchpoint(unsigned long addr, u64 val, + struct arch_hw_breakpoint_ctrl *ctrl) +{ + u64 wp_low, wp_high; + u32 lens, lene; + + lens = __ffs(ctrl->len); + lene = __fls(ctrl->len); + + wp_low = val + lens; + wp_high = val + lene; + if (addr < wp_low) + return wp_low - addr; + else if (addr > wp_high) + return addr - wp_high; + else + return 0; +} + static int watchpoint_handler(unsigned long addr, unsigned int esr, struct pt_regs *regs) { - int i, step = 0, *kernel_step, access; - u32 ctrl_reg, lens, lene; + int i, step = 0, *kernel_step, access, closest_match = 0; + u64 min_dist = -1, dist; + u32 ctrl_reg; u64 val; struct perf_event *wp, **slots; struct debug_info *debug_info; @@ -678,31 +713,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, slots = this_cpu_ptr(wp_on_reg); debug_info = ¤t->thread.debug; + /* + * Find all watchpoints that match the reported address. If no exact + * match is found. Attribute the hit to the closest watchpoint. + */ + rcu_read_lock(); for (i = 0; i < core_num_wrps; ++i) { - rcu_read_lock(); - wp = slots[i]; - if (wp == NULL) - goto unlock; - - info = counter_arch_bp(wp); - - /* Check if the watchpoint value and byte select match. */ - val = read_wb_reg(AARCH64_DBG_REG_WVR, i); - ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i); - decode_ctrl_reg(ctrl_reg, &ctrl); - lens = ffs(ctrl.len) - 1; - lene = fls(ctrl.len) - 1; - /* - * FIXME: reported address can be anywhere between "the - * lowest address accessed by the memory access that - * triggered the watchpoint" and "the highest watchpointed - * address accessed by the memory access". So, it may not - * lie in the interval of watchpoint address range. - */ - if (addr < val + lens || addr > val + lene) - goto unlock; + continue; /* * Check that the access type matches. @@ -711,18 +730,41 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W : HW_BREAKPOINT_R; if (!(access & hw_breakpoint_type(wp))) - goto unlock; + continue; + /* Check if the watchpoint value and byte select match. */ + val = read_wb_reg(AARCH64_DBG_REG_WVR, i); + ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i); + decode_ctrl_reg(ctrl_reg, &ctrl); + dist = get_distance_from_watchpoint(addr, val, &ctrl); + if (dist < min_dist) { + min_dist = dist; + closest_match = i; + } + /* Is this an exact match? */ + if (dist != 0) + continue; + + info = counter_arch_bp(wp); info->trigger = addr; perf_bp_event(wp, regs); /* Do we need to handle the stepping? */ if (!wp->overflow_handler) step = 1; - -unlock: - rcu_read_unlock(); } + if (min_dist > 0 && min_dist != -1) { + /* No exact match found. */ + wp = slots[closest_match]; + info = counter_arch_bp(wp); + info->trigger = addr; + perf_bp_event(wp, regs); + + /* Do we need to handle the stepping? */ + if (!wp->overflow_handler) + step = 1; + } + rcu_read_unlock(); if (!step) return 0; From 2f090863ffad75de06c3c3ee8a86a594ddbf3a58 Mon Sep 17 00:00:00 2001 From: Pratyush Anand Date: Mon, 14 Nov 2016 19:32:45 +0530 Subject: [PATCH 208/521] UPSTREAM: arm64: Allow hw watchpoint of length 3,5,6 and 7 (cherry picked from commit 0ddb8e0b784ba034f3096d5a54684d0d73155e2a) Since, arm64 can support all offset within a double word limit. Therefore, now support other lengths within that range as well. Signed-off-by: Pratyush Anand Signed-off-by: Will Deacon Signed-off-by: Pavel Labath Change-Id: Ibcb263a3903572336ccbf96e0180d3990326545a Bug: 30919905 --- arch/arm64/include/asm/hw_breakpoint.h | 4 +++ arch/arm64/kernel/hw_breakpoint.c | 36 ++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 8acfd989a4e4..c72b8e201ab4 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -68,7 +68,11 @@ static inline void decode_ctrl_reg(u32 reg, /* Lengths */ #define ARM_BREAKPOINT_LEN_1 0x1 #define ARM_BREAKPOINT_LEN_2 0x3 +#define ARM_BREAKPOINT_LEN_3 0x7 #define ARM_BREAKPOINT_LEN_4 0xf +#define ARM_BREAKPOINT_LEN_5 0x1f +#define ARM_BREAKPOINT_LEN_6 0x3f +#define ARM_BREAKPOINT_LEN_7 0x7f #define ARM_BREAKPOINT_LEN_8 0xff /* Kernel stepping */ diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 3730b1fb3344..f4dfd8c41e06 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -317,9 +317,21 @@ static int get_hbp_len(u8 hbp_len) case ARM_BREAKPOINT_LEN_2: len_in_bytes = 2; break; + case ARM_BREAKPOINT_LEN_3: + len_in_bytes = 3; + break; case ARM_BREAKPOINT_LEN_4: len_in_bytes = 4; break; + case ARM_BREAKPOINT_LEN_5: + len_in_bytes = 5; + break; + case ARM_BREAKPOINT_LEN_6: + len_in_bytes = 6; + break; + case ARM_BREAKPOINT_LEN_7: + len_in_bytes = 7; + break; case ARM_BREAKPOINT_LEN_8: len_in_bytes = 8; break; @@ -381,9 +393,21 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, case ARM_BREAKPOINT_LEN_2: *gen_len = HW_BREAKPOINT_LEN_2; break; + case ARM_BREAKPOINT_LEN_3: + *gen_len = HW_BREAKPOINT_LEN_3; + break; case ARM_BREAKPOINT_LEN_4: *gen_len = HW_BREAKPOINT_LEN_4; break; + case ARM_BREAKPOINT_LEN_5: + *gen_len = HW_BREAKPOINT_LEN_5; + break; + case ARM_BREAKPOINT_LEN_6: + *gen_len = HW_BREAKPOINT_LEN_6; + break; + case ARM_BREAKPOINT_LEN_7: + *gen_len = HW_BREAKPOINT_LEN_7; + break; case ARM_BREAKPOINT_LEN_8: *gen_len = HW_BREAKPOINT_LEN_8; break; @@ -427,9 +451,21 @@ static int arch_build_bp_info(struct perf_event *bp) case HW_BREAKPOINT_LEN_2: info->ctrl.len = ARM_BREAKPOINT_LEN_2; break; + case HW_BREAKPOINT_LEN_3: + info->ctrl.len = ARM_BREAKPOINT_LEN_3; + break; case HW_BREAKPOINT_LEN_4: info->ctrl.len = ARM_BREAKPOINT_LEN_4; break; + case HW_BREAKPOINT_LEN_5: + info->ctrl.len = ARM_BREAKPOINT_LEN_5; + break; + case HW_BREAKPOINT_LEN_6: + info->ctrl.len = ARM_BREAKPOINT_LEN_6; + break; + case HW_BREAKPOINT_LEN_7: + info->ctrl.len = ARM_BREAKPOINT_LEN_7; + break; case HW_BREAKPOINT_LEN_8: info->ctrl.len = ARM_BREAKPOINT_LEN_8; break; From 89613b1da8445026521758e336912c06d3fd321b Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Tue, 13 Dec 2016 10:33:34 -0800 Subject: [PATCH 209/521] FROMLIST: 9p: fix a potential acl leak (https://lkml.org/lkml/2016/12/13/579) posix_acl_update_mode() could possibly clear 'acl', if so we leak the memory pointed by 'acl'. Save this pointer before calling posix_acl_update_mode() and release the memory if 'acl' really gets cleared. Reported-by: Mark Salyzyn Reviewed-by: Jan Kara Reviewed-by: Greg Kurz Cc: Eric Van Hensbergen Cc: Ron Minnich Cc: Latchesar Ionkov Signed-off-by: Cong Wang Bug: 32458736 Change-Id: Ia78da401e6fd1bfd569653bd2cd0ebd3f9c737a0 --- fs/9p/acl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/9p/acl.c b/fs/9p/acl.c index 929b618da43b..c30c6ceac2c4 100644 --- a/fs/9p/acl.c +++ b/fs/9p/acl.c @@ -283,6 +283,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler, case ACL_TYPE_ACCESS: if (acl) { struct iattr iattr; + struct posix_acl *old_acl = acl; retval = posix_acl_update_mode(inode, &iattr.ia_mode, &acl); if (retval) @@ -293,6 +294,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler, * by the mode bits. So don't * update ACL. */ + posix_acl_release(old_acl); value = NULL; size = 0; } From 2445eaaabeb9193fbeec47c30d9b1a97ff024899 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Mon, 6 Feb 2017 14:27:24 -0800 Subject: [PATCH 210/521] ANDROID: android-recommended.cfg: CONFIG_CPU_SW_DOMAIN_PAN=y Bug: 31374660 Change-Id: Id2710a5fa2694da66d3f34cbcc0c2a58a006cec5 Signed-off-by: Sami Tolvanen --- android/configs/android-recommended.cfg | 1 + 1 file changed, 1 insertion(+) diff --git a/android/configs/android-recommended.cfg b/android/configs/android-recommended.cfg index 70aaae17ad29..28610303db60 100644 --- a/android/configs/android-recommended.cfg +++ b/android/configs/android-recommended.cfg @@ -15,6 +15,7 @@ CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=8192 CONFIG_CC_STACKPROTECTOR_STRONG=y CONFIG_COMPACTION=y +CONFIG_CPU_SW_DOMAIN_PAN=y CONFIG_DEBUG_RODATA=y CONFIG_DM_UEVENT=y CONFIG_DRAGONRISE_FF=y From 3de0af4df53087c5648e1e3cbfffbdd0f56660d9 Mon Sep 17 00:00:00 2001 From: Adrien Schildknecht Date: Wed, 28 Sep 2016 12:14:39 -0700 Subject: [PATCH 211/521] Squashfs: remove the FILE_CACHE option FILE_DIRECT is working fine and offers faster results and lower memory footprint. Removing FILE_CACHE makes our life easier because we don't have to maintain 2 differents function that does the same thing. Signed-off-by: Adrien Schildknecht Change-Id: I6689ba74d0042c222a806f9edc539995e8e04c6b --- fs/squashfs/Kconfig | 28 --------------------------- fs/squashfs/Makefile | 3 +-- fs/squashfs/file_cache.c | 38 ------------------------------------ fs/squashfs/page_actor.h | 42 +--------------------------------------- 4 files changed, 2 insertions(+), 109 deletions(-) delete mode 100644 fs/squashfs/file_cache.c diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig index ffb093e72b6c..6dd158a216f4 100644 --- a/fs/squashfs/Kconfig +++ b/fs/squashfs/Kconfig @@ -25,34 +25,6 @@ config SQUASHFS If unsure, say N. -choice - prompt "File decompression options" - depends on SQUASHFS - help - Squashfs now supports two options for decompressing file - data. Traditionally Squashfs has decompressed into an - intermediate buffer and then memcopied it into the page cache. - Squashfs now supports the ability to decompress directly into - the page cache. - - If unsure, select "Decompress file data into an intermediate buffer" - -config SQUASHFS_FILE_CACHE - bool "Decompress file data into an intermediate buffer" - help - Decompress file data into an intermediate buffer and then - memcopy it into the page cache. - -config SQUASHFS_FILE_DIRECT - bool "Decompress files directly into the page cache" - help - Directly decompress file data into the page cache. - Doing so can significantly improve performance because - it eliminates a memcpy and it also removes the lock contention - on the single buffer. - -endchoice - choice prompt "Decompressor parallelisation options" depends on SQUASHFS diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile index 246a6f329d89..fe51f1507ed1 100644 --- a/fs/squashfs/Makefile +++ b/fs/squashfs/Makefile @@ -5,8 +5,7 @@ obj-$(CONFIG_SQUASHFS) += squashfs.o squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o squashfs-y += namei.o super.o symlink.o decompressor.o -squashfs-$(CONFIG_SQUASHFS_FILE_CACHE) += file_cache.o -squashfs-$(CONFIG_SQUASHFS_FILE_DIRECT) += file_direct.o page_actor.o +squashfs-y += file_direct.o page_actor.o squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o diff --git a/fs/squashfs/file_cache.c b/fs/squashfs/file_cache.c deleted file mode 100644 index f2310d2a2019..000000000000 --- a/fs/squashfs/file_cache.c +++ /dev/null @@ -1,38 +0,0 @@ -/* - * Copyright (c) 2013 - * Phillip Lougher - * - * This work is licensed under the terms of the GNU GPL, version 2. See - * the COPYING file in the top-level directory. - */ - -#include -#include -#include -#include -#include -#include -#include - -#include "squashfs_fs.h" -#include "squashfs_fs_sb.h" -#include "squashfs_fs_i.h" -#include "squashfs.h" - -/* Read separately compressed datablock and memcopy into page cache */ -int squashfs_readpage_block(struct page *page, u64 block, int bsize) -{ - struct inode *i = page->mapping->host; - struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb, - block, bsize); - int res = buffer->error; - - if (res) - ERROR("Unable to read page, block %llx, size %x\n", block, - bsize); - else - squashfs_copy_cache(page, buffer, buffer->length, 0); - - squashfs_cache_put(buffer); - return res; -} diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h index 26dd82008b82..d2df0544e0df 100644 --- a/fs/squashfs/page_actor.h +++ b/fs/squashfs/page_actor.h @@ -8,46 +8,6 @@ * the COPYING file in the top-level directory. */ -#ifndef CONFIG_SQUASHFS_FILE_DIRECT -struct squashfs_page_actor { - void **page; - int pages; - int length; - int next_page; -}; - -static inline struct squashfs_page_actor *squashfs_page_actor_init(void **page, - int pages, int length) -{ - struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL); - - if (actor == NULL) - return NULL; - - actor->length = length ? : pages * PAGE_CACHE_SIZE; - actor->page = page; - actor->pages = pages; - actor->next_page = 0; - return actor; -} - -static inline void *squashfs_first_page(struct squashfs_page_actor *actor) -{ - actor->next_page = 1; - return actor->page[0]; -} - -static inline void *squashfs_next_page(struct squashfs_page_actor *actor) -{ - return actor->next_page == actor->pages ? NULL : - actor->page[actor->next_page++]; -} - -static inline void squashfs_finish_page(struct squashfs_page_actor *actor) -{ - /* empty */ -} -#else struct squashfs_page_actor { union { void **buffer; @@ -77,5 +37,5 @@ static inline void squashfs_finish_page(struct squashfs_page_actor *actor) { actor->squashfs_finish_page(actor); } -#endif + #endif From 4bc7d97903d6dc4c6baedda07608b41481140761 Mon Sep 17 00:00:00 2001 From: Adrien Schildknecht Date: Wed, 28 Sep 2016 13:59:18 -0700 Subject: [PATCH 212/521] Squashfs: refactor page_actor This patch essentially does 3 things: 1/ Always use an array of page to store the data instead of a mix of buffers and pages. 2/ It is now possible to have 'holes' in a page actor, i.e. NULL pages in the array. When reading a block (default 128K), squashfs tries to grab all the pages covering this block. If a single page is up-to-date or locked, it falls back to using an intermediate buffer to do the read and then copy the pages in the actor. Allowing holes in the page actor remove the need for this intermediate buffer. 3/ Refactor the wrappers to share code that deals with page actors. Signed-off-by: Adrien Schildknecht Change-Id: I98128bed5d518cf31b67e788a85b275e9a323bec --- fs/squashfs/cache.c | 73 +++++-------- fs/squashfs/decompressor.c | 53 ++++----- fs/squashfs/file_direct.c | 4 +- fs/squashfs/lz4_wrapper.c | 32 +----- fs/squashfs/lzo_wrapper.c | 40 ++----- fs/squashfs/page_actor.c | 201 ++++++++++++++++++++++------------- fs/squashfs/page_actor.h | 64 +++++++---- fs/squashfs/squashfs_fs_sb.h | 2 +- fs/squashfs/xz_wrapper.c | 15 ++- fs/squashfs/zlib_wrapper.c | 14 ++- 10 files changed, 270 insertions(+), 228 deletions(-) diff --git a/fs/squashfs/cache.c b/fs/squashfs/cache.c index 1cb70a0b2168..6785d086ab38 100644 --- a/fs/squashfs/cache.c +++ b/fs/squashfs/cache.c @@ -209,17 +209,14 @@ void squashfs_cache_put(struct squashfs_cache_entry *entry) */ void squashfs_cache_delete(struct squashfs_cache *cache) { - int i, j; + int i; if (cache == NULL) return; for (i = 0; i < cache->entries; i++) { - if (cache->entry[i].data) { - for (j = 0; j < cache->pages; j++) - kfree(cache->entry[i].data[j]); - kfree(cache->entry[i].data); - } + if (cache->entry[i].page) + free_page_array(cache->entry[i].page, cache->pages); kfree(cache->entry[i].actor); } @@ -236,7 +233,7 @@ void squashfs_cache_delete(struct squashfs_cache *cache) struct squashfs_cache *squashfs_cache_init(char *name, int entries, int block_size) { - int i, j; + int i; struct squashfs_cache *cache = kzalloc(sizeof(*cache), GFP_KERNEL); if (cache == NULL) { @@ -268,22 +265,13 @@ struct squashfs_cache *squashfs_cache_init(char *name, int entries, init_waitqueue_head(&cache->entry[i].wait_queue); entry->cache = cache; entry->block = SQUASHFS_INVALID_BLK; - entry->data = kcalloc(cache->pages, sizeof(void *), GFP_KERNEL); - if (entry->data == NULL) { + entry->page = alloc_page_array(cache->pages, GFP_KERNEL); + if (!entry->page) { ERROR("Failed to allocate %s cache entry\n", name); goto cleanup; } - - for (j = 0; j < cache->pages; j++) { - entry->data[j] = kmalloc(PAGE_CACHE_SIZE, GFP_KERNEL); - if (entry->data[j] == NULL) { - ERROR("Failed to allocate %s buffer\n", name); - goto cleanup; - } - } - - entry->actor = squashfs_page_actor_init(entry->data, - cache->pages, 0); + entry->actor = squashfs_page_actor_init(entry->page, + cache->pages, 0, NULL); if (entry->actor == NULL) { ERROR("Failed to allocate %s cache entry\n", name); goto cleanup; @@ -314,18 +302,20 @@ int squashfs_copy_data(void *buffer, struct squashfs_cache_entry *entry, return min(length, entry->length - offset); while (offset < entry->length) { - void *buff = entry->data[offset / PAGE_CACHE_SIZE] - + (offset % PAGE_CACHE_SIZE); + void *buff = kmap_atomic(entry->page[offset / PAGE_CACHE_SIZE]) + + (offset % PAGE_CACHE_SIZE); int bytes = min_t(int, entry->length - offset, PAGE_CACHE_SIZE - (offset % PAGE_CACHE_SIZE)); if (bytes >= remaining) { memcpy(buffer, buff, remaining); + kunmap_atomic(buff); remaining = 0; break; } memcpy(buffer, buff, bytes); + kunmap_atomic(buff); buffer += bytes; remaining -= bytes; offset += bytes; @@ -416,43 +406,38 @@ struct squashfs_cache_entry *squashfs_get_datablock(struct super_block *sb, void *squashfs_read_table(struct super_block *sb, u64 block, int length) { int pages = (length + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; - int i, res; - void *table, *buffer, **data; + struct page **page; + void *buff; + int res; struct squashfs_page_actor *actor; - table = buffer = kmalloc(length, GFP_KERNEL); - if (table == NULL) + page = alloc_page_array(pages, GFP_KERNEL); + if (!page) return ERR_PTR(-ENOMEM); - data = kcalloc(pages, sizeof(void *), GFP_KERNEL); - if (data == NULL) { + actor = squashfs_page_actor_init(page, pages, length, NULL); + if (actor == NULL) { res = -ENOMEM; goto failed; } - actor = squashfs_page_actor_init(data, pages, length); - if (actor == NULL) { - res = -ENOMEM; - goto failed2; - } - - for (i = 0; i < pages; i++, buffer += PAGE_CACHE_SIZE) - data[i] = buffer; - res = squashfs_read_data(sb, block, length | SQUASHFS_COMPRESSED_BIT_BLOCK, NULL, actor); - kfree(data); - kfree(actor); - if (res < 0) - goto failed; + goto failed2; - return table; + buff = kmalloc(length, GFP_KERNEL); + if (!buff) + goto failed2; + squashfs_actor_to_buf(actor, buff, length); + squashfs_page_actor_free(actor, 0); + free_page_array(page, pages); + return buff; failed2: - kfree(data); + squashfs_page_actor_free(actor, 0); failed: - kfree(table); + free_page_array(page, pages); return ERR_PTR(res); } diff --git a/fs/squashfs/decompressor.c b/fs/squashfs/decompressor.c index e9034bf6e5ae..7de35bf297aa 100644 --- a/fs/squashfs/decompressor.c +++ b/fs/squashfs/decompressor.c @@ -24,7 +24,8 @@ #include #include #include -#include +#include +#include #include "squashfs_fs.h" #include "squashfs_fs_sb.h" @@ -94,40 +95,44 @@ const struct squashfs_decompressor *squashfs_lookup_decompressor(int id) static void *get_comp_opts(struct super_block *sb, unsigned short flags) { struct squashfs_sb_info *msblk = sb->s_fs_info; - void *buffer = NULL, *comp_opts; + void *comp_opts, *buffer = NULL; + struct page *page; struct squashfs_page_actor *actor = NULL; int length = 0; + if (!SQUASHFS_COMP_OPTS(flags)) + return squashfs_comp_opts(msblk, buffer, length); + /* * Read decompressor specific options from file system if present */ - if (SQUASHFS_COMP_OPTS(flags)) { - buffer = kmalloc(PAGE_CACHE_SIZE, GFP_KERNEL); - if (buffer == NULL) { - comp_opts = ERR_PTR(-ENOMEM); - goto out; - } - actor = squashfs_page_actor_init(&buffer, 1, 0); - if (actor == NULL) { - comp_opts = ERR_PTR(-ENOMEM); - goto out; - } + page = alloc_page(GFP_KERNEL); + if (!page) + return ERR_PTR(-ENOMEM); - length = squashfs_read_data(sb, - sizeof(struct squashfs_super_block), 0, NULL, actor); - - if (length < 0) { - comp_opts = ERR_PTR(length); - goto out; - } + actor = squashfs_page_actor_init(&page, 1, 0, NULL); + if (actor == NULL) { + comp_opts = ERR_PTR(-ENOMEM); + goto actor_error; } - comp_opts = squashfs_comp_opts(msblk, buffer, length); + length = squashfs_read_data(sb, + sizeof(struct squashfs_super_block), 0, NULL, actor); -out: - kfree(actor); - kfree(buffer); + if (length < 0) { + comp_opts = ERR_PTR(length); + goto read_error; + } + + buffer = kmap_atomic(page); + comp_opts = squashfs_comp_opts(msblk, buffer, length); + kunmap_atomic(buffer); + +read_error: + squashfs_page_actor_free(actor, 0); +actor_error: + __free_page(page); return comp_opts; } diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c index 43e7a7eddac0..9d033ec0f133 100644 --- a/fs/squashfs/file_direct.c +++ b/fs/squashfs/file_direct.c @@ -52,7 +52,7 @@ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) * Create a "page actor" which will kmap and kunmap the * page cache pages appropriately within the decompressor */ - actor = squashfs_page_actor_init_special(page, pages, 0); + actor = squashfs_page_actor_init(page, pages, 0, NULL); if (actor == NULL) goto out; @@ -131,7 +131,7 @@ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) } out: - kfree(actor); + squashfs_page_actor_free(actor, 0); kfree(page); return res; } diff --git a/fs/squashfs/lz4_wrapper.c b/fs/squashfs/lz4_wrapper.c index c31e2bc9c081..df4fa3c7ddd0 100644 --- a/fs/squashfs/lz4_wrapper.c +++ b/fs/squashfs/lz4_wrapper.c @@ -94,39 +94,17 @@ static int lz4_uncompress(struct squashfs_sb_info *msblk, void *strm, struct buffer_head **bh, int b, int offset, int length, struct squashfs_page_actor *output) { - struct squashfs_lz4 *stream = strm; - void *buff = stream->input, *data; - int avail, i, bytes = length, res; + int res; size_t dest_len = output->length; + struct squashfs_lz4 *stream = strm; - for (i = 0; i < b; i++) { - avail = min(bytes, msblk->devblksize - offset); - memcpy(buff, bh[i]->b_data + offset, avail); - buff += avail; - bytes -= avail; - offset = 0; - put_bh(bh[i]); - } - + squashfs_bh_to_buf(bh, b, stream->input, offset, length, + msblk->devblksize); res = lz4_decompress_unknownoutputsize(stream->input, length, stream->output, &dest_len); if (res) return -EIO; - - bytes = dest_len; - data = squashfs_first_page(output); - buff = stream->output; - while (data) { - if (bytes <= PAGE_CACHE_SIZE) { - memcpy(data, buff, bytes); - break; - } - memcpy(data, buff, PAGE_CACHE_SIZE); - buff += PAGE_CACHE_SIZE; - bytes -= PAGE_CACHE_SIZE; - data = squashfs_next_page(output); - } - squashfs_finish_page(output); + squashfs_buf_to_actor(stream->output, output, dest_len); return dest_len; } diff --git a/fs/squashfs/lzo_wrapper.c b/fs/squashfs/lzo_wrapper.c index 244b9fbfff7b..2c844d53a59e 100644 --- a/fs/squashfs/lzo_wrapper.c +++ b/fs/squashfs/lzo_wrapper.c @@ -79,45 +79,19 @@ static int lzo_uncompress(struct squashfs_sb_info *msblk, void *strm, struct buffer_head **bh, int b, int offset, int length, struct squashfs_page_actor *output) { - struct squashfs_lzo *stream = strm; - void *buff = stream->input, *data; - int avail, i, bytes = length, res; + int res; size_t out_len = output->length; + struct squashfs_lzo *stream = strm; - for (i = 0; i < b; i++) { - avail = min(bytes, msblk->devblksize - offset); - memcpy(buff, bh[i]->b_data + offset, avail); - buff += avail; - bytes -= avail; - offset = 0; - put_bh(bh[i]); - } - + squashfs_bh_to_buf(bh, b, stream->input, offset, length, + msblk->devblksize); res = lzo1x_decompress_safe(stream->input, (size_t)length, stream->output, &out_len); if (res != LZO_E_OK) - goto failed; + return -EIO; + squashfs_buf_to_actor(stream->output, output, out_len); - res = bytes = (int)out_len; - data = squashfs_first_page(output); - buff = stream->output; - while (data) { - if (bytes <= PAGE_CACHE_SIZE) { - memcpy(data, buff, bytes); - break; - } else { - memcpy(data, buff, PAGE_CACHE_SIZE); - buff += PAGE_CACHE_SIZE; - bytes -= PAGE_CACHE_SIZE; - data = squashfs_next_page(output); - } - } - squashfs_finish_page(output); - - return res; - -failed: - return -EIO; + return out_len; } const struct squashfs_decompressor squashfs_lzo_comp_ops = { diff --git a/fs/squashfs/page_actor.c b/fs/squashfs/page_actor.c index 5a1c11f56441..53863508e400 100644 --- a/fs/squashfs/page_actor.c +++ b/fs/squashfs/page_actor.c @@ -9,79 +9,11 @@ #include #include #include +#include #include "page_actor.h" -/* - * This file contains implementations of page_actor for decompressing into - * an intermediate buffer, and for decompressing directly into the - * page cache. - * - * Calling code should avoid sleeping between calls to squashfs_first_page() - * and squashfs_finish_page(). - */ - -/* Implementation of page_actor for decompressing into intermediate buffer */ -static void *cache_first_page(struct squashfs_page_actor *actor) -{ - actor->next_page = 1; - return actor->buffer[0]; -} - -static void *cache_next_page(struct squashfs_page_actor *actor) -{ - if (actor->next_page == actor->pages) - return NULL; - - return actor->buffer[actor->next_page++]; -} - -static void cache_finish_page(struct squashfs_page_actor *actor) -{ - /* empty */ -} - -struct squashfs_page_actor *squashfs_page_actor_init(void **buffer, - int pages, int length) -{ - struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL); - - if (actor == NULL) - return NULL; - - actor->length = length ? : pages * PAGE_CACHE_SIZE; - actor->buffer = buffer; - actor->pages = pages; - actor->next_page = 0; - actor->squashfs_first_page = cache_first_page; - actor->squashfs_next_page = cache_next_page; - actor->squashfs_finish_page = cache_finish_page; - return actor; -} - -/* Implementation of page_actor for decompressing directly into page cache. */ -static void *direct_first_page(struct squashfs_page_actor *actor) -{ - actor->next_page = 1; - return actor->pageaddr = kmap_atomic(actor->page[0]); -} - -static void *direct_next_page(struct squashfs_page_actor *actor) -{ - if (actor->pageaddr) - kunmap_atomic(actor->pageaddr); - - return actor->pageaddr = actor->next_page == actor->pages ? NULL : - kmap_atomic(actor->page[actor->next_page++]); -} - -static void direct_finish_page(struct squashfs_page_actor *actor) -{ - if (actor->pageaddr) - kunmap_atomic(actor->pageaddr); -} - -struct squashfs_page_actor *squashfs_page_actor_init_special(struct page **page, - int pages, int length) +struct squashfs_page_actor *squashfs_page_actor_init(struct page **page, + int pages, int length, void (*release_pages)(struct page **, int, int)) { struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL); @@ -93,8 +25,129 @@ struct squashfs_page_actor *squashfs_page_actor_init_special(struct page **page, actor->pages = pages; actor->next_page = 0; actor->pageaddr = NULL; - actor->squashfs_first_page = direct_first_page; - actor->squashfs_next_page = direct_next_page; - actor->squashfs_finish_page = direct_finish_page; + actor->release_pages = release_pages; return actor; } + +void squashfs_page_actor_free(struct squashfs_page_actor *actor, int error) +{ + if (!actor) + return; + + if (actor->release_pages) + actor->release_pages(actor->page, actor->pages, error); + kfree(actor); +} + +void squashfs_actor_to_buf(struct squashfs_page_actor *actor, void *buf, + int length) +{ + void *pageaddr; + int pos = 0, avail, i; + + for (i = 0; i < actor->pages && pos < length; ++i) { + avail = min_t(int, length - pos, PAGE_CACHE_SIZE); + if (actor->page[i]) { + pageaddr = kmap_atomic(actor->page[i]); + memcpy(buf + pos, pageaddr, avail); + kunmap_atomic(pageaddr); + } + pos += avail; + } +} + +void squashfs_buf_to_actor(void *buf, struct squashfs_page_actor *actor, + int length) +{ + void *pageaddr; + int pos = 0, avail, i; + + for (i = 0; i < actor->pages && pos < length; ++i) { + avail = min_t(int, length - pos, PAGE_CACHE_SIZE); + if (actor->page[i]) { + pageaddr = kmap_atomic(actor->page[i]); + memcpy(pageaddr, buf + pos, avail); + kunmap_atomic(pageaddr); + } + pos += avail; + } +} + +void squashfs_bh_to_actor(struct buffer_head **bh, int nr_buffers, + struct squashfs_page_actor *actor, int offset, int length, int blksz) +{ + void *kaddr = NULL; + int bytes = 0, pgoff = 0, b = 0, p = 0, avail, i; + + while (bytes < length) { + if (actor->page[p]) { + kaddr = kmap_atomic(actor->page[p]); + while (pgoff < PAGE_CACHE_SIZE && bytes < length) { + avail = min_t(int, blksz - offset, + PAGE_CACHE_SIZE - pgoff); + memcpy(kaddr + pgoff, bh[b]->b_data + offset, + avail); + pgoff += avail; + bytes += avail; + offset = (offset + avail) % blksz; + if (!offset) { + put_bh(bh[b]); + ++b; + } + } + kunmap_atomic(kaddr); + pgoff = 0; + } else { + for (i = 0; i < PAGE_CACHE_SIZE / blksz; ++i) { + if (bh[b]) + put_bh(bh[b]); + ++b; + } + bytes += PAGE_CACHE_SIZE; + } + ++p; + } +} + +void squashfs_bh_to_buf(struct buffer_head **bh, int nr_buffers, void *buf, + int offset, int length, int blksz) +{ + int i, avail, bytes = 0; + + for (i = 0; i < nr_buffers && bytes < length; ++i) { + avail = min_t(int, length - bytes, blksz - offset); + if (bh[i]) { + memcpy(buf + bytes, bh[i]->b_data + offset, avail); + put_bh(bh[i]); + } + bytes += avail; + offset = 0; + } +} + +void free_page_array(struct page **page, int nr_pages) +{ + int i; + + for (i = 0; i < nr_pages; ++i) + __free_page(page[i]); + kfree(page); +} + +struct page **alloc_page_array(int nr_pages, int gfp_mask) +{ + int i; + struct page **page; + + page = kcalloc(nr_pages, sizeof(struct page *), gfp_mask); + if (!page) + return NULL; + for (i = 0; i < nr_pages; ++i) { + page[i] = alloc_page(gfp_mask); + if (!page[i]) { + free_page_array(page, i); + return NULL; + } + } + return page; +} diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h index d2df0544e0df..aa1ed790b5a3 100644 --- a/fs/squashfs/page_actor.h +++ b/fs/squashfs/page_actor.h @@ -5,37 +5,61 @@ * Phillip Lougher * * This work is licensed under the terms of the GNU GPL, version 2. See - * the COPYING file in the top-level directory. + * the COPYING file in the top-level squashfsory. */ struct squashfs_page_actor { - union { - void **buffer; - struct page **page; - }; + struct page **page; void *pageaddr; - void *(*squashfs_first_page)(struct squashfs_page_actor *); - void *(*squashfs_next_page)(struct squashfs_page_actor *); - void (*squashfs_finish_page)(struct squashfs_page_actor *); int pages; int length; int next_page; + void (*release_pages)(struct page **, int, int); }; -extern struct squashfs_page_actor *squashfs_page_actor_init(void **, int, int); -extern struct squashfs_page_actor *squashfs_page_actor_init_special(struct page - **, int, int); +extern struct squashfs_page_actor *squashfs_page_actor_init(struct page **, + int, int, void (*)(struct page **, int, int)); +extern void squashfs_page_actor_free(struct squashfs_page_actor *, int); + +extern void squashfs_actor_to_buf(struct squashfs_page_actor *, void *, int); +extern void squashfs_buf_to_actor(void *, struct squashfs_page_actor *, int); +extern void squashfs_bh_to_actor(struct buffer_head **, int, + struct squashfs_page_actor *, int, int, int); +extern void squashfs_bh_to_buf(struct buffer_head **, int, void *, int, int, + int); + +/* + * Calling code should avoid sleeping between calls to squashfs_first_page() + * and squashfs_finish_page(). + */ static inline void *squashfs_first_page(struct squashfs_page_actor *actor) { - return actor->squashfs_first_page(actor); -} -static inline void *squashfs_next_page(struct squashfs_page_actor *actor) -{ - return actor->squashfs_next_page(actor); -} -static inline void squashfs_finish_page(struct squashfs_page_actor *actor) -{ - actor->squashfs_finish_page(actor); + actor->next_page = 1; + return actor->pageaddr = actor->page[0] ? kmap_atomic(actor->page[0]) + : NULL; } +static inline void *squashfs_next_page(struct squashfs_page_actor *actor) +{ + if (!IS_ERR_OR_NULL(actor->pageaddr)) + kunmap_atomic(actor->pageaddr); + + if (actor->next_page == actor->pages) + return actor->pageaddr = ERR_PTR(-ENODATA); + + actor->pageaddr = actor->page[actor->next_page] ? + kmap_atomic(actor->page[actor->next_page]) : NULL; + ++actor->next_page; + return actor->pageaddr; +} + +static inline void squashfs_finish_page(struct squashfs_page_actor *actor) +{ + if (!IS_ERR_OR_NULL(actor->pageaddr)) + kunmap_atomic(actor->pageaddr); +} + +extern struct page **alloc_page_array(int, int); +extern void free_page_array(struct page **, int); + #endif diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h index 1da565cb50c3..8a6995de0277 100644 --- a/fs/squashfs/squashfs_fs_sb.h +++ b/fs/squashfs/squashfs_fs_sb.h @@ -49,7 +49,7 @@ struct squashfs_cache_entry { int num_waiters; wait_queue_head_t wait_queue; struct squashfs_cache *cache; - void **data; + struct page **page; struct squashfs_page_actor *actor; }; diff --git a/fs/squashfs/xz_wrapper.c b/fs/squashfs/xz_wrapper.c index c609624e4b8a..14cd373e1897 100644 --- a/fs/squashfs/xz_wrapper.c +++ b/fs/squashfs/xz_wrapper.c @@ -55,7 +55,7 @@ static void *squashfs_xz_comp_opts(struct squashfs_sb_info *msblk, struct comp_opts *opts; int err = 0, n; - opts = kmalloc(sizeof(*opts), GFP_KERNEL); + opts = kmalloc(sizeof(*opts), GFP_ATOMIC); if (opts == NULL) { err = -ENOMEM; goto out2; @@ -136,6 +136,7 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm, enum xz_ret xz_err; int avail, total = 0, k = 0; struct squashfs_xz *stream = strm; + void *buf = NULL; xz_dec_reset(stream->state); stream->buf.in_pos = 0; @@ -156,12 +157,20 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm, if (stream->buf.out_pos == stream->buf.out_size) { stream->buf.out = squashfs_next_page(output); - if (stream->buf.out != NULL) { + if (!IS_ERR(stream->buf.out)) { stream->buf.out_pos = 0; total += PAGE_CACHE_SIZE; } } + if (!stream->buf.out) { + if (!buf) { + buf = kmalloc(PAGE_CACHE_SIZE, GFP_ATOMIC); + if (!buf) + goto out; + } + stream->buf.out = buf; + } xz_err = xz_dec_run(stream->state, &stream->buf); if (stream->buf.in_pos == stream->buf.in_size && k < b) @@ -173,11 +182,13 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm, if (xz_err != XZ_STREAM_END || k < b) goto out; + kfree(buf); return total + stream->buf.out_pos; out: for (; k < b; k++) put_bh(bh[k]); + kfree(buf); return -EIO; } diff --git a/fs/squashfs/zlib_wrapper.c b/fs/squashfs/zlib_wrapper.c index 8727caba6882..09c892b5308e 100644 --- a/fs/squashfs/zlib_wrapper.c +++ b/fs/squashfs/zlib_wrapper.c @@ -66,6 +66,7 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm, struct buffer_head **bh, int b, int offset, int length, struct squashfs_page_actor *output) { + void *buf = NULL; int zlib_err, zlib_init = 0, k = 0; z_stream *stream = strm; @@ -84,10 +85,19 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm, if (stream->avail_out == 0) { stream->next_out = squashfs_next_page(output); - if (stream->next_out != NULL) + if (!IS_ERR(stream->next_out)) stream->avail_out = PAGE_CACHE_SIZE; } + if (!stream->next_out) { + if (!buf) { + buf = kmalloc(PAGE_CACHE_SIZE, GFP_ATOMIC); + if (!buf) + goto out; + } + stream->next_out = buf; + } + if (!zlib_init) { zlib_err = zlib_inflateInit(stream); if (zlib_err != Z_OK) { @@ -115,11 +125,13 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm, if (k < b) goto out; + kfree(buf); return stream->total_out; out: for (; k < b; k++) put_bh(bh[k]); + kfree(buf); return -EIO; } From 832771453f05c0eab9e3659d2085aff808f88e6a Mon Sep 17 00:00:00 2001 From: Adrien Schildknecht Date: Mon, 7 Nov 2016 12:37:55 -0800 Subject: [PATCH 213/521] Squashfs: replace buffer_head with BIO The 'll_rw_block' has been deprecated and BIO is now the basic container for block I/O within the kernel. Switching to BIO offers 2 advantages: 1/ It removes synchronous wait for the up-to-date buffers: SquashFS now deals with decompressions/copies asynchronously. Implementing an asynchronous mechanism to read data is needed to efficiently implement .readpages(). 2/ Prior to this patch, merging the read requests entirely depends on the IO scheduler. SquashFS has more information than the IO scheduler about what could be merged. Moreover, merging the reads at the FS level means that we rely less on the IO scheduler. Signed-off-by: Adrien Schildknecht Change-Id: I775d2e11f017476e1899518ab52d9d0a8a0bce28 --- fs/squashfs/block.c | 527 +++++++++++++++++++++++++++----------- fs/squashfs/file_direct.c | 204 +++++---------- fs/squashfs/squashfs.h | 6 + fs/squashfs/super.c | 7 + 4 files changed, 464 insertions(+), 280 deletions(-) diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c index 0cea9b9236d0..8a75d812ffdf 100644 --- a/fs/squashfs/block.c +++ b/fs/squashfs/block.c @@ -28,9 +28,12 @@ #include #include +#include #include #include +#include #include +#include #include "squashfs_fs.h" #include "squashfs_fs_sb.h" @@ -38,45 +41,355 @@ #include "decompressor.h" #include "page_actor.h" -/* - * Read the metadata block length, this is stored in the first two - * bytes of the metadata block. - */ -static struct buffer_head *get_block_length(struct super_block *sb, - u64 *cur_index, int *offset, int *length) +static struct workqueue_struct *squashfs_read_wq; + +struct squashfs_read_request { + struct super_block *sb; + u64 index; + int length; + int compressed; + int offset; + u64 read_end; + struct squashfs_page_actor *output; + enum { + SQUASHFS_COPY, + SQUASHFS_DECOMPRESS, + SQUASHFS_METADATA, + } data_processing; + bool synchronous; + + /* + * If the read is synchronous, it is possible to retrieve information + * about the request by setting these pointers. + */ + int *res; + int *bytes_read; + int *bytes_uncompressed; + + int nr_buffers; + struct buffer_head **bh; + struct work_struct offload; +}; + +struct squashfs_bio_request { + struct buffer_head **bh; + int nr_buffers; +}; + +static int squashfs_bio_submit(struct squashfs_read_request *req); + +int squashfs_init_read_wq(void) { - struct squashfs_sb_info *msblk = sb->s_fs_info; - struct buffer_head *bh; + squashfs_read_wq = create_workqueue("SquashFS read wq"); + return !!squashfs_read_wq; +} - bh = sb_bread(sb, *cur_index); - if (bh == NULL) - return NULL; +void squashfs_destroy_read_wq(void) +{ + flush_workqueue(squashfs_read_wq); + destroy_workqueue(squashfs_read_wq); +} - if (msblk->devblksize - *offset == 1) { - *length = (unsigned char) bh->b_data[*offset]; - put_bh(bh); - bh = sb_bread(sb, ++(*cur_index)); - if (bh == NULL) - return NULL; - *length |= (unsigned char) bh->b_data[0] << 8; - *offset = 1; - } else { - *length = (unsigned char) bh->b_data[*offset] | - (unsigned char) bh->b_data[*offset + 1] << 8; - *offset += 2; +static void free_read_request(struct squashfs_read_request *req, int error) +{ + if (!req->synchronous) + squashfs_page_actor_free(req->output, error); + if (req->res) + *(req->res) = error; + kfree(req->bh); + kfree(req); +} - if (*offset == msblk->devblksize) { - put_bh(bh); - bh = sb_bread(sb, ++(*cur_index)); - if (bh == NULL) - return NULL; - *offset = 0; +static void squashfs_process_blocks(struct squashfs_read_request *req) +{ + int error = 0; + int bytes, i, length; + struct squashfs_sb_info *msblk = req->sb->s_fs_info; + struct squashfs_page_actor *actor = req->output; + struct buffer_head **bh = req->bh; + int nr_buffers = req->nr_buffers; + + for (i = 0; i < nr_buffers; ++i) { + if (!bh[i]) + continue; + wait_on_buffer(bh[i]); + if (!buffer_uptodate(bh[i])) + error = -EIO; + } + if (error) + goto cleanup; + + if (req->data_processing == SQUASHFS_METADATA) { + /* Extract the length of the metadata block */ + if (req->offset != msblk->devblksize - 1) + length = *((u16 *)(bh[0]->b_data + req->offset)); + else { + length = bh[0]->b_data[req->offset]; + length |= bh[1]->b_data[0] << 8; + } + req->compressed = SQUASHFS_COMPRESSED(length); + req->data_processing = req->compressed ? SQUASHFS_DECOMPRESS + : SQUASHFS_COPY; + length = SQUASHFS_COMPRESSED_SIZE(length); + if (req->index + length + 2 > req->read_end) { + for (i = 0; i < nr_buffers; ++i) + put_bh(bh[i]); + kfree(bh); + req->length = length; + req->index += 2; + squashfs_bio_submit(req); + return; + } + req->length = length; + req->offset = (req->offset + 2) % PAGE_SIZE; + if (req->offset < 2) { + put_bh(bh[0]); + ++bh; + --nr_buffers; + } + } + if (req->bytes_read) + *(req->bytes_read) = req->length; + + if (req->data_processing == SQUASHFS_COPY) { + squashfs_bh_to_actor(bh, nr_buffers, req->output, req->offset, + req->length, msblk->devblksize); + } else if (req->data_processing == SQUASHFS_DECOMPRESS) { + req->length = squashfs_decompress(msblk, bh, nr_buffers, + req->offset, req->length, actor); + if (req->length < 0) { + error = -EIO; + goto cleanup; } } - return bh; + /* Last page may have trailing bytes not filled */ + bytes = req->length % PAGE_SIZE; + if (bytes && actor->page[actor->pages - 1]) + zero_user_segment(actor->page[actor->pages - 1], bytes, + PAGE_SIZE); + +cleanup: + if (req->bytes_uncompressed) + *(req->bytes_uncompressed) = req->length; + if (error) { + for (i = 0; i < nr_buffers; ++i) + if (bh[i]) + put_bh(bh[i]); + } + free_read_request(req, error); } +static void read_wq_handler(struct work_struct *work) +{ + squashfs_process_blocks(container_of(work, + struct squashfs_read_request, offload)); +} + +static void squashfs_bio_end_io(struct bio *bio) +{ + int i; + int error = bio->bi_error; + struct squashfs_bio_request *bio_req = bio->bi_private; + + bio_put(bio); + + for (i = 0; i < bio_req->nr_buffers; ++i) { + if (!bio_req->bh[i]) + continue; + if (!error) + set_buffer_uptodate(bio_req->bh[i]); + else + clear_buffer_uptodate(bio_req->bh[i]); + unlock_buffer(bio_req->bh[i]); + } + kfree(bio_req); +} + +static int actor_getblks(struct squashfs_read_request *req, u64 block) +{ + int i; + + req->bh = kmalloc_array(req->nr_buffers, sizeof(*(req->bh)), GFP_NOIO); + if (!req->bh) + return -ENOMEM; + + for (i = 0; i < req->nr_buffers; ++i) { + req->bh[i] = sb_getblk(req->sb, block + i); + if (!req->bh[i]) { + while (--i) { + if (req->bh[i]) + put_bh(req->bh[i]); + } + return -1; + } + } + return 0; +} + +static int squashfs_bio_submit(struct squashfs_read_request *req) +{ + struct bio *bio = NULL; + struct buffer_head *bh; + struct squashfs_bio_request *bio_req = NULL; + int b = 0, prev_block = 0; + struct squashfs_sb_info *msblk = req->sb->s_fs_info; + + u64 read_start = round_down(req->index, msblk->devblksize); + u64 read_end = round_up(req->index + req->length, msblk->devblksize); + sector_t block = read_start >> msblk->devblksize_log2; + sector_t block_end = read_end >> msblk->devblksize_log2; + int offset = read_start - round_down(req->index, PAGE_SIZE); + int nr_buffers = block_end - block; + int blksz = msblk->devblksize; + int bio_max_pages = nr_buffers > BIO_MAX_PAGES ? BIO_MAX_PAGES + : nr_buffers; + + /* Setup the request */ + req->read_end = read_end; + req->offset = req->index - read_start; + req->nr_buffers = nr_buffers; + if (actor_getblks(req, block) < 0) + goto getblk_failed; + + /* Create and submit the BIOs */ + for (b = 0; b < nr_buffers; ++b, offset += blksz) { + bh = req->bh[b]; + if (!bh || !trylock_buffer(bh)) + continue; + if (buffer_uptodate(bh)) { + unlock_buffer(bh); + continue; + } + offset %= PAGE_SIZE; + + /* Append the buffer to the current BIO if it is contiguous */ + if (bio && bio_req && prev_block + 1 == b) { + if (bio_add_page(bio, bh->b_page, blksz, offset)) { + bio_req->nr_buffers += 1; + prev_block = b; + continue; + } + } + + /* Otherwise, submit the current BIO and create a new one */ + if (bio) + submit_bio(READ, bio); + bio_req = kcalloc(1, sizeof(struct squashfs_bio_request), + GFP_NOIO); + if (!bio_req) + goto req_alloc_failed; + bio_req->bh = &req->bh[b]; + bio = bio_alloc(GFP_NOIO, bio_max_pages); + if (!bio) + goto bio_alloc_failed; + bio->bi_bdev = req->sb->s_bdev; + bio->bi_iter.bi_sector = (block + b) + << (msblk->devblksize_log2 - 9); + bio->bi_private = bio_req; + bio->bi_end_io = squashfs_bio_end_io; + + bio_add_page(bio, bh->b_page, blksz, offset); + bio_req->nr_buffers += 1; + prev_block = b; + } + if (bio) + submit_bio(READ, bio); + + if (req->synchronous) + squashfs_process_blocks(req); + else { + INIT_WORK(&req->offload, read_wq_handler); + schedule_work(&req->offload); + } + return 0; + +bio_alloc_failed: + kfree(bio_req); +req_alloc_failed: + unlock_buffer(bh); + while (--nr_buffers >= b) + if (req->bh[nr_buffers]) + put_bh(req->bh[nr_buffers]); + while (--b >= 0) + if (req->bh[b]) + wait_on_buffer(req->bh[b]); +getblk_failed: + free_read_request(req, -ENOMEM); + return -ENOMEM; +} + +static int read_metadata_block(struct squashfs_read_request *req, + u64 *next_index) +{ + int ret, error, bytes_read = 0, bytes_uncompressed = 0; + struct squashfs_sb_info *msblk = req->sb->s_fs_info; + + if (req->index + 2 > msblk->bytes_used) { + free_read_request(req, -EINVAL); + return -EINVAL; + } + req->length = 2; + + /* Do not read beyond the end of the device */ + if (req->index + req->length > msblk->bytes_used) + req->length = msblk->bytes_used - req->index; + req->data_processing = SQUASHFS_METADATA; + + /* + * Reading metadata is always synchronous because we don't know the + * length in advance and the function is expected to update + * 'next_index' and return the length. + */ + req->synchronous = true; + req->res = &error; + req->bytes_read = &bytes_read; + req->bytes_uncompressed = &bytes_uncompressed; + + TRACE("Metadata block @ 0x%llx, %scompressed size %d, src size %d\n", + req->index, req->compressed ? "" : "un", bytes_read, + req->output->length); + + ret = squashfs_bio_submit(req); + if (ret) + return ret; + if (error) + return error; + if (next_index) + *next_index += 2 + bytes_read; + return bytes_uncompressed; +} + +static int read_data_block(struct squashfs_read_request *req, int length, + u64 *next_index, bool synchronous) +{ + int ret, error = 0, bytes_uncompressed = 0, bytes_read = 0; + + req->compressed = SQUASHFS_COMPRESSED_BLOCK(length); + req->length = length = SQUASHFS_COMPRESSED_SIZE_BLOCK(length); + req->data_processing = req->compressed ? SQUASHFS_DECOMPRESS + : SQUASHFS_COPY; + + req->synchronous = synchronous; + if (synchronous) { + req->res = &error; + req->bytes_read = &bytes_read; + req->bytes_uncompressed = &bytes_uncompressed; + } + + TRACE("Data block @ 0x%llx, %scompressed size %d, src size %d\n", + req->index, req->compressed ? "" : "un", req->length, + req->output->length); + + ret = squashfs_bio_submit(req); + if (ret) + return ret; + if (synchronous) + ret = error ? error : bytes_uncompressed; + if (next_index) + *next_index += length; + return ret; +} /* * Read and decompress a metadata block or datablock. Length is non-zero @@ -87,128 +400,50 @@ static struct buffer_head *get_block_length(struct super_block *sb, * generated a larger block - this does occasionally happen with compression * algorithms). */ -int squashfs_read_data(struct super_block *sb, u64 index, int length, - u64 *next_index, struct squashfs_page_actor *output) +static int __squashfs_read_data(struct super_block *sb, u64 index, int length, + u64 *next_index, struct squashfs_page_actor *output, bool sync) { - struct squashfs_sb_info *msblk = sb->s_fs_info; - struct buffer_head **bh; - int offset = index & ((1 << msblk->devblksize_log2) - 1); - u64 cur_index = index >> msblk->devblksize_log2; - int bytes, compressed, b = 0, k = 0, avail, i; + struct squashfs_read_request *req; - bh = kcalloc(((output->length + msblk->devblksize - 1) - >> msblk->devblksize_log2) + 1, sizeof(*bh), GFP_KERNEL); - if (bh == NULL) + req = kcalloc(1, sizeof(struct squashfs_read_request), GFP_KERNEL); + if (!req) { + if (!sync) + squashfs_page_actor_free(output, -ENOMEM); return -ENOMEM; - - if (length) { - /* - * Datablock. - */ - bytes = -offset; - compressed = SQUASHFS_COMPRESSED_BLOCK(length); - length = SQUASHFS_COMPRESSED_SIZE_BLOCK(length); - if (next_index) - *next_index = index + length; - - TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n", - index, compressed ? "" : "un", length, output->length); - - if (length < 0 || length > output->length || - (index + length) > msblk->bytes_used) - goto read_failure; - - for (b = 0; bytes < length; b++, cur_index++) { - bh[b] = sb_getblk(sb, cur_index); - if (bh[b] == NULL) - goto block_release; - bytes += msblk->devblksize; - } - ll_rw_block(READ, b, bh); - } else { - /* - * Metadata block. - */ - if ((index + 2) > msblk->bytes_used) - goto read_failure; - - bh[0] = get_block_length(sb, &cur_index, &offset, &length); - if (bh[0] == NULL) - goto read_failure; - b = 1; - - bytes = msblk->devblksize - offset; - compressed = SQUASHFS_COMPRESSED(length); - length = SQUASHFS_COMPRESSED_SIZE(length); - if (next_index) - *next_index = index + length + 2; - - TRACE("Block @ 0x%llx, %scompressed size %d\n", index, - compressed ? "" : "un", length); - - if (length < 0 || length > output->length || - (index + length) > msblk->bytes_used) - goto block_release; - - for (; bytes < length; b++) { - bh[b] = sb_getblk(sb, ++cur_index); - if (bh[b] == NULL) - goto block_release; - bytes += msblk->devblksize; - } - ll_rw_block(READ, b - 1, bh + 1); } - for (i = 0; i < b; i++) { - wait_on_buffer(bh[i]); - if (!buffer_uptodate(bh[i])) - goto block_release; + req->sb = sb; + req->index = index; + req->output = output; + + if (next_index) + *next_index = index; + + if (length) + length = read_data_block(req, length, next_index, sync); + else + length = read_metadata_block(req, next_index); + + if (length < 0) { + ERROR("squashfs_read_data failed to read block 0x%llx\n", + (unsigned long long)index); + return -EIO; } - if (compressed) { - length = squashfs_decompress(msblk, bh, b, offset, length, - output); - if (length < 0) - goto read_failure; - } else { - /* - * Block is uncompressed. - */ - int in, pg_offset = 0; - void *data = squashfs_first_page(output); - - for (bytes = length; k < b; k++) { - in = min(bytes, msblk->devblksize - offset); - bytes -= in; - while (in) { - if (pg_offset == PAGE_CACHE_SIZE) { - data = squashfs_next_page(output); - pg_offset = 0; - } - avail = min_t(int, in, PAGE_CACHE_SIZE - - pg_offset); - memcpy(data + pg_offset, bh[k]->b_data + offset, - avail); - in -= avail; - pg_offset += avail; - offset += avail; - } - offset = 0; - put_bh(bh[k]); - } - squashfs_finish_page(output); - } - - kfree(bh); return length; - -block_release: - for (; k < b; k++) - put_bh(bh[k]); - -read_failure: - ERROR("squashfs_read_data failed to read block 0x%llx\n", - (unsigned long long) index); - kfree(bh); - return -EIO; +} + +int squashfs_read_data(struct super_block *sb, u64 index, int length, + u64 *next_index, struct squashfs_page_actor *output) +{ + return __squashfs_read_data(sb, index, length, next_index, output, + true); +} + +int squashfs_read_data_async(struct super_block *sb, u64 index, int length, + u64 *next_index, struct squashfs_page_actor *output) +{ + + return __squashfs_read_data(sb, index, length, next_index, output, + false); } diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c index 9d033ec0f133..10fe1272535f 100644 --- a/fs/squashfs/file_direct.c +++ b/fs/squashfs/file_direct.c @@ -20,8 +20,68 @@ #include "squashfs.h" #include "page_actor.h" -static int squashfs_read_cache(struct page *target_page, u64 block, int bsize, - int pages, struct page **page); +static void release_actor_pages(struct page **page, int pages, int error) +{ + int i; + + for (i = 0; i < pages; i++) { + if (!page[i]) + continue; + flush_dcache_page(page[i]); + if (!error) + SetPageUptodate(page[i]); + else { + SetPageError(page[i]); + zero_user_segment(page[i], 0, PAGE_CACHE_SIZE); + } + unlock_page(page[i]); + put_page(page[i]); + } + kfree(page); +} + +/* + * Create a "page actor" which will kmap and kunmap the + * page cache pages appropriately within the decompressor + */ +static struct squashfs_page_actor *actor_from_page_cache( + struct page *target_page, int start_index, int nr_pages) +{ + int i, n; + struct page **page; + struct squashfs_page_actor *actor; + + page = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL); + if (!page) + return NULL; + + /* Try to grab all the pages covered by the SquashFS block */ + for (i = 0, n = start_index; i < nr_pages; i++, n++) { + if (target_page->index == n) { + page[i] = target_page; + } else { + page[i] = grab_cache_page_nowait(target_page->mapping, + n); + if (page[i] == NULL) + continue; + } + + if (PageUptodate(page[i])) { + unlock_page(page[i]); + put_page(page[i]); + page[i] = NULL; + } + } + + actor = squashfs_page_actor_init(page, nr_pages, 0, + release_actor_pages); + if (!actor) { + release_actor_pages(page, nr_pages, -ENOMEM); + kfree(page); + return NULL; + } + return actor; +} /* Read separately compressed datablock directly into page cache */ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) @@ -34,143 +94,19 @@ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) int mask = (1 << (msblk->block_log - PAGE_CACHE_SHIFT)) - 1; int start_index = target_page->index & ~mask; int end_index = start_index | mask; - int i, n, pages, missing_pages, bytes, res = -ENOMEM; - struct page **page; + int pages, res = -ENOMEM; struct squashfs_page_actor *actor; - void *pageaddr; if (end_index > file_end) end_index = file_end; - pages = end_index - start_index + 1; - page = kmalloc_array(pages, sizeof(void *), GFP_KERNEL); - if (page == NULL) - return res; + actor = actor_from_page_cache(target_page, start_index, pages); + if (!actor) + return -ENOMEM; - /* - * Create a "page actor" which will kmap and kunmap the - * page cache pages appropriately within the decompressor - */ - actor = squashfs_page_actor_init(page, pages, 0, NULL); - if (actor == NULL) - goto out; - - /* Try to grab all the pages covered by the Squashfs block */ - for (missing_pages = 0, i = 0, n = start_index; i < pages; i++, n++) { - page[i] = (n == target_page->index) ? target_page : - grab_cache_page_nowait(target_page->mapping, n); - - if (page[i] == NULL) { - missing_pages++; - continue; - } - - if (PageUptodate(page[i])) { - unlock_page(page[i]); - page_cache_release(page[i]); - page[i] = NULL; - missing_pages++; - } - } - - if (missing_pages) { - /* - * Couldn't get one or more pages, this page has either - * been VM reclaimed, but others are still in the page cache - * and uptodate, or we're racing with another thread in - * squashfs_readpage also trying to grab them. Fall back to - * using an intermediate buffer. - */ - res = squashfs_read_cache(target_page, block, bsize, pages, - page); - if (res < 0) - goto mark_errored; - - goto out; - } - - /* Decompress directly into the page cache buffers */ - res = squashfs_read_data(inode->i_sb, block, bsize, NULL, actor); - if (res < 0) - goto mark_errored; - - /* Last page may have trailing bytes not filled */ - bytes = res % PAGE_CACHE_SIZE; - if (bytes) { - pageaddr = kmap_atomic(page[pages - 1]); - memset(pageaddr + bytes, 0, PAGE_CACHE_SIZE - bytes); - kunmap_atomic(pageaddr); - } - - /* Mark pages as uptodate, unlock and release */ - for (i = 0; i < pages; i++) { - flush_dcache_page(page[i]); - SetPageUptodate(page[i]); - unlock_page(page[i]); - if (page[i] != target_page) - page_cache_release(page[i]); - } - - kfree(actor); - kfree(page); - - return 0; - -mark_errored: - /* Decompression failed, mark pages as errored. Target_page is - * dealt with by the caller - */ - for (i = 0; i < pages; i++) { - if (page[i] == NULL || page[i] == target_page) - continue; - flush_dcache_page(page[i]); - SetPageError(page[i]); - unlock_page(page[i]); - page_cache_release(page[i]); - } - -out: - squashfs_page_actor_free(actor, 0); - kfree(page); - return res; -} - - -static int squashfs_read_cache(struct page *target_page, u64 block, int bsize, - int pages, struct page **page) -{ - struct inode *i = target_page->mapping->host; - struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb, - block, bsize); - int bytes = buffer->length, res = buffer->error, n, offset = 0; - void *pageaddr; - - if (res) { - ERROR("Unable to read page, block %llx, size %x\n", block, - bsize); - goto out; - } - - for (n = 0; n < pages && bytes > 0; n++, - bytes -= PAGE_CACHE_SIZE, offset += PAGE_CACHE_SIZE) { - int avail = min_t(int, bytes, PAGE_CACHE_SIZE); - - if (page[n] == NULL) - continue; - - pageaddr = kmap_atomic(page[n]); - squashfs_copy_data(pageaddr, buffer, offset, avail); - memset(pageaddr + avail, 0, PAGE_CACHE_SIZE - avail); - kunmap_atomic(pageaddr); - flush_dcache_page(page[n]); - SetPageUptodate(page[n]); - unlock_page(page[n]); - if (page[n] != target_page) - page_cache_release(page[n]); - } - -out: - squashfs_cache_put(buffer); - return res; + get_page(target_page); + res = squashfs_read_data_async(inode->i_sb, block, bsize, NULL, + actor); + return res < 0 ? res : 0; } diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h index 887d6d270080..985317fbf6c2 100644 --- a/fs/squashfs/squashfs.h +++ b/fs/squashfs/squashfs.h @@ -28,8 +28,14 @@ #define WARNING(s, args...) pr_warn("SQUASHFS: "s, ## args) /* block.c */ +extern int squashfs_init_read_wq(void); +extern void squashfs_destroy_read_wq(void); extern int squashfs_read_data(struct super_block *, u64, int, u64 *, struct squashfs_page_actor *); +extern int squashfs_read_data(struct super_block *, u64, int, u64 *, + struct squashfs_page_actor *); +extern int squashfs_read_data_async(struct super_block *, u64, int, u64 *, + struct squashfs_page_actor *); /* cache.c */ extern struct squashfs_cache *squashfs_cache_init(char *, int, int); diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c index 5056babe00df..61cd0b39ed0e 100644 --- a/fs/squashfs/super.c +++ b/fs/squashfs/super.c @@ -444,9 +444,15 @@ static int __init init_squashfs_fs(void) if (err) return err; + if (!squashfs_init_read_wq()) { + destroy_inodecache(); + return -ENOMEM; + } + err = register_filesystem(&squashfs_fs_type); if (err) { destroy_inodecache(); + squashfs_destroy_read_wq(); return err; } @@ -460,6 +466,7 @@ static void __exit exit_squashfs_fs(void) { unregister_filesystem(&squashfs_fs_type); destroy_inodecache(); + squashfs_destroy_read_wq(); } From 9c6d9abc8fa9d5207da81d4f7bb14b9c315e11e7 Mon Sep 17 00:00:00 2001 From: Adrien Schildknecht Date: Mon, 7 Nov 2016 12:41:42 -0800 Subject: [PATCH 214/521] Squashfs: implement .readpages() Squashfs does not implement .readpages(), so the kernel just repeatedly calls .readpage(). The readpages function tries to pack as much pages as possible in the same page actor so that only 1 read request is issued. Now that the read requests are asynchronous, the kernel can truly prefetch pages using its readahead algorithm. Signed-off-by: Adrien Schildknecht Change-Id: Ice70e029dc24526f61e4e5a1a902588be2212498 --- fs/squashfs/file.c | 138 ++++++++++++++++++++++++++++---------- fs/squashfs/file_direct.c | 64 ++++++++++++------ fs/squashfs/squashfs.h | 5 +- 3 files changed, 151 insertions(+), 56 deletions(-) diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c index e5c9689062ba..6f5ef8d7e55a 100644 --- a/fs/squashfs/file.c +++ b/fs/squashfs/file.c @@ -47,12 +47,16 @@ #include #include #include +#include #include "squashfs_fs.h" #include "squashfs_fs_sb.h" #include "squashfs_fs_i.h" #include "squashfs.h" +// Backported from 4.5 +#define lru_to_page(head) (list_entry((head)->prev, struct page, lru)) + /* * Locate cache slot in range [offset, index] for specified inode. If * there's more than one return the slot closest to index. @@ -438,6 +442,21 @@ static int squashfs_readpage_fragment(struct page *page) return res; } +static int squashfs_readpages_fragment(struct page *page, + struct list_head *readahead_pages, struct address_space *mapping) +{ + if (!page) { + page = lru_to_page(readahead_pages); + list_del(&page->lru); + if (add_to_page_cache_lru(page, mapping, page->index, + mapping_gfp_constraint(mapping, GFP_KERNEL))) { + put_page(page); + return 0; + } + } + return squashfs_readpage_fragment(page); +} + static int squashfs_readpage_sparse(struct page *page, int index, int file_end) { struct inode *inode = page->mapping->host; @@ -450,54 +469,105 @@ static int squashfs_readpage_sparse(struct page *page, int index, int file_end) return 0; } -static int squashfs_readpage(struct file *file, struct page *page) +static int squashfs_readpages_sparse(struct page *page, + struct list_head *readahead_pages, int index, int file_end, + struct address_space *mapping) { - struct inode *inode = page->mapping->host; + if (!page) { + page = lru_to_page(readahead_pages); + list_del(&page->lru); + if (add_to_page_cache_lru(page, mapping, page->index, + mapping_gfp_constraint(mapping, GFP_KERNEL))) { + put_page(page); + return 0; + } + } + return squashfs_readpage_sparse(page, index, file_end); +} + +static int __squashfs_readpages(struct file *file, struct page *page, + struct list_head *readahead_pages, unsigned int nr_pages, + struct address_space *mapping) +{ + struct inode *inode = mapping->host; struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info; - int index = page->index >> (msblk->block_log - PAGE_CACHE_SHIFT); int file_end = i_size_read(inode) >> msblk->block_log; int res; - void *pageaddr; + + do { + struct page *cur_page = page ? page + : lru_to_page(readahead_pages); + int page_index = cur_page->index; + int index = page_index >> (msblk->block_log - PAGE_CACHE_SHIFT); + + if (page_index >= ((i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> + PAGE_CACHE_SHIFT)) + return 1; + + if (index < file_end || squashfs_i(inode)->fragment_block == + SQUASHFS_INVALID_BLK) { + u64 block = 0; + int bsize = read_blocklist(inode, index, &block); + + if (bsize < 0) + return -1; + + if (bsize == 0) { + res = squashfs_readpages_sparse(page, + readahead_pages, index, file_end, + mapping); + } else { + res = squashfs_readpages_block(page, + readahead_pages, &nr_pages, mapping, + page_index, block, bsize); + } + } else { + res = squashfs_readpages_fragment(page, + readahead_pages, mapping); + } + if (res) + return 0; + page = NULL; + } while (readahead_pages && !list_empty(readahead_pages)); + + return 0; +} + +static int squashfs_readpage(struct file *file, struct page *page) +{ + int ret; TRACE("Entered squashfs_readpage, page index %lx, start block %llx\n", - page->index, squashfs_i(inode)->start); + page->index, squashfs_i(page->mapping->host)->start); - if (page->index >= ((i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> - PAGE_CACHE_SHIFT)) - goto out; + get_page(page); - if (index < file_end || squashfs_i(inode)->fragment_block == - SQUASHFS_INVALID_BLK) { - u64 block = 0; - int bsize = read_blocklist(inode, index, &block); - if (bsize < 0) - goto error_out; - - if (bsize == 0) - res = squashfs_readpage_sparse(page, index, file_end); + ret = __squashfs_readpages(file, page, NULL, 1, page->mapping); + if (ret) { + flush_dcache_page(page); + if (ret < 0) + SetPageError(page); else - res = squashfs_readpage_block(page, block, bsize); - } else - res = squashfs_readpage_fragment(page); + SetPageUptodate(page); + zero_user_segment(page, 0, PAGE_CACHE_SIZE); + unlock_page(page); + put_page(page); + } - if (!res) - return 0; - -error_out: - SetPageError(page); -out: - pageaddr = kmap_atomic(page); - memset(pageaddr, 0, PAGE_CACHE_SIZE); - kunmap_atomic(pageaddr); - flush_dcache_page(page); - if (!PageError(page)) - SetPageUptodate(page); - unlock_page(page); + return 0; +} +static int squashfs_readpages(struct file *file, struct address_space *mapping, + struct list_head *pages, unsigned int nr_pages) +{ + TRACE("Entered squashfs_readpages, %u pages, first page index %lx\n", + nr_pages, lru_to_page(pages)->index); + __squashfs_readpages(file, NULL, pages, nr_pages, mapping); return 0; } const struct address_space_operations squashfs_aops = { - .readpage = squashfs_readpage + .readpage = squashfs_readpage, + .readpages = squashfs_readpages, }; diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c index 10fe1272535f..3fb4ce210edb 100644 --- a/fs/squashfs/file_direct.c +++ b/fs/squashfs/file_direct.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "squashfs_fs.h" #include "squashfs_fs_sb.h" @@ -20,6 +21,9 @@ #include "squashfs.h" #include "page_actor.h" +// Backported from 4.5 +#define lru_to_page(head) (list_entry((head)->prev, struct page, lru)) + static void release_actor_pages(struct page **page, int pages, int error) { int i; @@ -45,23 +49,40 @@ static void release_actor_pages(struct page **page, int pages, int error) * page cache pages appropriately within the decompressor */ static struct squashfs_page_actor *actor_from_page_cache( - struct page *target_page, int start_index, int nr_pages) + unsigned int actor_pages, struct page *target_page, + struct list_head *rpages, unsigned int *nr_pages, int start_index, + struct address_space *mapping) { - int i, n; struct page **page; struct squashfs_page_actor *actor; + int i, n; + gfp_t gfp = mapping_gfp_constraint(mapping, GFP_KERNEL); - page = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL); + page = kmalloc_array(actor_pages, sizeof(void *), GFP_KERNEL); if (!page) return NULL; - /* Try to grab all the pages covered by the SquashFS block */ - for (i = 0, n = start_index; i < nr_pages; i++, n++) { - if (target_page->index == n) { + for (i = 0, n = start_index; i < actor_pages; i++, n++) { + if (target_page == NULL && rpages && !list_empty(rpages)) { + struct page *cur_page = lru_to_page(rpages); + + if (cur_page->index < start_index + actor_pages) { + list_del(&cur_page->lru); + --(*nr_pages); + if (add_to_page_cache_lru(cur_page, mapping, + cur_page->index, gfp)) + put_page(cur_page); + else + target_page = cur_page; + } else + rpages = NULL; + } + + if (target_page && target_page->index == n) { page[i] = target_page; + target_page = NULL; } else { - page[i] = grab_cache_page_nowait(target_page->mapping, - n); + page[i] = grab_cache_page_nowait(mapping, n); if (page[i] == NULL) continue; } @@ -73,39 +94,42 @@ static struct squashfs_page_actor *actor_from_page_cache( } } - actor = squashfs_page_actor_init(page, nr_pages, 0, + actor = squashfs_page_actor_init(page, actor_pages, 0, release_actor_pages); if (!actor) { - release_actor_pages(page, nr_pages, -ENOMEM); + release_actor_pages(page, actor_pages, -ENOMEM); kfree(page); return NULL; } return actor; } -/* Read separately compressed datablock directly into page cache */ -int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) +int squashfs_readpages_block(struct page *target_page, + struct list_head *readahead_pages, + unsigned int *nr_pages, + struct address_space *mapping, + int page_index, u64 block, int bsize) { - struct inode *inode = target_page->mapping->host; + struct squashfs_page_actor *actor; + struct inode *inode = mapping->host; struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info; - int file_end = (i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT; int mask = (1 << (msblk->block_log - PAGE_CACHE_SHIFT)) - 1; - int start_index = target_page->index & ~mask; + int start_index = page_index & ~mask; int end_index = start_index | mask; - int pages, res = -ENOMEM; - struct squashfs_page_actor *actor; + int actor_pages, res; if (end_index > file_end) end_index = file_end; - pages = end_index - start_index + 1; + actor_pages = end_index - start_index + 1; - actor = actor_from_page_cache(target_page, start_index, pages); + actor = actor_from_page_cache(actor_pages, target_page, + readahead_pages, nr_pages, start_index, + mapping); if (!actor) return -ENOMEM; - get_page(target_page); res = squashfs_read_data_async(inode->i_sb, block, bsize, NULL, actor); return res < 0 ? res : 0; diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h index 985317fbf6c2..6093579c6c5d 100644 --- a/fs/squashfs/squashfs.h +++ b/fs/squashfs/squashfs.h @@ -76,8 +76,9 @@ extern __le64 *squashfs_read_fragment_index_table(struct super_block *, void squashfs_copy_cache(struct page *, struct squashfs_cache_entry *, int, int); -/* file_xxx.c */ -extern int squashfs_readpage_block(struct page *, u64, int); +/* file_direct.c */ +extern int squashfs_readpages_block(struct page *, struct list_head *, + unsigned int *, struct address_space *, int, u64, int); /* id.c */ extern int squashfs_get_id(struct super_block *, unsigned int, unsigned int *); From 0c2f831ee7b27dc478a3362f6743d0013c8bdd73 Mon Sep 17 00:00:00 2001 From: Adrien Schildknecht Date: Thu, 29 Sep 2016 15:25:30 -0700 Subject: [PATCH 215/521] Squashfs: optimize reading uncompressed data When dealing with uncompressed data, there is no need to read a whole block (default 128K) to get the desired page: the pages are independent from each others. This patch change the readpages logic so that reading uncompressed data only read the number of pages advised by the readahead algorithm. Moreover, if the page actor contains holes (i.e. pages that are already up-to-date), squashfs skips the buffer_head associated to those pages. This patch greatly improve the performance of random reads for uncompressed files because squashfs only read what is needed. It also reduces the number of unnecessary reads. Signed-off-by: Adrien Schildknecht Change-Id: I1850150fbf4b45c9dd138d88409fea1ab44054c0 --- fs/squashfs/block.c | 25 +++++++++++++++++++++++++ fs/squashfs/file_direct.c | 37 ++++++++++++++++++++++++++++++------- 2 files changed, 55 insertions(+), 7 deletions(-) diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c index 8a75d812ffdf..2eb66decc5ab 100644 --- a/fs/squashfs/block.c +++ b/fs/squashfs/block.c @@ -206,6 +206,22 @@ static void squashfs_bio_end_io(struct bio *bio) kfree(bio_req); } +static int bh_is_optional(struct squashfs_read_request *req, int idx) +{ + int start_idx, end_idx; + struct squashfs_sb_info *msblk = req->sb->s_fs_info; + + start_idx = (idx * msblk->devblksize - req->offset) / PAGE_CACHE_SIZE; + end_idx = ((idx + 1) * msblk->devblksize - req->offset + 1) / PAGE_CACHE_SIZE; + if (start_idx >= req->output->pages) + return 1; + if (start_idx < 0) + start_idx = end_idx; + if (end_idx >= req->output->pages) + end_idx = start_idx; + return !req->output->page[start_idx] && !req->output->page[end_idx]; +} + static int actor_getblks(struct squashfs_read_request *req, u64 block) { int i; @@ -215,6 +231,15 @@ static int actor_getblks(struct squashfs_read_request *req, u64 block) return -ENOMEM; for (i = 0; i < req->nr_buffers; ++i) { + /* + * When dealing with an uncompressed block, the actor may + * contains NULL pages. There's no need to read the buffers + * associated with these pages. + */ + if (!req->compressed && bh_is_optional(req, i)) { + req->bh[i] = NULL; + continue; + } req->bh[i] = sb_getblk(req->sb, block + i); if (!req->bh[i]) { while (--i) { diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c index 3fb4ce210edb..c97af4c6ccd0 100644 --- a/fs/squashfs/file_direct.c +++ b/fs/squashfs/file_direct.c @@ -114,15 +114,38 @@ int squashfs_readpages_block(struct page *target_page, struct squashfs_page_actor *actor; struct inode *inode = mapping->host; struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info; - int file_end = (i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT; + int start_index, end_index, file_end, actor_pages, res; int mask = (1 << (msblk->block_log - PAGE_CACHE_SHIFT)) - 1; - int start_index = page_index & ~mask; - int end_index = start_index | mask; - int actor_pages, res; - if (end_index > file_end) - end_index = file_end; - actor_pages = end_index - start_index + 1; + /* + * If readpage() is called on an uncompressed datablock, we can just + * read the pages instead of fetching the whole block. + * This greatly improves the performance when a process keep doing + * random reads because we only fetch the necessary data. + * The readahead algorithm will take care of doing speculative reads + * if necessary. + * We can't read more than 1 block even if readahead provides use more + * pages because we don't know yet if the next block is compressed or + * not. + */ + if (bsize && !SQUASHFS_COMPRESSED_BLOCK(bsize)) { + u64 block_end = block + msblk->block_size; + + block += (page_index & mask) * PAGE_CACHE_SIZE; + actor_pages = (block_end - block) / PAGE_CACHE_SIZE; + if (*nr_pages < actor_pages) + actor_pages = *nr_pages; + start_index = page_index; + bsize = min_t(int, bsize, (PAGE_CACHE_SIZE * actor_pages) + | SQUASHFS_COMPRESSED_BIT_BLOCK); + } else { + file_end = (i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT; + start_index = page_index & ~mask; + end_index = start_index | mask; + if (end_index > file_end) + end_index = file_end; + actor_pages = end_index - start_index + 1; + } actor = actor_from_page_cache(actor_pages, target_page, readahead_pages, nr_pages, start_index, From db943372859b8acf8b99e258c0f2146238793e12 Mon Sep 17 00:00:00 2001 From: Mohan Srinivasan Date: Fri, 3 Feb 2017 15:48:03 -0800 Subject: [PATCH 216/521] ANDROID: Refactor fs readpage/write tracepoints. Refactor the fs readpage/write tracepoints to move the inode->path lookup outside the tracepoint code, and pass a pointer to the path into the tracepoint code instead. This is necessary because the tracepoint code runs non-preemptible. Thanks to Trilok Soni for catching this in 4.4. Change-Id: I7486c5947918d155a30c61d6b9cd5027cf8fbe15 Signed-off-by: Mohan Srinivasan --- fs/ext4/inline.c | 12 ++++- fs/ext4/inode.c | 53 ++++++++++++++++------ fs/ext4/readpage.c | 6 +++ fs/f2fs/data.c | 40 ++++++++++++---- fs/f2fs/inline.c | 13 ++++-- fs/mpage.c | 6 +++ include/trace/events/android_fs.h | 44 ++++++++++++++++-- include/trace/events/android_fs_template.h | 34 +++----------- 8 files changed, 147 insertions(+), 61 deletions(-) diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 84da8fd0ab11..bc7c082b7913 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -503,8 +503,16 @@ int ext4_readpage_inline(struct inode *inode, struct page *page) return -EAGAIN; } - trace_android_fs_dataread_start(inode, page_offset(page), PAGE_SIZE, - current->pid, current->comm); + if (trace_android_fs_dataread_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_dataread_start(inode, page_offset(page), + PAGE_SIZE, current->pid, + path, current->comm); + } /* * Current inline data can only exist in the 1st page, diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 3b8a0b052988..1c733321b03c 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1017,8 +1017,16 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, pgoff_t index; unsigned from, to; - trace_android_fs_datawrite_start(inode, pos, len, - current->pid, current->comm); + if (trace_android_fs_datawrite_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, pos, len, + current->pid, path, + current->comm); + } trace_ext4_write_begin(inode, pos, len, flags); /* * Reserve one block more for addition to orphan list in case @@ -2745,8 +2753,16 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, len, flags, pagep, fsdata); } *fsdata = (void *)0; - trace_android_fs_datawrite_start(inode, pos, len, - current->pid, current->comm); + if (trace_android_fs_datawrite_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, pos, len, + current->pid, + path, current->comm); + } trace_ext4_da_write_begin(inode, pos, len, flags); if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) { @@ -3355,16 +3371,27 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter, return 0; if (trace_android_fs_dataread_start_enabled() && - (iov_iter_rw(iter) == READ)) - trace_android_fs_dataread_start(inode, offset, count, - current->pid, - current->comm); - if (trace_android_fs_datawrite_start_enabled() && - (iov_iter_rw(iter) == WRITE)) - trace_android_fs_datawrite_start(inode, offset, count, - current->pid, - current->comm); + (iov_iter_rw(iter) == READ)) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_dataread_start(inode, offset, count, + current->pid, path, + current->comm); + } + if (trace_android_fs_datawrite_start_enabled() && + (iov_iter_rw(iter) == WRITE)) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, offset, count, + current->pid, path, + current->comm); + } trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter)); if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) ret = ext4_ext_direct_IO(iocb, iter, offset); diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c index 1ce24a6759a0..1c5db9fd9c8f 100644 --- a/fs/ext4/readpage.c +++ b/fs/ext4/readpage.c @@ -152,11 +152,17 @@ ext4_submit_bio_read(struct bio *bio) struct page *first_page = bio->bi_io_vec[0].bv_page; if (first_page != NULL) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + first_page->mapping->host); trace_android_fs_dataread_start( first_page->mapping->host, page_offset(first_page), bio->bi_iter.bi_size, current->pid, + path, current->comm); } } diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index e692958d6e78..8936044dee4c 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1402,8 +1402,16 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping, struct dnode_of_data dn; int err = 0; - trace_android_fs_datawrite_start(inode, pos, len, - current->pid, current->comm); + if (trace_android_fs_datawrite_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, pos, len, + current->pid, path, + current->comm); + } trace_f2fs_write_begin(inode, pos, len, flags); f2fs_balance_fs(sbi); @@ -1587,15 +1595,27 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter, trace_f2fs_direct_IO_enter(inode, offset, count, iov_iter_rw(iter)); if (trace_android_fs_dataread_start_enabled() && - (iov_iter_rw(iter) == READ)) - trace_android_fs_dataread_start(inode, offset, - count, current->pid, - current->comm); - if (trace_android_fs_datawrite_start_enabled() && - (iov_iter_rw(iter) == WRITE)) - trace_android_fs_datawrite_start(inode, offset, count, - current->pid, current->comm); + (iov_iter_rw(iter) == READ)) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_dataread_start(inode, offset, + count, current->pid, path, + current->comm); + } + if (trace_android_fs_datawrite_start_enabled() && + (iov_iter_rw(iter) == WRITE)) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, offset, count, + current->pid, path, + current->comm); + } if (iov_iter_rw(iter) == WRITE) { __allocate_data_blocks(inode, offset, count); if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) { diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index d2c5d69ba0b1..dbb2cc4df603 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -85,9 +85,16 @@ int f2fs_read_inline_data(struct inode *inode, struct page *page) { struct page *ipage; - trace_android_fs_dataread_start(inode, page_offset(page), - PAGE_SIZE, current->pid, - current->comm); + if (trace_android_fs_dataread_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_dataread_start(inode, page_offset(page), + PAGE_SIZE, current->pid, + path, current->comm); + } ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) { diff --git a/fs/mpage.c b/fs/mpage.c index 5c65d8942692..0fd48fdcc1b1 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -79,11 +79,17 @@ static struct bio *mpage_bio_submit(int rw, struct bio *bio) struct page *first_page = bio->bi_io_vec[0].bv_page; if (first_page != NULL) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + first_page->mapping->host); trace_android_fs_dataread_start( first_page->mapping->host, page_offset(first_page), bio->bi_iter.bi_size, current->pid, + path, current->comm); } } diff --git a/include/trace/events/android_fs.h b/include/trace/events/android_fs.h index 531da433a7bc..49509533d3fa 100644 --- a/include/trace/events/android_fs.h +++ b/include/trace/events/android_fs.h @@ -9,8 +9,8 @@ DEFINE_EVENT(android_fs_data_start_template, android_fs_dataread_start, TP_PROTO(struct inode *inode, loff_t offset, int bytes, - pid_t pid, char *command), - TP_ARGS(inode, offset, bytes, pid, command)); + pid_t pid, char *pathname, char *command), + TP_ARGS(inode, offset, bytes, pid, pathname, command)); DEFINE_EVENT(android_fs_data_end_template, android_fs_dataread_end, TP_PROTO(struct inode *inode, loff_t offset, int bytes), @@ -18,14 +18,48 @@ DEFINE_EVENT(android_fs_data_end_template, android_fs_dataread_end, DEFINE_EVENT(android_fs_data_start_template, android_fs_datawrite_start, TP_PROTO(struct inode *inode, loff_t offset, int bytes, - pid_t pid, char *command), - TP_ARGS(inode, offset, bytes, pid, command)); + pid_t pid, char *pathname, char *command), + TP_ARGS(inode, offset, bytes, pid, pathname, command)); DEFINE_EVENT(android_fs_data_end_template, android_fs_datawrite_end, TP_PROTO(struct inode *inode, loff_t offset, int bytes), - TP_ARGS(inode, offset, bytes)); + TP_ARGS(inode, offset, bytes)); #endif /* _TRACE_ANDROID_FS_H */ /* This part must be outside protection */ #include + +#ifndef ANDROID_FSTRACE_GET_PATHNAME +#define ANDROID_FSTRACE_GET_PATHNAME + +/* Sizes an on-stack array, so careful if sizing this up ! */ +#define MAX_TRACE_PATHBUF_LEN 256 + +static inline char * +android_fstrace_get_pathname(char *buf, int buflen, struct inode *inode) +{ + char *path; + struct dentry *d; + + /* + * d_obtain_alias() will either iput() if it locates an existing + * dentry or transfer the reference to the new dentry created. + * So get an extra reference here. + */ + ihold(inode); + d = d_obtain_alias(inode); + if (likely(!IS_ERR(d))) { + path = dentry_path_raw(d, buf, buflen); + if (unlikely(IS_ERR(path))) { + strcpy(buf, "ERROR"); + path = buf; + } + dput(d); + } else { + strcpy(buf, "ERROR"); + path = buf; + } + return path; +} +#endif diff --git a/include/trace/events/android_fs_template.h b/include/trace/events/android_fs_template.h index 618988b047c1..4e61ffe7a814 100644 --- a/include/trace/events/android_fs_template.h +++ b/include/trace/events/android_fs_template.h @@ -5,11 +5,10 @@ DECLARE_EVENT_CLASS(android_fs_data_start_template, TP_PROTO(struct inode *inode, loff_t offset, int bytes, - pid_t pid, char *command), - TP_ARGS(inode, offset, bytes, pid, command), + pid_t pid, char *pathname, char *command), + TP_ARGS(inode, offset, bytes, pid, pathname, command), TP_STRUCT__entry( - __array(char, path, MAX_FILTER_STR_VAL); - __field(char *, pathname); + __string(pathbuf, pathname); __field(loff_t, offset); __field(int, bytes); __field(loff_t, i_size); @@ -19,27 +18,7 @@ DECLARE_EVENT_CLASS(android_fs_data_start_template, ), TP_fast_assign( { - struct dentry *d; - - /* - * Grab a reference to the inode here because - * d_obtain_alias() will either drop the inode - * reference if it locates an existing dentry - * or transfer the reference to the new dentry - * created. In our case, the file is still open, - * so the dentry is guaranteed to exist (connected), - * so d_obtain_alias() drops the reference we - * grabbed here. - */ - ihold(inode); - d = d_obtain_alias(inode); - if (!IS_ERR(d)) { - __entry->pathname = dentry_path(d, - __entry->path, - MAX_FILTER_STR_VAL); - dput(d); - } else - __entry->pathname = ERR_PTR(-EINVAL); + __assign_str(pathbuf, pathname); __entry->offset = offset; __entry->bytes = bytes; __entry->i_size = i_size_read(inode); @@ -50,9 +29,8 @@ DECLARE_EVENT_CLASS(android_fs_data_start_template, ), TP_printk("entry_name %s, offset %llu, bytes %d, cmdline %s," " pid %d, i_size %llu, ino %lu", - (IS_ERR(__entry->pathname) ? "ERROR" : __entry->pathname), - __entry->offset, __entry->bytes, __get_str(cmdline), - __entry->pid, __entry->i_size, + __get_str(pathbuf), __entry->offset, __entry->bytes, + __get_str(cmdline), __entry->pid, __entry->i_size, (unsigned long) __entry->ino) ); From ea6520c1c8a825cea1c233d126b685c1881606f6 Mon Sep 17 00:00:00 2001 From: Jeremy Linton Date: Fri, 12 Feb 2016 09:47:52 -0600 Subject: [PATCH 217/521] UPSTREAM: arm/arm64: crypto: assure that ECB modes don't require an IV ECB modes don't use an initialization vector. The kernel /proc/crypto interface doesn't reflect this properly. Acked-by: Ard Biesheuvel Signed-off-by: Jeremy Linton Signed-off-by: Will Deacon (cherry picked from bee038a4bd2efe8188cc80dfdad706a9abe568ad) Signed-off-by: Eric Biggers Change-Id: Ief9558d2b41be58a2d845d2033a141b5ef7b585f --- arch/arm/crypto/aes-ce-glue.c | 4 ++-- arch/arm64/crypto/aes-glue.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/arm/crypto/aes-ce-glue.c b/arch/arm/crypto/aes-ce-glue.c index 679c589c4828..1f7b98e1a00d 100644 --- a/arch/arm/crypto/aes-ce-glue.c +++ b/arch/arm/crypto/aes-ce-glue.c @@ -369,7 +369,7 @@ static struct crypto_alg aes_algs[] = { { .cra_blkcipher = { .min_keysize = AES_MIN_KEY_SIZE, .max_keysize = AES_MAX_KEY_SIZE, - .ivsize = AES_BLOCK_SIZE, + .ivsize = 0, .setkey = ce_aes_setkey, .encrypt = ecb_encrypt, .decrypt = ecb_decrypt, @@ -446,7 +446,7 @@ static struct crypto_alg aes_algs[] = { { .cra_ablkcipher = { .min_keysize = AES_MIN_KEY_SIZE, .max_keysize = AES_MAX_KEY_SIZE, - .ivsize = AES_BLOCK_SIZE, + .ivsize = 0, .setkey = ablk_set_key, .encrypt = ablk_encrypt, .decrypt = ablk_decrypt, diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 6a51dfccfe71..448b874a4826 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -294,7 +294,7 @@ static struct crypto_alg aes_algs[] = { { .cra_blkcipher = { .min_keysize = AES_MIN_KEY_SIZE, .max_keysize = AES_MAX_KEY_SIZE, - .ivsize = AES_BLOCK_SIZE, + .ivsize = 0, .setkey = aes_setkey, .encrypt = ecb_encrypt, .decrypt = ecb_decrypt, @@ -371,7 +371,7 @@ static struct crypto_alg aes_algs[] = { { .cra_ablkcipher = { .min_keysize = AES_MIN_KEY_SIZE, .max_keysize = AES_MAX_KEY_SIZE, - .ivsize = AES_BLOCK_SIZE, + .ivsize = 0, .setkey = ablk_set_key, .encrypt = ablk_encrypt, .decrypt = ablk_decrypt, From c1bba8c93fc1c337365e14293dd1ec951d83e017 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 10 Jan 2017 16:47:49 -0800 Subject: [PATCH 218/521] ANDROID: crypto: allow blkcipher walks over ablkcipher data Add a function blkcipher_ablkcipher_walk_virt() which allows ablkcipher algorithms to use the blkcipher_walk API to walk over their data. This will be used by the HEH algorithm, which to support asynchronous ECB algorithms will be an ablkcipher, but it also needs to make other passes over the data. Bug: 32975945 Signed-off-by: Eric Biggers Change-Id: I05f9a0e5473ba6115fcc72d5122d6b0b18b2078b --- crypto/blkcipher.c | 21 +++++++++++++++++++++ include/crypto/algapi.h | 3 +++ 2 files changed, 24 insertions(+) diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c index dca7bc87dad9..7bbfadc195a6 100644 --- a/crypto/blkcipher.c +++ b/crypto/blkcipher.c @@ -373,6 +373,27 @@ int blkcipher_aead_walk_virt_block(struct blkcipher_desc *desc, } EXPORT_SYMBOL_GPL(blkcipher_aead_walk_virt_block); +/* + * This function allows ablkcipher algorithms to use the blkcipher_walk API to + * walk over their data. The specified crypto_ablkcipher tfm is used to + * initialize the struct blkcipher_walk, and the crypto_blkcipher specified in + * desc->tfm is never used so it can be left NULL. (Yes, this design is ugly, + * but it parallels blkcipher_aead_walk_virt_block() above. In the 4.10 kernel + * this is starting to be cleaned up...) + */ +int blkcipher_ablkcipher_walk_virt(struct blkcipher_desc *desc, + struct blkcipher_walk *walk, + struct crypto_ablkcipher *tfm) +{ + walk->flags &= ~BLKCIPHER_WALK_PHYS; + walk->walk_blocksize = crypto_ablkcipher_blocksize(tfm); + walk->cipher_blocksize = walk->walk_blocksize; + walk->ivsize = crypto_ablkcipher_ivsize(tfm); + walk->alignmask = crypto_ablkcipher_alignmask(tfm); + return blkcipher_walk_first(desc, walk); +} +EXPORT_SYMBOL_GPL(blkcipher_ablkcipher_walk_virt); + static int setkey_unaligned(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen) { diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h index c9fe145f7dd3..04661e1fb625 100644 --- a/include/crypto/algapi.h +++ b/include/crypto/algapi.h @@ -202,6 +202,9 @@ int blkcipher_aead_walk_virt_block(struct blkcipher_desc *desc, struct blkcipher_walk *walk, struct crypto_aead *tfm, unsigned int blocksize); +int blkcipher_ablkcipher_walk_virt(struct blkcipher_desc *desc, + struct blkcipher_walk *walk, + struct crypto_ablkcipher *tfm); int ablkcipher_walk_done(struct ablkcipher_request *req, struct ablkcipher_walk *walk, int err); From f9e82f17e180298cb92be33e1617c79041380fea Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 10 Jan 2017 16:47:49 -0800 Subject: [PATCH 219/521] ANDROID: crypto: shash - Add crypto_grab_shash() and crypto_spawn_shash_alg() Analogous to crypto_grab_skcipher() and crypto_spawn_skcipher_alg(), these are useful for algorithms that need to use a shash sub-algorithm, possibly in addition to other sub-algorithms. Bug: 32975945 Signed-off-by: Eric Biggers Change-Id: I44e5a519d73f5f839e3b6ecbf8c66e36ec569557 --- crypto/shash.c | 8 ++++++++ include/crypto/internal/hash.h | 8 ++++++++ 2 files changed, 16 insertions(+) diff --git a/crypto/shash.c b/crypto/shash.c index 359754591653..9ae1e891308d 100644 --- a/crypto/shash.c +++ b/crypto/shash.c @@ -683,6 +683,14 @@ void shash_free_instance(struct crypto_instance *inst) } EXPORT_SYMBOL_GPL(shash_free_instance); +int crypto_grab_shash(struct crypto_shash_spawn *spawn, + const char *name, u32 type, u32 mask) +{ + spawn->base.frontend = &crypto_shash_type; + return crypto_grab_spawn(&spawn->base, name, type, mask); +} +EXPORT_SYMBOL_GPL(crypto_grab_shash); + int crypto_init_shash_spawn(struct crypto_shash_spawn *spawn, struct shash_alg *alg, struct crypto_instance *inst) diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h index 3b4af1d7c7e9..476d99d0edb7 100644 --- a/include/crypto/internal/hash.h +++ b/include/crypto/internal/hash.h @@ -102,6 +102,8 @@ int shash_register_instance(struct crypto_template *tmpl, struct shash_instance *inst); void shash_free_instance(struct crypto_instance *inst); +int crypto_grab_shash(struct crypto_shash_spawn *spawn, + const char *name, u32 type, u32 mask); int crypto_init_shash_spawn(struct crypto_shash_spawn *spawn, struct shash_alg *alg, struct crypto_instance *inst); @@ -111,6 +113,12 @@ static inline void crypto_drop_shash(struct crypto_shash_spawn *spawn) crypto_drop_spawn(&spawn->base); } +static inline struct shash_alg *crypto_spawn_shash_alg( + struct crypto_shash_spawn *spawn) +{ + return container_of(spawn->base.alg, struct shash_alg, base); +} + struct shash_alg *shash_attr_alg(struct rtattr *rta, u32 type, u32 mask); int shash_ahash_update(struct ahash_request *req, struct shash_desc *desc); From a2d2edaf30069ba0f8cab1ebf819bdda52e7b11e Mon Sep 17 00:00:00 2001 From: Alex Cope Date: Mon, 14 Nov 2016 11:02:54 -0800 Subject: [PATCH 220/521] UPSTREAM: crypto: gf128mul - Zero memory when freeing multiplication table GF(2^128) multiplication tables are typically used for secret information, so it's a good idea to zero them on free. Signed-off-by: Alex Cope Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu (cherry-picked from 75aa0a7cafe951538c7cb7c5ed457a3371ec5bcd) Bug: 32975945 Signed-off-by: Eric Biggers Change-Id: I37b1ae9544158007f9ee2caf070120f4a42153ab --- crypto/gf128mul.c | 4 ++-- include/crypto/gf128mul.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c index 5276607c72d0..0594dd6f82f2 100644 --- a/crypto/gf128mul.c +++ b/crypto/gf128mul.c @@ -352,8 +352,8 @@ void gf128mul_free_64k(struct gf128mul_64k *t) int i; for (i = 0; i < 16; i++) - kfree(t->t[i]); - kfree(t); + kzfree(t->t[i]); + kzfree(t); } EXPORT_SYMBOL(gf128mul_free_64k); diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h index da2530e34b26..7217fe6dbe33 100644 --- a/include/crypto/gf128mul.h +++ b/include/crypto/gf128mul.h @@ -177,7 +177,7 @@ void gf128mul_4k_bbe(be128 *a, struct gf128mul_4k *t); static inline void gf128mul_free_4k(struct gf128mul_4k *t) { - kfree(t); + kzfree(t); } From 65513b9e83d7b1f6f1f5098fb71d158fce51e581 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 10 Jan 2017 17:04:54 -0800 Subject: [PATCH 221/521] ANDROID: crypto: gf128mul - Refactor gf128 overflow macros and tables Rename and clean up the GF(2^128) overflow macros and tables. Their usage is more general than the name suggested, e.g. what was previously known as the "bbe" table can actually be used for both "bbe" and "ble" multiplication. Bug: 32975945 Signed-off-by: Eric Biggers Change-Id: Ie6c47b4075ca40031eb1767e9b468cfd7bf1b2e4 --- crypto/gf128mul.c | 68 ++++++++++++++++++++++++++++------------------- 1 file changed, 41 insertions(+), 27 deletions(-) diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c index 0594dd6f82f2..8b65b1eb5dda 100644 --- a/crypto/gf128mul.c +++ b/crypto/gf128mul.c @@ -88,33 +88,47 @@ q(0xf8), q(0xf9), q(0xfa), q(0xfb), q(0xfc), q(0xfd), q(0xfe), q(0xff) \ } -/* Given the value i in 0..255 as the byte overflow when a field element - in GHASH is multiplied by x^8, this function will return the values that - are generated in the lo 16-bit word of the field value by applying the - modular polynomial. The values lo_byte and hi_byte are returned via the - macro xp_fun(lo_byte, hi_byte) so that the values can be assembled into - memory as required by a suitable definition of this macro operating on - the table above -*/ +/* + * Given a value i in 0..255 as the byte overflow when a field element + * in GF(2^128) is multiplied by x^8, the following macro returns the + * 16-bit value that must be XOR-ed into the low-degree end of the + * product to reduce it modulo the irreducible polynomial x^128 + x^7 + + * x^2 + x + 1. + * + * There are two versions of the macro, and hence two tables: one for + * the "be" convention where the highest-order bit is the coefficient of + * the highest-degree polynomial term, and one for the "le" convention + * where the highest-order bit is the coefficient of the lowest-degree + * polynomial term. In both cases the values are stored in CPU byte + * endianness such that the coefficients are ordered consistently across + * bytes, i.e. in the "be" table bits 15..0 of the stored value + * correspond to the coefficients of x^15..x^0, and in the "le" table + * bits 15..0 correspond to the coefficients of x^0..x^15. + * + * Therefore, provided that the appropriate byte endianness conversions + * are done by the multiplication functions (and these must be in place + * anyway to support both little endian and big endian CPUs), the "be" + * table can be used for multiplications of both "bbe" and "ble" + * elements, and the "le" table can be used for multiplications of both + * "lle" and "lbe" elements. + */ -#define xx(p, q) 0x##p##q - -#define xda_bbe(i) ( \ - (i & 0x80 ? xx(43, 80) : 0) ^ (i & 0x40 ? xx(21, c0) : 0) ^ \ - (i & 0x20 ? xx(10, e0) : 0) ^ (i & 0x10 ? xx(08, 70) : 0) ^ \ - (i & 0x08 ? xx(04, 38) : 0) ^ (i & 0x04 ? xx(02, 1c) : 0) ^ \ - (i & 0x02 ? xx(01, 0e) : 0) ^ (i & 0x01 ? xx(00, 87) : 0) \ +#define xda_be(i) ( \ + (i & 0x80 ? 0x4380 : 0) ^ (i & 0x40 ? 0x21c0 : 0) ^ \ + (i & 0x20 ? 0x10e0 : 0) ^ (i & 0x10 ? 0x0870 : 0) ^ \ + (i & 0x08 ? 0x0438 : 0) ^ (i & 0x04 ? 0x021c : 0) ^ \ + (i & 0x02 ? 0x010e : 0) ^ (i & 0x01 ? 0x0087 : 0) \ ) -#define xda_lle(i) ( \ - (i & 0x80 ? xx(e1, 00) : 0) ^ (i & 0x40 ? xx(70, 80) : 0) ^ \ - (i & 0x20 ? xx(38, 40) : 0) ^ (i & 0x10 ? xx(1c, 20) : 0) ^ \ - (i & 0x08 ? xx(0e, 10) : 0) ^ (i & 0x04 ? xx(07, 08) : 0) ^ \ - (i & 0x02 ? xx(03, 84) : 0) ^ (i & 0x01 ? xx(01, c2) : 0) \ +#define xda_le(i) ( \ + (i & 0x80 ? 0xe100 : 0) ^ (i & 0x40 ? 0x7080 : 0) ^ \ + (i & 0x20 ? 0x3840 : 0) ^ (i & 0x10 ? 0x1c20 : 0) ^ \ + (i & 0x08 ? 0x0e10 : 0) ^ (i & 0x04 ? 0x0708 : 0) ^ \ + (i & 0x02 ? 0x0384 : 0) ^ (i & 0x01 ? 0x01c2 : 0) \ ) -static const u16 gf128mul_table_lle[256] = gf128mul_dat(xda_lle); -static const u16 gf128mul_table_bbe[256] = gf128mul_dat(xda_bbe); +static const u16 gf128mul_table_le[256] = gf128mul_dat(xda_le); +static const u16 gf128mul_table_be[256] = gf128mul_dat(xda_be); /* These functions multiply a field element by x, by x^4 and by x^8 * in the polynomial field representation. It uses 32-bit word operations @@ -126,7 +140,7 @@ static void gf128mul_x_lle(be128 *r, const be128 *x) { u64 a = be64_to_cpu(x->a); u64 b = be64_to_cpu(x->b); - u64 _tt = gf128mul_table_lle[(b << 7) & 0xff]; + u64 _tt = gf128mul_table_le[(b << 7) & 0xff]; r->b = cpu_to_be64((b >> 1) | (a << 63)); r->a = cpu_to_be64((a >> 1) ^ (_tt << 48)); @@ -136,7 +150,7 @@ static void gf128mul_x_bbe(be128 *r, const be128 *x) { u64 a = be64_to_cpu(x->a); u64 b = be64_to_cpu(x->b); - u64 _tt = gf128mul_table_bbe[a >> 63]; + u64 _tt = gf128mul_table_be[a >> 63]; r->a = cpu_to_be64((a << 1) | (b >> 63)); r->b = cpu_to_be64((b << 1) ^ _tt); @@ -146,7 +160,7 @@ void gf128mul_x_ble(be128 *r, const be128 *x) { u64 a = le64_to_cpu(x->a); u64 b = le64_to_cpu(x->b); - u64 _tt = gf128mul_table_bbe[b >> 63]; + u64 _tt = gf128mul_table_be[b >> 63]; r->a = cpu_to_le64((a << 1) ^ _tt); r->b = cpu_to_le64((b << 1) | (a >> 63)); @@ -157,7 +171,7 @@ static void gf128mul_x8_lle(be128 *x) { u64 a = be64_to_cpu(x->a); u64 b = be64_to_cpu(x->b); - u64 _tt = gf128mul_table_lle[b & 0xff]; + u64 _tt = gf128mul_table_le[b & 0xff]; x->b = cpu_to_be64((b >> 8) | (a << 56)); x->a = cpu_to_be64((a >> 8) ^ (_tt << 48)); @@ -167,7 +181,7 @@ static void gf128mul_x8_bbe(be128 *x) { u64 a = be64_to_cpu(x->a); u64 b = be64_to_cpu(x->b); - u64 _tt = gf128mul_table_bbe[a >> 56]; + u64 _tt = gf128mul_table_be[a >> 56]; x->a = cpu_to_be64((a << 8) | (b >> 56)); x->b = cpu_to_be64((b << 8) ^ _tt); From 2c3ce605845da742cd7c456c8ec23d5fcc2710da Mon Sep 17 00:00:00 2001 From: Alex Cope Date: Tue, 10 Jan 2017 16:47:49 -0800 Subject: [PATCH 222/521] ANDROID: crypto: gf128mul - Add ble multiplication functions Adding ble multiplication to GF128mul, and fixing up comments. The ble multiplication functions multiply GF(2^128) elements in the ble format. This format is preferable because the bits within each byte map to polynomial coefficients in the natural order (lowest order bit = coefficient of lowest degree polynomial term), and the bytes are stored in little endian order which matches the endianness of most modern CPUs. These new functions will be used by the HEH algorithm. Signed-off-by: Alex Cope Bug: 32975945 Signed-off-by: Eric Biggers Change-Id: I39a58e8ee83e6f9b2e6bd51738f816dbfa2f3a47 --- crypto/gf128mul.c | 99 ++++++++++++++++++++++++++++++++++++--- include/crypto/gf128mul.h | 45 +++++++++--------- 2 files changed, 117 insertions(+), 27 deletions(-) diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c index 8b65b1eb5dda..f3d9f6da0767 100644 --- a/crypto/gf128mul.c +++ b/crypto/gf128mul.c @@ -44,7 +44,7 @@ --------------------------------------------------------------------------- Issue 31/01/2006 - This file provides fast multiplication in GF(128) as required by several + This file provides fast multiplication in GF(2^128) as required by several cryptographic authentication modes */ @@ -130,9 +130,10 @@ static const u16 gf128mul_table_le[256] = gf128mul_dat(xda_le); static const u16 gf128mul_table_be[256] = gf128mul_dat(xda_be); -/* These functions multiply a field element by x, by x^4 and by x^8 - * in the polynomial field representation. It uses 32-bit word operations - * to gain speed but compensates for machine endianess and hence works +/* + * The following functions multiply a field element by x or by x^8 in + * the polynomial field representation. They use 64-bit word operations + * to gain speed but compensate for machine endianness and hence work * correctly on both styles of machine. */ @@ -187,6 +188,16 @@ static void gf128mul_x8_bbe(be128 *x) x->b = cpu_to_be64((b << 8) ^ _tt); } +static void gf128mul_x8_ble(be128 *x) +{ + u64 a = le64_to_cpu(x->b); + u64 b = le64_to_cpu(x->a); + u64 _tt = gf128mul_table_be[a >> 56]; + + x->b = cpu_to_le64((a << 8) | (b >> 56)); + x->a = cpu_to_le64((b << 8) ^ _tt); +} + void gf128mul_lle(be128 *r, const be128 *b) { be128 p[8]; @@ -263,9 +274,48 @@ void gf128mul_bbe(be128 *r, const be128 *b) } EXPORT_SYMBOL(gf128mul_bbe); +void gf128mul_ble(be128 *r, const be128 *b) +{ + be128 p[8]; + int i; + + p[0] = *r; + for (i = 0; i < 7; ++i) + gf128mul_x_ble((be128 *)&p[i + 1], (be128 *)&p[i]); + + memset(r, 0, sizeof(*r)); + for (i = 0;;) { + u8 ch = ((u8 *)b)[15 - i]; + + if (ch & 0x80) + be128_xor(r, r, &p[7]); + if (ch & 0x40) + be128_xor(r, r, &p[6]); + if (ch & 0x20) + be128_xor(r, r, &p[5]); + if (ch & 0x10) + be128_xor(r, r, &p[4]); + if (ch & 0x08) + be128_xor(r, r, &p[3]); + if (ch & 0x04) + be128_xor(r, r, &p[2]); + if (ch & 0x02) + be128_xor(r, r, &p[1]); + if (ch & 0x01) + be128_xor(r, r, &p[0]); + + if (++i >= 16) + break; + + gf128mul_x8_ble(r); + } +} +EXPORT_SYMBOL(gf128mul_ble); + + /* This version uses 64k bytes of table space. A 16 byte buffer has to be multiplied by a 16 byte key - value in GF(128). If we consider a GF(128) value in + value in GF(2^128). If we consider a GF(2^128) value in the buffer's lowest byte, we can construct a table of the 256 16 byte values that result from the 256 values of this byte. This requires 4096 bytes. But we also @@ -399,7 +449,7 @@ EXPORT_SYMBOL(gf128mul_64k_bbe); /* This version uses 4k bytes of table space. A 16 byte buffer has to be multiplied by a 16 byte key - value in GF(128). If we consider a GF(128) value in a + value in GF(2^128). If we consider a GF(2^128) value in a single byte, we can construct a table of the 256 16 byte values that result from the 256 values of this byte. This requires 4096 bytes. If we take the highest byte in @@ -457,6 +507,28 @@ struct gf128mul_4k *gf128mul_init_4k_bbe(const be128 *g) } EXPORT_SYMBOL(gf128mul_init_4k_bbe); +struct gf128mul_4k *gf128mul_init_4k_ble(const be128 *g) +{ + struct gf128mul_4k *t; + int j, k; + + t = kzalloc(sizeof(*t), GFP_KERNEL); + if (!t) + goto out; + + t->t[1] = *g; + for (j = 1; j <= 64; j <<= 1) + gf128mul_x_ble(&t->t[j + j], &t->t[j]); + + for (j = 2; j < 256; j += j) + for (k = 1; k < j; ++k) + be128_xor(&t->t[j + k], &t->t[j], &t->t[k]); + +out: + return t; +} +EXPORT_SYMBOL(gf128mul_init_4k_ble); + void gf128mul_4k_lle(be128 *a, struct gf128mul_4k *t) { u8 *ap = (u8 *)a; @@ -487,5 +559,20 @@ void gf128mul_4k_bbe(be128 *a, struct gf128mul_4k *t) } EXPORT_SYMBOL(gf128mul_4k_bbe); +void gf128mul_4k_ble(be128 *a, struct gf128mul_4k *t) +{ + u8 *ap = (u8 *)a; + be128 r[1]; + int i = 15; + + *r = t->t[ap[15]]; + while (i--) { + gf128mul_x8_ble(r); + be128_xor(r, r, &t->t[ap[i]]); + } + *a = *r; +} +EXPORT_SYMBOL(gf128mul_4k_ble); + MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Functions for multiplying elements of GF(2^128)"); diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h index 7217fe6dbe33..230760aef93b 100644 --- a/include/crypto/gf128mul.h +++ b/include/crypto/gf128mul.h @@ -43,7 +43,7 @@ --------------------------------------------------------------------------- Issue Date: 31/01/2006 - An implementation of field multiplication in Galois Field GF(128) + An implementation of field multiplication in Galois Field GF(2^128) */ #ifndef _CRYPTO_GF128MUL_H @@ -65,7 +65,7 @@ * are left and the lsb's are right. char b[16] is an array and b[0] is * the first octet. * - * 80000000 00000000 00000000 00000000 .... 00000000 00000000 00000000 + * 10000000 00000000 00000000 00000000 .... 00000000 00000000 00000000 * b[0] b[1] b[2] b[3] b[13] b[14] b[15] * * Every bit is a coefficient of some power of X. We can store the bits @@ -99,21 +99,21 @@ * * bbe on a little endian machine u32 x[4]: * - * MS x[0] LS MS x[1] LS + * MS x[0] LS MS x[1] LS * ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls * 103..96 111.104 119.112 127.120 71...64 79...72 87...80 95...88 * - * MS x[2] LS MS x[3] LS + * MS x[2] LS MS x[3] LS * ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls * 39...32 47...40 55...48 63...56 07...00 15...08 23...16 31...24 * * ble on a little endian machine * - * MS x[0] LS MS x[1] LS + * MS x[0] LS MS x[1] LS * ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls * 31...24 23...16 15...08 07...00 63...56 55...48 47...40 39...32 * - * MS x[2] LS MS x[3] LS + * MS x[2] LS MS x[3] LS * ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls * 95...88 87...80 79...72 71...64 127.120 199.112 111.104 103..96 * @@ -127,7 +127,7 @@ * machines this will automatically aligned to wordsize and on a 64-bit * machine also. */ -/* Multiply a GF128 field element by x. Field elements are held in arrays +/* Multiply a GF128 field element by x. Field elements are held in arrays of bytes in which field bits 8n..8n + 7 are held in byte[n], with lower indexed bits placed in the more numerically significant bit positions within bytes. @@ -135,45 +135,47 @@ On little endian machines the bit indexes translate into the bit positions within four 32-bit words in the following way - MS x[0] LS MS x[1] LS + MS x[0] LS MS x[1] LS ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls 24...31 16...23 08...15 00...07 56...63 48...55 40...47 32...39 - MS x[2] LS MS x[3] LS + MS x[2] LS MS x[3] LS ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls 88...95 80...87 72...79 64...71 120.127 112.119 104.111 96..103 On big endian machines the bit indexes translate into the bit positions within four 32-bit words in the following way - MS x[0] LS MS x[1] LS + MS x[0] LS MS x[1] LS ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls 00...07 08...15 16...23 24...31 32...39 40...47 48...55 56...63 - MS x[2] LS MS x[3] LS + MS x[2] LS MS x[3] LS ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls 64...71 72...79 80...87 88...95 96..103 104.111 112.119 120.127 */ -/* A slow generic version of gf_mul, implemented for lle and bbe - * It multiplies a and b and puts the result in a */ +/* A slow generic version of gf_mul, implemented for lle, bbe, and ble. + * It multiplies a and b and puts the result in a + */ void gf128mul_lle(be128 *a, const be128 *b); - void gf128mul_bbe(be128 *a, const be128 *b); +void gf128mul_ble(be128 *a, const be128 *b); -/* multiply by x in ble format, needed by XTS */ +/* multiply by x in ble format, needed by XTS and HEH */ void gf128mul_x_ble(be128 *a, const be128 *b); /* 4k table optimization */ - struct gf128mul_4k { be128 t[256]; }; struct gf128mul_4k *gf128mul_init_4k_lle(const be128 *g); struct gf128mul_4k *gf128mul_init_4k_bbe(const be128 *g); +struct gf128mul_4k *gf128mul_init_4k_ble(const be128 *g); void gf128mul_4k_lle(be128 *a, struct gf128mul_4k *t); void gf128mul_4k_bbe(be128 *a, struct gf128mul_4k *t); +void gf128mul_4k_ble(be128 *a, struct gf128mul_4k *t); static inline void gf128mul_free_4k(struct gf128mul_4k *t) { @@ -181,16 +183,17 @@ static inline void gf128mul_free_4k(struct gf128mul_4k *t) } -/* 64k table optimization, implemented for lle and bbe */ +/* 64k table optimization, implemented for lle, ble, and bbe */ struct gf128mul_64k { struct gf128mul_4k *t[16]; }; -/* first initialize with the constant factor with which you - * want to multiply and then call gf128_64k_lle with the other - * factor in the first argument, the table in the second and a - * scratch register in the third. Afterwards *a = *r. */ +/* First initialize with the constant factor with which you + * want to multiply and then call gf128mul_64k_bbe with the other + * factor in the first argument, and the table in the second. + * Afterwards, the result is stored in *a. + */ struct gf128mul_64k *gf128mul_init_64k_lle(const be128 *g); struct gf128mul_64k *gf128mul_init_64k_bbe(const be128 *g); void gf128mul_free_64k(struct gf128mul_64k *t); From a2f9a7b9fe8a9562fa8819dd23711a9d6dcfd1be Mon Sep 17 00:00:00 2001 From: Alex Cope Date: Tue, 10 Jan 2017 16:47:49 -0800 Subject: [PATCH 223/521] ANDROID: crypto: heh - Add Hash-Encrypt-Hash (HEH) algorithm Hash-Encrypt-Hash (HEH) is a proposed block cipher mode of operation which extends the strong pseudo-random permutation property of block ciphers (e.g. AES) to arbitrary length input strings. This provides a stronger notion of security than existing block cipher modes of operation (e.g. CBC, CTR, XTS), though it is usually less performant. It uses two keyed invertible hash functions with a layer of ECB encryption applied in-between. The algorithm is currently specified by the following Internet Draft: https://tools.ietf.org/html/draft-cope-heh-01 This patch adds HEH as a symmetric cipher only. Support for HEH as an AEAD is not yet implemented. HEH will use an existing accelerated ecb(block_cipher) implementation for the encrypt step if available. Accelerated versions of the hash step are planned but will be left for later patches. This patch backports HEH to the 4.4 Android kernel, initially for use by ext4 filenames encryption. Note that HEH is not yet upstream; however, patches have been made available on linux-crypto, and as noted there is also a draft specification available. This backport required updating the code to conform to the legacy ablkcipher API rather than the skcipher API, which wasn't complete in 4.4. Signed-off-by: Alex Cope Bug: 32975945 Signed-off-by: Eric Biggers Change-Id: I945bcc9c0115916824d701bae91b86e3f059a1a9 --- crypto/Kconfig | 17 + crypto/Makefile | 1 + crypto/heh.c | 899 +++++++++++++++++++++++++++++++++++++++++++++++ crypto/testmgr.c | 15 + crypto/testmgr.h | 194 ++++++++++ 5 files changed, 1126 insertions(+) create mode 100644 crypto/heh.c diff --git a/crypto/Kconfig b/crypto/Kconfig index 7240821137fd..627227f1162d 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -289,6 +289,23 @@ config CRYPTO_CBC CBC: Cipher Block Chaining mode This block cipher algorithm is required for IPSec. +config CRYPTO_HEH + tristate "HEH support" + select CRYPTO_CMAC + select CRYPTO_ECB + select CRYPTO_GF128MUL + select CRYPTO_MANAGER + help + HEH: Hash-Encrypt-Hash mode + HEH is a proposed block cipher mode of operation which extends the + strong pseudo-random permutation (SPRP) property of block ciphers to + arbitrary-length input strings. This provides a stronger notion of + security than existing block cipher modes of operation (e.g. CBC, CTR, + XTS), though it is usually less performant. Applications include disk + encryption and encryption of file names and contents. Currently, this + implementation only provides a symmetric cipher interface, so it can't + yet be used as an AEAD. + config CRYPTO_CTR tristate "CTR support" select CRYPTO_BLKCIPHER diff --git a/crypto/Makefile b/crypto/Makefile index 03e66097eb0c..8507d1fab3ac 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -67,6 +67,7 @@ obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o obj-$(CONFIG_CRYPTO_GF128MUL) += gf128mul.o obj-$(CONFIG_CRYPTO_ECB) += ecb.o obj-$(CONFIG_CRYPTO_CBC) += cbc.o +obj-$(CONFIG_CRYPTO_HEH) += heh.o obj-$(CONFIG_CRYPTO_PCBC) += pcbc.o obj-$(CONFIG_CRYPTO_CTS) += cts.o obj-$(CONFIG_CRYPTO_LRW) += lrw.o diff --git a/crypto/heh.c b/crypto/heh.c new file mode 100644 index 000000000000..48a284cecaa2 --- /dev/null +++ b/crypto/heh.c @@ -0,0 +1,899 @@ +/* + * HEH: Hash-Encrypt-Hash mode + * + * Copyright (c) 2016 Google Inc. + * + * Authors: + * Alex Cope + * Eric Biggers + */ + +/* + * Hash-Encrypt-Hash (HEH) is a proposed block cipher mode of operation which + * extends the strong pseudo-random permutation (SPRP) property of block ciphers + * (e.g. AES) to arbitrary length input strings. It uses two keyed invertible + * hash functions with a layer of ECB encryption applied in-between. The + * algorithm is specified by the following Internet Draft: + * + * https://tools.ietf.org/html/draft-cope-heh-01 + * + * Although HEH can be used as either a regular symmetric cipher or as an AEAD, + * currently this module only provides it as a symmetric cipher. Additionally, + * only 16-byte nonces are supported. + */ + +#include +#include +#include +#include +#include +#include "internal.h" + +/* + * The block size is the size of GF(2^128) elements and also the required block + * size of the underlying block cipher. + */ +#define HEH_BLOCK_SIZE 16 + +struct heh_instance_ctx { + struct crypto_shash_spawn cmac; + struct crypto_skcipher_spawn ecb; +}; + +struct heh_tfm_ctx { + struct crypto_shash *cmac; + struct crypto_ablkcipher *ecb; + struct gf128mul_4k *tau_key; +}; + +struct heh_cmac_data { + u8 nonce[HEH_BLOCK_SIZE]; + __le32 nonce_length; + __le32 aad_length; + __le32 message_length; + __le32 padding; +}; + +struct heh_req_ctx { /* aligned to alignmask */ + be128 beta1_key; + be128 beta2_key; + union { + struct { + struct heh_cmac_data data; + struct shash_desc desc; + /* + crypto_shash_descsize(cmac) */ + } cmac; + struct { + u8 keystream[HEH_BLOCK_SIZE]; + u8 tmp[HEH_BLOCK_SIZE]; + struct scatterlist tmp_sgl[2]; + struct ablkcipher_request req; + /* + crypto_ablkcipher_reqsize(ecb) */ + } ecb; + } u; +}; + +/* + * Get the offset in bytes to the last full block, or equivalently the length of + * all full blocks excluding the last + */ +static inline unsigned int get_tail_offset(unsigned int len) +{ + len -= len % HEH_BLOCK_SIZE; + return len - HEH_BLOCK_SIZE; +} + +static inline struct heh_req_ctx *heh_req_ctx(struct ablkcipher_request *req) +{ + unsigned int alignmask = crypto_ablkcipher_alignmask( + crypto_ablkcipher_reqtfm(req)); + + return (void *)PTR_ALIGN((u8 *)ablkcipher_request_ctx(req), + alignmask + 1); +} + +static inline void async_done(struct crypto_async_request *areq, int err, + int (*next_step)(struct ablkcipher_request *, + u32)) +{ + struct ablkcipher_request *req = areq->data; + + if (err) + goto out; + + err = next_step(req, req->base.flags & ~CRYPTO_TFM_REQ_MAY_SLEEP); + if (err == -EINPROGRESS || + (err == -EBUSY && (req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))) + return; +out: + ablkcipher_request_complete(req, err); +} + +/* + * Generate the per-message "beta" keys used by the hashing layers of HEH. The + * first beta key is the CMAC of the nonce, the additional authenticated data + * (AAD), and the lengths in bytes of the nonce, AAD, and message. The nonce + * and AAD are each zero-padded to the next 16-byte block boundary, and the + * lengths are serialized as 4-byte little endian integers and zero-padded to + * the next 16-byte block boundary. + * The second beta key is the first one interpreted as an element in GF(2^128) + * and multiplied by x. + * + * Note that because the nonce and AAD may, in general, be variable-length, the + * key generation must be done by a pseudo-random function (PRF) on + * variable-length inputs. CBC-MAC does not satisfy this, as it is only a PRF + * on fixed-length inputs. CMAC remedies this flaw. Including the lengths of + * the nonce, AAD, and message is also critical to avoid collisions. + * + * That being said, this implementation does not yet operate as an AEAD and + * therefore there is never any AAD, nor are variable-length nonces supported. + */ +static int generate_betas(struct ablkcipher_request *req, + be128 *beta1_key, be128 *beta2_key) +{ + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct heh_tfm_ctx *ctx = crypto_ablkcipher_ctx(tfm); + struct heh_req_ctx *rctx = heh_req_ctx(req); + struct heh_cmac_data *data = &rctx->u.cmac.data; + struct shash_desc *desc = &rctx->u.cmac.desc; + int err; + + BUILD_BUG_ON(sizeof(*data) != 2 * HEH_BLOCK_SIZE); + memcpy(data->nonce, req->info, HEH_BLOCK_SIZE); + data->nonce_length = cpu_to_le32(HEH_BLOCK_SIZE); + data->aad_length = cpu_to_le32(0); + data->message_length = cpu_to_le32(req->nbytes); + data->padding = cpu_to_le32(0); + + desc->tfm = ctx->cmac; + desc->flags = req->base.flags; + + err = crypto_shash_digest(desc, (const u8 *)data, sizeof(*data), + (u8 *)beta1_key); + if (err) + return err; + + gf128mul_x_ble(beta2_key, beta1_key); + return 0; +} + +/* + * Evaluation of a polynomial over GF(2^128) using Horner's rule. The + * polynomial is evaluated at 'point'. The polynomial's coefficients are taken + * from 'coeffs_sgl' and are for terms with consecutive descending degree ending + * at degree 1. 'bytes_of_coeffs' is 16 times the number of terms. + */ +static be128 evaluate_polynomial(struct gf128mul_4k *point, + struct scatterlist *coeffs_sgl, + unsigned int bytes_of_coeffs) +{ + be128 value = {0}; + struct sg_mapping_iter miter; + unsigned int remaining = bytes_of_coeffs; + unsigned int needed = 0; + + sg_miter_start(&miter, coeffs_sgl, sg_nents(coeffs_sgl), + SG_MITER_FROM_SG | SG_MITER_ATOMIC); + while (remaining) { + be128 coeff; + const u8 *src; + unsigned int srclen; + u8 *dst = (u8 *)&value; + + /* + * Note: scatterlist elements are not necessarily evenly + * divisible into blocks, nor are they necessarily aligned to + * __alignof__(be128). + */ + sg_miter_next(&miter); + + src = miter.addr; + srclen = min_t(unsigned int, miter.length, remaining); + remaining -= srclen; + + if (needed) { + unsigned int n = min(srclen, needed); + u8 *pos = dst + (HEH_BLOCK_SIZE - needed); + + needed -= n; + srclen -= n; + + while (n--) + *pos++ ^= *src++; + + if (!needed) + gf128mul_4k_ble(&value, point); + } + + while (srclen >= HEH_BLOCK_SIZE) { + memcpy(&coeff, src, HEH_BLOCK_SIZE); + be128_xor(&value, &value, &coeff); + gf128mul_4k_ble(&value, point); + src += HEH_BLOCK_SIZE; + srclen -= HEH_BLOCK_SIZE; + } + + if (srclen) { + needed = HEH_BLOCK_SIZE - srclen; + do { + *dst++ ^= *src++; + } while (--srclen); + } + } + sg_miter_stop(&miter); + return value; +} + +/* + * Split the message into 16 byte blocks, padding out the last block, and use + * the blocks as coefficients in the evaluation of a polynomial over GF(2^128) + * at the secret point 'tau_key'. For ease of implementing the higher-level + * heh_hash_inv() function, the constant and degree-1 coefficients are swapped + * if there is a partial block. + * + * Mathematically, compute: + * if (no partial block) + * k^{N-1} * m_0 + ... + k * m_{N-2} + m_{N-1} + * else if (partial block) + * k^N * m_0 + ... + k^2 * m_{N-2} + k * m_N + m_{N-1} + * + * where: + * t is tau_key + * N is the number of full blocks in the message + * m_i is the i-th full block in the message for i = 0 to N-1 inclusive + * m_N is the partial block of the message zero-padded up to 16 bytes + */ +static be128 poly_hash(struct crypto_ablkcipher *tfm, struct scatterlist *sgl, + unsigned int len) +{ + struct heh_tfm_ctx *ctx = crypto_ablkcipher_ctx(tfm); + unsigned int tail_offset = get_tail_offset(len); + unsigned int tail_len = len - tail_offset; + be128 hash; + be128 tail[2]; + + /* Handle all full blocks except the last */ + hash = evaluate_polynomial(ctx->tau_key, sgl, tail_offset); + + /* Handle the last full block and the partial block */ + scatterwalk_map_and_copy(tail, sgl, tail_offset, tail_len, 0); + + if (tail_len != HEH_BLOCK_SIZE) { + /* handle the partial block */ + memset((u8 *)tail + tail_len, 0, sizeof(tail) - tail_len); + be128_xor(&hash, &hash, &tail[1]); + gf128mul_4k_ble(&hash, ctx->tau_key); + } + be128_xor(&hash, &hash, &tail[0]); + return hash; +} + +/* + * Transform all full blocks except the last. + * This is used by both the hash and inverse hash phases. + */ +static int heh_tfm_blocks(struct ablkcipher_request *req, + struct scatterlist *src_sgl, + struct scatterlist *dst_sgl, unsigned int len, + const be128 *hash, const be128 *beta_key) +{ + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct blkcipher_desc desc = { .flags = req->base.flags }; + struct blkcipher_walk walk; + be128 e = *beta_key; + int err; + unsigned int nbytes; + + blkcipher_walk_init(&walk, dst_sgl, src_sgl, len); + + err = blkcipher_ablkcipher_walk_virt(&desc, &walk, tfm); + + while ((nbytes = walk.nbytes)) { + const be128 *src = (be128 *)walk.src.virt.addr; + be128 *dst = (be128 *)walk.dst.virt.addr; + + do { + gf128mul_x_ble(&e, &e); + be128_xor(dst, src, hash); + be128_xor(dst, dst, &e); + src++; + dst++; + } while ((nbytes -= HEH_BLOCK_SIZE) >= HEH_BLOCK_SIZE); + err = blkcipher_walk_done(&desc, &walk, nbytes); + } + return err; +} + +/* + * The hash phase of HEH. Given a message, compute: + * + * (m_0 + H, ..., m_{N-2} + H, H, m_N) + (xb, x^2b, ..., x^{N-1}b, b, 0) + * + * where: + * N is the number of full blocks in the message + * m_i is the i-th full block in the message for i = 0 to N-1 inclusive + * m_N is the unpadded partial block, possibly empty + * H is the poly_hash() of the message, keyed by tau_key + * b is beta_key + * x is the element x in our representation of GF(2^128) + * + * Note that the partial block remains unchanged, but it does affect the result + * of poly_hash() and therefore the transformation of all the full blocks. + */ +static int heh_hash(struct ablkcipher_request *req, const be128 *beta_key) +{ + be128 hash; + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + unsigned int tail_offset = get_tail_offset(req->nbytes); + unsigned int partial_len = req->nbytes % HEH_BLOCK_SIZE; + int err; + + /* poly_hash() the full message including the partial block */ + hash = poly_hash(tfm, req->src, req->nbytes); + + /* Transform all full blocks except the last */ + err = heh_tfm_blocks(req, req->src, req->dst, tail_offset, &hash, + beta_key); + if (err) + return err; + + /* Set the last full block to hash XOR beta_key */ + be128_xor(&hash, &hash, beta_key); + scatterwalk_map_and_copy(&hash, req->dst, tail_offset, HEH_BLOCK_SIZE, + 1); + + /* Copy the partial block if needed */ + if (partial_len != 0 && req->src != req->dst) { + unsigned int offs = tail_offset + HEH_BLOCK_SIZE; + + scatterwalk_map_and_copy(&hash, req->src, offs, partial_len, 0); + scatterwalk_map_and_copy(&hash, req->dst, offs, partial_len, 1); + } + return 0; +} + +/* + * The inverse hash phase of HEH. This undoes the result of heh_hash(). + */ +static int heh_hash_inv(struct ablkcipher_request *req, const be128 *beta_key) +{ + be128 hash; + be128 tmp; + struct scatterlist tmp_sgl[2]; + struct scatterlist *tail_sgl; + unsigned int len = req->nbytes; + unsigned int tail_offset = get_tail_offset(len); + struct scatterlist *sgl = req->dst; + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + int err; + + /* + * The last full block was computed as hash XOR beta_key, so XOR it with + * beta_key to recover hash. + */ + tail_sgl = scatterwalk_ffwd(tmp_sgl, sgl, tail_offset); + scatterwalk_map_and_copy(&hash, tail_sgl, 0, HEH_BLOCK_SIZE, 0); + be128_xor(&hash, &hash, beta_key); + + /* Transform all full blocks except the last */ + err = heh_tfm_blocks(req, sgl, sgl, tail_offset, &hash, beta_key); + if (err) + return err; + + /* + * Recover the last full block. We know 'hash', i.e. the poly_hash() of + * the the original message. The last full block was the constant term + * of the polynomial. To recover the last full block, temporarily zero + * it, compute the poly_hash(), and take the difference from 'hash'. + */ + memset(&tmp, 0, sizeof(tmp)); + scatterwalk_map_and_copy(&tmp, tail_sgl, 0, HEH_BLOCK_SIZE, 1); + tmp = poly_hash(tfm, sgl, len); + be128_xor(&tmp, &tmp, &hash); + scatterwalk_map_and_copy(&tmp, tail_sgl, 0, HEH_BLOCK_SIZE, 1); + return 0; +} + +static int heh_hash_inv_step(struct ablkcipher_request *req, u32 flags) +{ + struct heh_req_ctx *rctx = heh_req_ctx(req); + + return heh_hash_inv(req, &rctx->beta2_key); +} + +static int heh_ecb_step_3(struct ablkcipher_request *req, u32 flags) +{ + struct heh_req_ctx *rctx = heh_req_ctx(req); + u8 partial_block[HEH_BLOCK_SIZE] __aligned(__alignof__(u32)); + unsigned int tail_offset = get_tail_offset(req->nbytes); + unsigned int partial_offset = tail_offset + HEH_BLOCK_SIZE; + unsigned int partial_len = req->nbytes - partial_offset; + + /* + * Extract the pad in req->dst at tail_offset, and xor the partial block + * with it to create encrypted partial block + */ + scatterwalk_map_and_copy(rctx->u.ecb.keystream, req->dst, tail_offset, + HEH_BLOCK_SIZE, 0); + scatterwalk_map_and_copy(partial_block, req->dst, partial_offset, + partial_len, 0); + crypto_xor(partial_block, rctx->u.ecb.keystream, partial_len); + + /* + * Store the encrypted final block and partial block back in dst_sg + */ + scatterwalk_map_and_copy(&rctx->u.ecb.tmp, req->dst, tail_offset, + HEH_BLOCK_SIZE, 1); + scatterwalk_map_and_copy(partial_block, req->dst, partial_offset, + partial_len, 1); + + return heh_hash_inv_step(req, flags); +} + +static void heh_ecb_step_2_done(struct crypto_async_request *areq, int err) +{ + return async_done(areq, err, heh_ecb_step_3); +} + +static int heh_ecb_step_2(struct ablkcipher_request *req, u32 flags) +{ + struct heh_req_ctx *rctx = heh_req_ctx(req); + unsigned int partial_len = req->nbytes % HEH_BLOCK_SIZE; + struct scatterlist *tmp_sgl; + int err; + unsigned int tail_offset = get_tail_offset(req->nbytes); + + if (partial_len == 0) + return heh_hash_inv_step(req, flags); + + /* + * Extract the final full block, store it in tmp, and then xor that with + * the value saved in u.ecb.keystream + */ + scatterwalk_map_and_copy(rctx->u.ecb.tmp, req->dst, tail_offset, + HEH_BLOCK_SIZE, 0); + crypto_xor(rctx->u.ecb.keystream, rctx->u.ecb.tmp, HEH_BLOCK_SIZE); + + /* + * Encrypt the value in rctx->u.ecb.keystream to create the pad for the + * partial block. + * We cannot encrypt stack buffers, so re-use the dst_sg to do this + * encryption to avoid a malloc. The value at tail_offset is stored in + * tmp, and will be restored later. + */ + scatterwalk_map_and_copy(rctx->u.ecb.keystream, req->dst, tail_offset, + HEH_BLOCK_SIZE, 1); + tmp_sgl = scatterwalk_ffwd(rctx->u.ecb.tmp_sgl, req->dst, tail_offset); + ablkcipher_request_set_callback(&rctx->u.ecb.req, flags, + heh_ecb_step_2_done, req); + ablkcipher_request_set_crypt(&rctx->u.ecb.req, tmp_sgl, tmp_sgl, + HEH_BLOCK_SIZE, NULL); + err = crypto_ablkcipher_encrypt(&rctx->u.ecb.req); + if (err) + return err; + return heh_ecb_step_3(req, flags); +} + +static void heh_ecb_full_done(struct crypto_async_request *areq, int err) +{ + return async_done(areq, err, heh_ecb_step_2); +} + +/* + * The encrypt phase of HEH. This uses ECB encryption, with special handling + * for the partial block at the end if any. The source data is already in + * req->dst, so the encryption happens in-place. + * + * After the encrypt phase we continue on to the inverse hash phase. The + * functions calls are chained to support asynchronous ECB algorithms. + */ +static int heh_ecb(struct ablkcipher_request *req, bool decrypt) +{ + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct heh_tfm_ctx *ctx = crypto_ablkcipher_ctx(tfm); + struct heh_req_ctx *rctx = heh_req_ctx(req); + struct ablkcipher_request *ecb_req = &rctx->u.ecb.req; + unsigned int tail_offset = get_tail_offset(req->nbytes); + unsigned int full_len = tail_offset + HEH_BLOCK_SIZE; + int err; + + /* + * Save the last full block before it is encrypted/decrypted. This will + * be used later to encrypt/decrypt the partial block + */ + scatterwalk_map_and_copy(rctx->u.ecb.keystream, req->dst, tail_offset, + HEH_BLOCK_SIZE, 0); + + /* Encrypt/decrypt all full blocks */ + ablkcipher_request_set_tfm(ecb_req, ctx->ecb); + ablkcipher_request_set_callback(ecb_req, req->base.flags, + heh_ecb_full_done, req); + ablkcipher_request_set_crypt(ecb_req, req->dst, req->dst, full_len, + NULL); + if (decrypt) + err = crypto_ablkcipher_decrypt(ecb_req); + else + err = crypto_ablkcipher_encrypt(ecb_req); + if (err) + return err; + + return heh_ecb_step_2(req, req->base.flags); +} + +static int heh_crypt(struct ablkcipher_request *req, bool decrypt) +{ + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct heh_tfm_ctx *ctx = crypto_ablkcipher_ctx(tfm); + struct heh_req_ctx *rctx = heh_req_ctx(req); + int err; + + /* Inputs must be at least one full block */ + if (req->nbytes < HEH_BLOCK_SIZE) + return -EINVAL; + + /* Key must have been set */ + if (!ctx->tau_key) + return -ENOKEY; + err = generate_betas(req, &rctx->beta1_key, &rctx->beta2_key); + if (err) + return err; + + if (decrypt) + swap(rctx->beta1_key, rctx->beta2_key); + + err = heh_hash(req, &rctx->beta1_key); + if (err) + return err; + + return heh_ecb(req, decrypt); +} + +static int heh_encrypt(struct ablkcipher_request *req) +{ + return heh_crypt(req, false); +} + +static int heh_decrypt(struct ablkcipher_request *req) +{ + return heh_crypt(req, true); +} + +static int heh_setkey(struct crypto_ablkcipher *parent, const u8 *key, + unsigned int keylen) +{ + struct heh_tfm_ctx *ctx = crypto_ablkcipher_ctx(parent); + struct crypto_shash *cmac = ctx->cmac; + struct crypto_ablkcipher *ecb = ctx->ecb; + SHASH_DESC_ON_STACK(desc, cmac); + u8 *derived_keys; + u8 digest[HEH_BLOCK_SIZE]; + unsigned int i; + int err; + + /* set prf_key = key */ + crypto_shash_clear_flags(cmac, CRYPTO_TFM_REQ_MASK); + crypto_shash_set_flags(cmac, crypto_ablkcipher_get_flags(parent) & + CRYPTO_TFM_REQ_MASK); + err = crypto_shash_setkey(cmac, key, keylen); + crypto_ablkcipher_set_flags(parent, crypto_shash_get_flags(cmac) & + CRYPTO_TFM_RES_MASK); + if (err) + return err; + + /* + * Generate tau_key and ecb_key as follows: + * tau_key = cmac(prf_key, 0x00...01) + * ecb_key = cmac(prf_key, 0x00...02) || cmac(prf_key, 0x00...03) || ... + * truncated to keylen bytes + */ + derived_keys = kzalloc(round_up(HEH_BLOCK_SIZE + keylen, + HEH_BLOCK_SIZE), GFP_KERNEL); + if (!derived_keys) + return -ENOMEM; + desc->tfm = cmac; + desc->flags = (crypto_shash_get_flags(cmac) & CRYPTO_TFM_REQ_MASK); + for (i = 0; i < keylen + HEH_BLOCK_SIZE; i += HEH_BLOCK_SIZE) { + derived_keys[i + HEH_BLOCK_SIZE - 1] = + 0x01 + i / HEH_BLOCK_SIZE; + err = crypto_shash_digest(desc, derived_keys + i, + HEH_BLOCK_SIZE, digest); + if (err) + goto out; + memcpy(derived_keys + i, digest, HEH_BLOCK_SIZE); + } + + if (ctx->tau_key) + gf128mul_free_4k(ctx->tau_key); + err = -ENOMEM; + ctx->tau_key = gf128mul_init_4k_ble((const be128 *)derived_keys); + if (!ctx->tau_key) + goto out; + + crypto_ablkcipher_clear_flags(ecb, CRYPTO_TFM_REQ_MASK); + crypto_ablkcipher_set_flags(ecb, crypto_ablkcipher_get_flags(parent) & + CRYPTO_TFM_REQ_MASK); + err = crypto_ablkcipher_setkey(ecb, derived_keys + HEH_BLOCK_SIZE, + keylen); + crypto_ablkcipher_set_flags(parent, crypto_ablkcipher_get_flags(ecb) & + CRYPTO_TFM_RES_MASK); +out: + kzfree(derived_keys); + return err; +} + +static int heh_init_tfm(struct crypto_tfm *tfm) +{ + struct crypto_instance *inst = crypto_tfm_alg_instance(tfm); + struct heh_instance_ctx *ictx = crypto_instance_ctx(inst); + struct heh_tfm_ctx *ctx = crypto_tfm_ctx(tfm); + struct crypto_shash *cmac; + struct crypto_ablkcipher *ecb; + unsigned int reqsize; + int err; + + cmac = crypto_spawn_shash(&ictx->cmac); + if (IS_ERR(cmac)) + return PTR_ERR(cmac); + + ecb = crypto_spawn_skcipher(&ictx->ecb); + err = PTR_ERR(ecb); + if (IS_ERR(ecb)) + goto err_free_cmac; + + ctx->cmac = cmac; + ctx->ecb = ecb; + + reqsize = crypto_tfm_alg_alignmask(tfm) & + ~(crypto_tfm_ctx_alignment() - 1); + reqsize += max(offsetof(struct heh_req_ctx, u.cmac.desc) + + sizeof(struct shash_desc) + + crypto_shash_descsize(cmac), + offsetof(struct heh_req_ctx, u.ecb.req) + + sizeof(struct ablkcipher_request) + + crypto_ablkcipher_reqsize(ecb)); + tfm->crt_ablkcipher.reqsize = reqsize; + return 0; + +err_free_cmac: + crypto_free_shash(cmac); + return err; +} + +static void heh_exit_tfm(struct crypto_tfm *tfm) +{ + struct heh_tfm_ctx *ctx = crypto_tfm_ctx(tfm); + + gf128mul_free_4k(ctx->tau_key); + crypto_free_shash(ctx->cmac); + crypto_free_ablkcipher(ctx->ecb); +} + +static void heh_free_instance(struct crypto_instance *inst) +{ + struct heh_instance_ctx *ctx = crypto_instance_ctx(inst); + + crypto_drop_shash(&ctx->cmac); + crypto_drop_skcipher(&ctx->ecb); + kfree(inst); +} + +/* + * Create an instance of HEH as a ablkcipher. + * + * This relies on underlying CMAC and ECB algorithms, usually cmac(aes) and + * ecb(aes). For performance reasons we support asynchronous ECB algorithms. + * However, we do not yet support asynchronous CMAC algorithms because CMAC is + * only used on a small fixed amount of data per request, independent of the + * request length. This would change if AEAD or variable-length nonce support + * were to be exposed. + */ +static int heh_create_common(struct crypto_template *tmpl, struct rtattr **tb, + const char *full_name, const char *cmac_name, + const char *ecb_name) +{ + struct crypto_attr_type *algt; + struct crypto_instance *inst; + struct heh_instance_ctx *ctx; + struct shash_alg *cmac; + struct crypto_alg *ecb; + int err; + + algt = crypto_get_attr_type(tb); + if (IS_ERR(algt)) + return PTR_ERR(algt); + + /* User must be asking for something compatible with ablkcipher */ + if ((algt->type ^ CRYPTO_ALG_TYPE_ABLKCIPHER) & algt->mask) + return -EINVAL; + + /* Allocate the ablkcipher instance */ + inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL); + if (!inst) + return -ENOMEM; + + ctx = crypto_instance_ctx(inst); + + /* Set up the cmac and ecb spawns */ + + ctx->cmac.base.inst = inst; + err = crypto_grab_shash(&ctx->cmac, cmac_name, 0, CRYPTO_ALG_ASYNC); + if (err) + goto err_free_inst; + cmac = crypto_spawn_shash_alg(&ctx->cmac); + err = -EINVAL; + if (cmac->digestsize != HEH_BLOCK_SIZE) + goto err_drop_cmac; + + ctx->ecb.base.inst = inst; + err = crypto_grab_skcipher(&ctx->ecb, ecb_name, 0, + crypto_requires_sync(algt->type, + algt->mask)); + if (err) + goto err_drop_cmac; + ecb = crypto_skcipher_spawn_alg(&ctx->ecb); + + /* HEH only supports block ciphers with 16 byte block size */ + err = -EINVAL; + if (ecb->cra_blocksize != HEH_BLOCK_SIZE) + goto err_drop_ecb; + + /* The underlying "ECB" algorithm must not require an IV */ + err = -EINVAL; + if ((ecb->cra_flags & CRYPTO_ALG_TYPE_MASK) == CRYPTO_ALG_TYPE_BLKCIPHER) { + if (ecb->cra_blkcipher.ivsize != 0) + goto err_drop_ecb; + } else { + if (ecb->cra_ablkcipher.ivsize != 0) + goto err_drop_ecb; + } + + /* Set the instance names */ + err = -ENAMETOOLONG; + if (snprintf(inst->alg.cra_driver_name, CRYPTO_MAX_ALG_NAME, + "heh_base(%s,%s)", cmac->base.cra_driver_name, + ecb->cra_driver_name) >= CRYPTO_MAX_ALG_NAME) + goto err_drop_ecb; + + err = -ENAMETOOLONG; + if (snprintf(inst->alg.cra_name, CRYPTO_MAX_ALG_NAME, + "%s", full_name) >= CRYPTO_MAX_ALG_NAME) + goto err_drop_ecb; + + /* Finish initializing the instance */ + + inst->alg.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | + ((cmac->base.cra_flags | ecb->cra_flags) & + CRYPTO_ALG_ASYNC); + inst->alg.cra_blocksize = HEH_BLOCK_SIZE; + inst->alg.cra_ctxsize = sizeof(struct heh_tfm_ctx); + inst->alg.cra_alignmask = ecb->cra_alignmask | (__alignof__(be128) - 1); + inst->alg.cra_priority = ecb->cra_priority; + inst->alg.cra_type = &crypto_ablkcipher_type; + inst->alg.cra_init = heh_init_tfm; + inst->alg.cra_exit = heh_exit_tfm; + + inst->alg.cra_ablkcipher.setkey = heh_setkey; + inst->alg.cra_ablkcipher.encrypt = heh_encrypt; + inst->alg.cra_ablkcipher.decrypt = heh_decrypt; + if ((ecb->cra_flags & CRYPTO_ALG_TYPE_MASK) == CRYPTO_ALG_TYPE_BLKCIPHER) { + inst->alg.cra_ablkcipher.min_keysize = ecb->cra_blkcipher.min_keysize; + inst->alg.cra_ablkcipher.max_keysize = ecb->cra_blkcipher.max_keysize; + } else { + inst->alg.cra_ablkcipher.min_keysize = ecb->cra_ablkcipher.min_keysize; + inst->alg.cra_ablkcipher.max_keysize = ecb->cra_ablkcipher.max_keysize; + } + inst->alg.cra_ablkcipher.ivsize = HEH_BLOCK_SIZE; + + /* Register the instance */ + err = crypto_register_instance(tmpl, inst); + if (err) + goto err_drop_ecb; + return 0; + +err_drop_ecb: + crypto_drop_skcipher(&ctx->ecb); +err_drop_cmac: + crypto_drop_shash(&ctx->cmac); +err_free_inst: + kfree(inst); + return err; +} + +static int heh_create(struct crypto_template *tmpl, struct rtattr **tb) +{ + const char *cipher_name; + char full_name[CRYPTO_MAX_ALG_NAME]; + char cmac_name[CRYPTO_MAX_ALG_NAME]; + char ecb_name[CRYPTO_MAX_ALG_NAME]; + + /* Get the name of the requested block cipher (e.g. aes) */ + cipher_name = crypto_attr_alg_name(tb[1]); + if (IS_ERR(cipher_name)) + return PTR_ERR(cipher_name); + + if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "heh(%s)", cipher_name) >= + CRYPTO_MAX_ALG_NAME) + return -ENAMETOOLONG; + + if (snprintf(cmac_name, CRYPTO_MAX_ALG_NAME, "cmac(%s)", cipher_name) >= + CRYPTO_MAX_ALG_NAME) + return -ENAMETOOLONG; + + if (snprintf(ecb_name, CRYPTO_MAX_ALG_NAME, "ecb(%s)", cipher_name) >= + CRYPTO_MAX_ALG_NAME) + return -ENAMETOOLONG; + + return heh_create_common(tmpl, tb, full_name, cmac_name, ecb_name); +} + +static struct crypto_template heh_tmpl = { + .name = "heh", + .create = heh_create, + .free = heh_free_instance, + .module = THIS_MODULE, +}; + +static int heh_base_create(struct crypto_template *tmpl, struct rtattr **tb) +{ + char full_name[CRYPTO_MAX_ALG_NAME]; + const char *cmac_name; + const char *ecb_name; + + cmac_name = crypto_attr_alg_name(tb[1]); + if (IS_ERR(cmac_name)) + return PTR_ERR(cmac_name); + + ecb_name = crypto_attr_alg_name(tb[2]); + if (IS_ERR(ecb_name)) + return PTR_ERR(ecb_name); + + if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "heh_base(%s,%s)", + cmac_name, ecb_name) >= CRYPTO_MAX_ALG_NAME) + return -ENAMETOOLONG; + + return heh_create_common(tmpl, tb, full_name, cmac_name, ecb_name); +} + +/* + * If HEH is instantiated as "heh_base" instead of "heh", then specific + * implementations of cmac and ecb can be specified instead of just the cipher + */ +static struct crypto_template heh_base_tmpl = { + .name = "heh_base", + .create = heh_base_create, + .free = heh_free_instance, + .module = THIS_MODULE, +}; + +static int __init heh_module_init(void) +{ + int err; + + err = crypto_register_template(&heh_tmpl); + if (err) + return err; + + err = crypto_register_template(&heh_base_tmpl); + if (err) + goto out_undo_heh; + + return 0; + +out_undo_heh: + crypto_unregister_template(&heh_tmpl); + return err; +} + +static void __exit heh_module_exit(void) +{ + crypto_unregister_template(&heh_tmpl); + crypto_unregister_template(&heh_base_tmpl); +} + +module_init(heh_module_init); +module_exit(heh_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("Hash-Encrypt-Hash block cipher mode"); +MODULE_ALIAS_CRYPTO("heh"); +MODULE_ALIAS_CRYPTO("heh_base"); diff --git a/crypto/testmgr.c b/crypto/testmgr.c index d4944318ca1f..8374ca8b6579 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c @@ -3213,6 +3213,21 @@ static const struct alg_test_desc alg_test_descs[] = { .count = GHASH_TEST_VECTORS } } + }, { + .alg = "heh(aes)", + .test = alg_test_skcipher, + .suite = { + .cipher = { + .enc = { + .vecs = aes_heh_enc_tv_template, + .count = AES_HEH_ENC_TEST_VECTORS + }, + .dec = { + .vecs = aes_heh_dec_tv_template, + .count = AES_HEH_DEC_TEST_VECTORS + } + } + } }, { .alg = "hmac(crc32)", .test = alg_test_hash, diff --git a/crypto/testmgr.h b/crypto/testmgr.h index 0e02c60a57b6..ba6530d8ba58 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -14139,6 +14139,8 @@ static struct cipher_testvec cast6_xts_dec_tv_template[] = { #define AES_DEC_TEST_VECTORS 4 #define AES_CBC_ENC_TEST_VECTORS 5 #define AES_CBC_DEC_TEST_VECTORS 5 +#define AES_HEH_ENC_TEST_VECTORS 4 +#define AES_HEH_DEC_TEST_VECTORS 4 #define HMAC_MD5_ECB_CIPHER_NULL_ENC_TEST_VECTORS 2 #define HMAC_MD5_ECB_CIPHER_NULL_DEC_TEST_VECTORS 2 #define HMAC_SHA1_ECB_CIPHER_NULL_ENC_TEST_VEC 2 @@ -14511,6 +14513,198 @@ static struct cipher_testvec aes_dec_tv_template[] = { }, }; +static struct cipher_testvec aes_heh_enc_tv_template[] = { + { + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F", + .klen = 16, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F", + .ilen = 16, + .result = "\xd8\xbd\x40\xbf\xca\xe5\xee\x81" + "\x0f\x3d\x1f\x1f\xae\x89\x07\x55", + .rlen = 16, + .also_non_np = 1, + .np = 2, + .tap = { 8, 8 }, + }, { + .key = "\xa8\xda\x24\x9b\x5e\xfa\x13\xc2" + "\xc1\x94\xbf\x32\xba\x38\xa3\x77", + .klen = 16, + .iv = "\x4d\x47\x61\x37\x2b\x47\x86\xf0" + "\xd6\x47\xb5\xc2\xe8\xcf\x85\x27", + .input = "\xb8\xee\x29\xe4\xa5\xd1\xe7\x55" + "\xd0\xfd\xe7\x22\x63\x76\x36\xe2" + "\xf8\x0c\xf8\xfe\x65\x76\xe7\xca" + "\xc1\x42\xf5\xca\x5a\xa8\xac\x2a", + .ilen = 32, + .result = "\x59\xf2\x78\x4e\x10\x94\xf9\x5c" + "\x22\x23\x78\x2a\x30\x48\x11\x97" + "\xb1\xfe\x70\xc4\xef\xdf\x04\xef" + "\x16\x39\x04\xcf\xc0\x95\x9a\x98", + .rlen = 32, + .also_non_np = 1, + .np = 3, + .tap = { 16, 13, 3 }, + }, { + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F", + .klen = 16, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00", + .ilen = 63, + .result = "\xe0\x40\xeb\xe9\x52\xbe\x65\x60" + "\xe4\x68\x68\xa3\x73\x75\xb8\x52" + "\xef\x38\x6a\x87\x25\x25\xf6\x04" + "\xe5\x8e\xbe\x14\x8b\x02\x14\x1f" + "\xa9\x73\xb7\xad\x15\xbe\x9c\xa0" + "\xd2\x8a\x2c\xdc\xd4\xe3\x05\x55" + "\x0a\xf5\xf8\x51\xee\xe5\x62\xa5" + "\x71\xa7\x7c\x15\x5d\x7a\x9e", + .rlen = 63, + .also_non_np = 1, + .np = 8, + .tap = { 20, 20, 10, 8, 2, 1, 1, 1 }, + }, { + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F", + .klen = 16, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x01" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00", + .ilen = 63, + .result = "\x4b\x1a\x15\xa0\xaf\x08\x6d\x70" + "\xf0\xa7\x97\xb5\x31\x4b\x8c\xc3" + "\x4d\xf2\x7a\x9d\xdd\xd4\x15\x99" + "\x57\xad\xc6\xb1\x35\x69\xf5\x6a" + "\x2d\x70\xe4\x97\x49\xb2\x9f\x71" + "\xde\x22\xb5\x70\x8c\x69\x24\xd3" + "\xad\x80\x58\x48\x90\xe4\xed\xba" + "\x76\x3d\x71\x7c\x57\x25\x87", + .rlen = 63, + .also_non_np = 1, + .np = 8, + .tap = { 20, 20, 10, 8, 2, 1, 1, 1 }, + } +}; + +static struct cipher_testvec aes_heh_dec_tv_template[] = { + { + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F", + .klen = 16, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\xd8\xbd\x40\xbf\xca\xe5\xee\x81" + "\x0f\x3d\x1f\x1f\xae\x89\x07\x55", + .ilen = 16, + .result = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F", + .rlen = 16, + .also_non_np = 1, + .np = 2, + .tap = { 8, 8 }, + }, { + .key = "\xa8\xda\x24\x9b\x5e\xfa\x13\xc2" + "\xc1\x94\xbf\x32\xba\x38\xa3\x77", + .klen = 16, + .iv = "\x4d\x47\x61\x37\x2b\x47\x86\xf0" + "\xd6\x47\xb5\xc2\xe8\xcf\x85\x27", + .input = "\x59\xf2\x78\x4e\x10\x94\xf9\x5c" + "\x22\x23\x78\x2a\x30\x48\x11\x97" + "\xb1\xfe\x70\xc4\xef\xdf\x04\xef" + "\x16\x39\x04\xcf\xc0\x95\x9a\x98", + .ilen = 32, + .result = "\xb8\xee\x29\xe4\xa5\xd1\xe7\x55" + "\xd0\xfd\xe7\x22\x63\x76\x36\xe2" + "\xf8\x0c\xf8\xfe\x65\x76\xe7\xca" + "\xc1\x42\xf5\xca\x5a\xa8\xac\x2a", + .rlen = 32, + .also_non_np = 1, + .np = 3, + .tap = { 16, 13, 3 }, + }, { + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F", + .klen = 16, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\xe0\x40\xeb\xe9\x52\xbe\x65\x60" + "\xe4\x68\x68\xa3\x73\x75\xb8\x52" + "\xef\x38\x6a\x87\x25\x25\xf6\x04" + "\xe5\x8e\xbe\x14\x8b\x02\x14\x1f" + "\xa9\x73\xb7\xad\x15\xbe\x9c\xa0" + "\xd2\x8a\x2c\xdc\xd4\xe3\x05\x55" + "\x0a\xf5\xf8\x51\xee\xe5\x62\xa5" + "\x71\xa7\x7c\x15\x5d\x7a\x9e", + .ilen = 63, + .result = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00", + .rlen = 63, + .also_non_np = 1, + .np = 8, + .tap = { 20, 20, 10, 8, 2, 1, 1, 1 }, + }, { + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F", + .klen = 16, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x4b\x1a\x15\xa0\xaf\x08\x6d\x70" + "\xf0\xa7\x97\xb5\x31\x4b\x8c\xc3" + "\x4d\xf2\x7a\x9d\xdd\xd4\x15\x99" + "\x57\xad\xc6\xb1\x35\x69\xf5\x6a" + "\x2d\x70\xe4\x97\x49\xb2\x9f\x71" + "\xde\x22\xb5\x70\x8c\x69\x24\xd3" + "\xad\x80\x58\x48\x90\xe4\xed\xba" + "\x76\x3d\x71\x7c\x57\x25\x87", + .ilen = 63, + .result = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x01" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00", + .rlen = 63, + .also_non_np = 1, + .np = 8, + .tap = { 20, 20, 10, 8, 2, 1, 1, 1 }, + } +}; + static struct cipher_testvec aes_cbc_enc_tv_template[] = { { /* From RFC 3602 */ .key = "\x06\xa9\x21\x40\x36\xb8\xa1\x5b" From eba753c13d1e663986b159b858972ce07cae148d Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 11 Jan 2017 10:36:41 -0800 Subject: [PATCH 224/521] ANDROID: crypto: heh - factor out poly_hash algorithm Factor most of poly_hash() out into its own keyed hash algorithm so that optimized architecture-specific implementations of it will be possible. For now we call poly_hash through the shash API, since HEH already had an example of using shash for another algorithm (CMAC), and we will not be adding any poly_hash implementations that require ahash yet. We can however switch to ahash later if it becomes useful. Bug: 32508661 Signed-off-by: Eric Biggers Change-Id: I8de54ddcecd1d7fa6e9842a09506a08129bae0b6 --- crypto/heh.c | 330 ++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 232 insertions(+), 98 deletions(-) diff --git a/crypto/heh.c b/crypto/heh.c index 48a284cecaa2..10c00aaf797e 100644 --- a/crypto/heh.c +++ b/crypto/heh.c @@ -37,13 +37,14 @@ struct heh_instance_ctx { struct crypto_shash_spawn cmac; + struct crypto_shash_spawn poly_hash; struct crypto_skcipher_spawn ecb; }; struct heh_tfm_ctx { struct crypto_shash *cmac; + struct crypto_shash *poly_hash; /* keyed with tau_key */ struct crypto_ablkcipher *ecb; - struct gf128mul_4k *tau_key; }; struct heh_cmac_data { @@ -63,6 +64,10 @@ struct heh_req_ctx { /* aligned to alignmask */ struct shash_desc desc; /* + crypto_shash_descsize(cmac) */ } cmac; + struct { + struct shash_desc desc; + /* + crypto_shash_descsize(poly_hash) */ + } poly_hash; struct { u8 keystream[HEH_BLOCK_SIZE]; u8 tmp[HEH_BLOCK_SIZE]; @@ -157,73 +162,138 @@ static int generate_betas(struct ablkcipher_request *req, return 0; } +/*****************************************************************************/ + /* - * Evaluation of a polynomial over GF(2^128) using Horner's rule. The - * polynomial is evaluated at 'point'. The polynomial's coefficients are taken - * from 'coeffs_sgl' and are for terms with consecutive descending degree ending - * at degree 1. 'bytes_of_coeffs' is 16 times the number of terms. + * This is the generic version of poly_hash. It does the GF(2^128) + * multiplication by 'tau_key' using a precomputed table, without using any + * special CPU instructions. On some platforms, an accelerated version (with + * higher cra_priority) may be used instead. */ -static be128 evaluate_polynomial(struct gf128mul_4k *point, - struct scatterlist *coeffs_sgl, - unsigned int bytes_of_coeffs) + +struct poly_hash_tfm_ctx { + struct gf128mul_4k *tau_key; +}; + +struct poly_hash_desc_ctx { + be128 digest; + unsigned int count; +}; + +static int poly_hash_setkey(struct crypto_shash *tfm, + const u8 *key, unsigned int keylen) { - be128 value = {0}; - struct sg_mapping_iter miter; - unsigned int remaining = bytes_of_coeffs; - unsigned int needed = 0; + struct poly_hash_tfm_ctx *tctx = crypto_shash_ctx(tfm); + be128 key128; - sg_miter_start(&miter, coeffs_sgl, sg_nents(coeffs_sgl), - SG_MITER_FROM_SG | SG_MITER_ATOMIC); - while (remaining) { - be128 coeff; - const u8 *src; - unsigned int srclen; - u8 *dst = (u8 *)&value; + if (keylen != HEH_BLOCK_SIZE) { + crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN); + return -EINVAL; + } - /* - * Note: scatterlist elements are not necessarily evenly - * divisible into blocks, nor are they necessarily aligned to - * __alignof__(be128). - */ - sg_miter_next(&miter); + if (tctx->tau_key) + gf128mul_free_4k(tctx->tau_key); + memcpy(&key128, key, HEH_BLOCK_SIZE); + tctx->tau_key = gf128mul_init_4k_ble(&key128); + if (!tctx->tau_key) + return -ENOMEM; + return 0; +} - src = miter.addr; - srclen = min_t(unsigned int, miter.length, remaining); - remaining -= srclen; +static int poly_hash_init(struct shash_desc *desc) +{ + struct poly_hash_desc_ctx *ctx = shash_desc_ctx(desc); - if (needed) { - unsigned int n = min(srclen, needed); - u8 *pos = dst + (HEH_BLOCK_SIZE - needed); + ctx->digest = (be128) { 0 }; + ctx->count = 0; + return 0; +} - needed -= n; - srclen -= n; +static int poly_hash_update(struct shash_desc *desc, const u8 *src, + unsigned int len) +{ + struct poly_hash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); + struct poly_hash_desc_ctx *ctx = shash_desc_ctx(desc); + unsigned int partial = ctx->count % HEH_BLOCK_SIZE; + u8 *dst = (u8 *)&ctx->digest + partial; - while (n--) - *pos++ ^= *src++; + ctx->count += len; - if (!needed) - gf128mul_4k_ble(&value, point); - } + /* Finishing at least one block? */ + if (partial + len >= HEH_BLOCK_SIZE) { - while (srclen >= HEH_BLOCK_SIZE) { - memcpy(&coeff, src, HEH_BLOCK_SIZE); - be128_xor(&value, &value, &coeff); - gf128mul_4k_ble(&value, point); - src += HEH_BLOCK_SIZE; - srclen -= HEH_BLOCK_SIZE; - } + if (partial) { + /* Finish the pending block. */ + unsigned int n = HEH_BLOCK_SIZE - partial; - if (srclen) { - needed = HEH_BLOCK_SIZE - srclen; + len -= n; do { *dst++ ^= *src++; - } while (--srclen); + } while (--n); + + gf128mul_4k_ble(&ctx->digest, tctx->tau_key); } + + /* Process zero or more full blocks. */ + while (len >= HEH_BLOCK_SIZE) { + be128 coeff; + + memcpy(&coeff, src, HEH_BLOCK_SIZE); + be128_xor(&ctx->digest, &ctx->digest, &coeff); + src += HEH_BLOCK_SIZE; + len -= HEH_BLOCK_SIZE; + gf128mul_4k_ble(&ctx->digest, tctx->tau_key); + } + dst = (u8 *)&ctx->digest; } - sg_miter_stop(&miter); - return value; + + /* Continue adding the next block to 'digest'. */ + while (len--) + *dst++ ^= *src++; + return 0; } +static int poly_hash_final(struct shash_desc *desc, u8 *out) +{ + struct poly_hash_desc_ctx *ctx = shash_desc_ctx(desc); + + /* Finish the last block if needed. */ + if (ctx->count % HEH_BLOCK_SIZE) { + struct poly_hash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); + + gf128mul_4k_ble(&ctx->digest, tctx->tau_key); + } + + memcpy(out, &ctx->digest, HEH_BLOCK_SIZE); + return 0; +} + +static void poly_hash_exit(struct crypto_tfm *tfm) +{ + struct poly_hash_tfm_ctx *tctx = crypto_tfm_ctx(tfm); + + gf128mul_free_4k(tctx->tau_key); +} + +static struct shash_alg poly_hash_alg = { + .digestsize = HEH_BLOCK_SIZE, + .init = poly_hash_init, + .update = poly_hash_update, + .final = poly_hash_final, + .setkey = poly_hash_setkey, + .descsize = sizeof(struct poly_hash_desc_ctx), + .base = { + .cra_name = "poly_hash", + .cra_driver_name = "poly_hash-generic", + .cra_priority = 100, + .cra_ctxsize = sizeof(struct poly_hash_tfm_ctx), + .cra_exit = poly_hash_exit, + .cra_module = THIS_MODULE, + }, +}; + +/*****************************************************************************/ + /* * Split the message into 16 byte blocks, padding out the last block, and use * the blocks as coefficients in the evaluation of a polynomial over GF(2^128) @@ -242,18 +312,42 @@ static be128 evaluate_polynomial(struct gf128mul_4k *point, * N is the number of full blocks in the message * m_i is the i-th full block in the message for i = 0 to N-1 inclusive * m_N is the partial block of the message zero-padded up to 16 bytes + * + * Note that most of this is now separated out into its own keyed hash + * algorithm, to allow optimized implementations. However, we still handle the + * swapping of the last two coefficients here in the HEH template because this + * simplifies the poly_hash algorithms: they don't have to buffer an extra + * block, don't have to duplicate as much code, and are more similar to GHASH. */ -static be128 poly_hash(struct crypto_ablkcipher *tfm, struct scatterlist *sgl, - unsigned int len) +static int poly_hash(struct ablkcipher_request *req, struct scatterlist *sgl, + be128 *hash) { + struct heh_req_ctx *rctx = heh_req_ctx(req); + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); struct heh_tfm_ctx *ctx = crypto_ablkcipher_ctx(tfm); - unsigned int tail_offset = get_tail_offset(len); - unsigned int tail_len = len - tail_offset; - be128 hash; + struct shash_desc *desc = &rctx->u.poly_hash.desc; + unsigned int tail_offset = get_tail_offset(req->nbytes); + unsigned int tail_len = req->nbytes - tail_offset; be128 tail[2]; + unsigned int i, n; + struct sg_mapping_iter miter; + int err; + + desc->tfm = ctx->poly_hash; + desc->flags = req->base.flags; /* Handle all full blocks except the last */ - hash = evaluate_polynomial(ctx->tau_key, sgl, tail_offset); + err = crypto_shash_init(desc); + sg_miter_start(&miter, sgl, sg_nents(sgl), + SG_MITER_FROM_SG | SG_MITER_ATOMIC); + for (i = 0; i < tail_offset && !err; i += n) { + sg_miter_next(&miter); + n = min_t(unsigned int, miter.length, tail_offset - i); + err = crypto_shash_update(desc, miter.addr, n); + } + sg_miter_stop(&miter); + if (err) + return err; /* Handle the last full block and the partial block */ scatterwalk_map_and_copy(tail, sgl, tail_offset, tail_len, 0); @@ -261,11 +355,15 @@ static be128 poly_hash(struct crypto_ablkcipher *tfm, struct scatterlist *sgl, if (tail_len != HEH_BLOCK_SIZE) { /* handle the partial block */ memset((u8 *)tail + tail_len, 0, sizeof(tail) - tail_len); - be128_xor(&hash, &hash, &tail[1]); - gf128mul_4k_ble(&hash, ctx->tau_key); + err = crypto_shash_update(desc, (u8 *)&tail[1], HEH_BLOCK_SIZE); + if (err) + return err; } - be128_xor(&hash, &hash, &tail[0]); - return hash; + err = crypto_shash_final(desc, (u8 *)hash); + if (err) + return err; + be128_xor(hash, hash, &tail[0]); + return 0; } /* @@ -323,13 +421,14 @@ static int heh_tfm_blocks(struct ablkcipher_request *req, static int heh_hash(struct ablkcipher_request *req, const be128 *beta_key) { be128 hash; - struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); unsigned int tail_offset = get_tail_offset(req->nbytes); unsigned int partial_len = req->nbytes % HEH_BLOCK_SIZE; int err; /* poly_hash() the full message including the partial block */ - hash = poly_hash(tfm, req->src, req->nbytes); + err = poly_hash(req, req->src, &hash); + if (err) + return err; /* Transform all full blocks except the last */ err = heh_tfm_blocks(req, req->src, req->dst, tail_offset, &hash, @@ -361,10 +460,8 @@ static int heh_hash_inv(struct ablkcipher_request *req, const be128 *beta_key) be128 tmp; struct scatterlist tmp_sgl[2]; struct scatterlist *tail_sgl; - unsigned int len = req->nbytes; - unsigned int tail_offset = get_tail_offset(len); + unsigned int tail_offset = get_tail_offset(req->nbytes); struct scatterlist *sgl = req->dst; - struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); int err; /* @@ -388,7 +485,9 @@ static int heh_hash_inv(struct ablkcipher_request *req, const be128 *beta_key) */ memset(&tmp, 0, sizeof(tmp)); scatterwalk_map_and_copy(&tmp, tail_sgl, 0, HEH_BLOCK_SIZE, 1); - tmp = poly_hash(tfm, sgl, len); + err = poly_hash(req, sgl, &tmp); + if (err) + return err; be128_xor(&tmp, &tmp, &hash); scatterwalk_map_and_copy(&tmp, tail_sgl, 0, HEH_BLOCK_SIZE, 1); return 0; @@ -522,8 +621,6 @@ static int heh_ecb(struct ablkcipher_request *req, bool decrypt) static int heh_crypt(struct ablkcipher_request *req, bool decrypt) { - struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); - struct heh_tfm_ctx *ctx = crypto_ablkcipher_ctx(tfm); struct heh_req_ctx *rctx = heh_req_ctx(req); int err; @@ -531,9 +628,6 @@ static int heh_crypt(struct ablkcipher_request *req, bool decrypt) if (req->nbytes < HEH_BLOCK_SIZE) return -EINVAL; - /* Key must have been set */ - if (!ctx->tau_key) - return -ENOKEY; err = generate_betas(req, &rctx->beta1_key, &rctx->beta2_key); if (err) return err; @@ -602,11 +696,8 @@ static int heh_setkey(struct crypto_ablkcipher *parent, const u8 *key, memcpy(derived_keys + i, digest, HEH_BLOCK_SIZE); } - if (ctx->tau_key) - gf128mul_free_4k(ctx->tau_key); - err = -ENOMEM; - ctx->tau_key = gf128mul_init_4k_ble((const be128 *)derived_keys); - if (!ctx->tau_key) + err = crypto_shash_setkey(ctx->poly_hash, derived_keys, HEH_BLOCK_SIZE); + if (err) goto out; crypto_ablkcipher_clear_flags(ecb, CRYPTO_TFM_REQ_MASK); @@ -627,6 +718,7 @@ static int heh_init_tfm(struct crypto_tfm *tfm) struct heh_instance_ctx *ictx = crypto_instance_ctx(inst); struct heh_tfm_ctx *ctx = crypto_tfm_ctx(tfm); struct crypto_shash *cmac; + struct crypto_shash *poly_hash; struct crypto_ablkcipher *ecb; unsigned int reqsize; int err; @@ -635,25 +727,37 @@ static int heh_init_tfm(struct crypto_tfm *tfm) if (IS_ERR(cmac)) return PTR_ERR(cmac); + poly_hash = crypto_spawn_shash(&ictx->poly_hash); + err = PTR_ERR(poly_hash); + if (IS_ERR(poly_hash)) + goto err_free_cmac; + ecb = crypto_spawn_skcipher(&ictx->ecb); err = PTR_ERR(ecb); if (IS_ERR(ecb)) - goto err_free_cmac; + goto err_free_poly_hash; ctx->cmac = cmac; + ctx->poly_hash = poly_hash; ctx->ecb = ecb; reqsize = crypto_tfm_alg_alignmask(tfm) & ~(crypto_tfm_ctx_alignment() - 1); - reqsize += max(offsetof(struct heh_req_ctx, u.cmac.desc) + - sizeof(struct shash_desc) + - crypto_shash_descsize(cmac), - offsetof(struct heh_req_ctx, u.ecb.req) + - sizeof(struct ablkcipher_request) + - crypto_ablkcipher_reqsize(ecb)); + reqsize += max3(offsetof(struct heh_req_ctx, u.cmac.desc) + + sizeof(struct shash_desc) + + crypto_shash_descsize(cmac), + offsetof(struct heh_req_ctx, u.poly_hash.desc) + + sizeof(struct shash_desc) + + crypto_shash_descsize(poly_hash), + offsetof(struct heh_req_ctx, u.ecb.req) + + sizeof(struct ablkcipher_request) + + crypto_ablkcipher_reqsize(ecb)); tfm->crt_ablkcipher.reqsize = reqsize; + return 0; +err_free_poly_hash: + crypto_free_shash(poly_hash); err_free_cmac: crypto_free_shash(cmac); return err; @@ -663,8 +767,8 @@ static void heh_exit_tfm(struct crypto_tfm *tfm) { struct heh_tfm_ctx *ctx = crypto_tfm_ctx(tfm); - gf128mul_free_4k(ctx->tau_key); crypto_free_shash(ctx->cmac); + crypto_free_shash(ctx->poly_hash); crypto_free_ablkcipher(ctx->ecb); } @@ -673,6 +777,7 @@ static void heh_free_instance(struct crypto_instance *inst) struct heh_instance_ctx *ctx = crypto_instance_ctx(inst); crypto_drop_shash(&ctx->cmac); + crypto_drop_shash(&ctx->poly_hash); crypto_drop_skcipher(&ctx->ecb); kfree(inst); } @@ -689,12 +794,13 @@ static void heh_free_instance(struct crypto_instance *inst) */ static int heh_create_common(struct crypto_template *tmpl, struct rtattr **tb, const char *full_name, const char *cmac_name, - const char *ecb_name) + const char *poly_hash_name, const char *ecb_name) { struct crypto_attr_type *algt; struct crypto_instance *inst; struct heh_instance_ctx *ctx; struct shash_alg *cmac; + struct shash_alg *poly_hash; struct crypto_alg *ecb; int err; @@ -713,10 +819,9 @@ static int heh_create_common(struct crypto_template *tmpl, struct rtattr **tb, ctx = crypto_instance_ctx(inst); - /* Set up the cmac and ecb spawns */ - + /* Set up the cmac spawn */ ctx->cmac.base.inst = inst; - err = crypto_grab_shash(&ctx->cmac, cmac_name, 0, CRYPTO_ALG_ASYNC); + err = crypto_grab_shash(&ctx->cmac, cmac_name, 0, 0); if (err) goto err_free_inst; cmac = crypto_spawn_shash_alg(&ctx->cmac); @@ -724,12 +829,23 @@ static int heh_create_common(struct crypto_template *tmpl, struct rtattr **tb, if (cmac->digestsize != HEH_BLOCK_SIZE) goto err_drop_cmac; + /* Set up the poly_hash spawn */ + ctx->poly_hash.base.inst = inst; + err = crypto_grab_shash(&ctx->poly_hash, poly_hash_name, 0, 0); + if (err) + goto err_drop_cmac; + poly_hash = crypto_spawn_shash_alg(&ctx->poly_hash); + err = -EINVAL; + if (poly_hash->digestsize != HEH_BLOCK_SIZE) + goto err_drop_poly_hash; + + /* Set up the ecb spawn */ ctx->ecb.base.inst = inst; err = crypto_grab_skcipher(&ctx->ecb, ecb_name, 0, crypto_requires_sync(algt->type, algt->mask)); if (err) - goto err_drop_cmac; + goto err_drop_poly_hash; ecb = crypto_skcipher_spawn_alg(&ctx->ecb); /* HEH only supports block ciphers with 16 byte block size */ @@ -750,7 +866,8 @@ static int heh_create_common(struct crypto_template *tmpl, struct rtattr **tb, /* Set the instance names */ err = -ENAMETOOLONG; if (snprintf(inst->alg.cra_driver_name, CRYPTO_MAX_ALG_NAME, - "heh_base(%s,%s)", cmac->base.cra_driver_name, + "heh_base(%s,%s,%s)", cmac->base.cra_driver_name, + poly_hash->base.cra_driver_name, ecb->cra_driver_name) >= CRYPTO_MAX_ALG_NAME) goto err_drop_ecb; @@ -762,8 +879,7 @@ static int heh_create_common(struct crypto_template *tmpl, struct rtattr **tb, /* Finish initializing the instance */ inst->alg.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | - ((cmac->base.cra_flags | ecb->cra_flags) & - CRYPTO_ALG_ASYNC); + (ecb->cra_flags & CRYPTO_ALG_ASYNC); inst->alg.cra_blocksize = HEH_BLOCK_SIZE; inst->alg.cra_ctxsize = sizeof(struct heh_tfm_ctx); inst->alg.cra_alignmask = ecb->cra_alignmask | (__alignof__(be128) - 1); @@ -792,6 +908,8 @@ static int heh_create_common(struct crypto_template *tmpl, struct rtattr **tb, err_drop_ecb: crypto_drop_skcipher(&ctx->ecb); +err_drop_poly_hash: + crypto_drop_shash(&ctx->poly_hash); err_drop_cmac: crypto_drop_shash(&ctx->cmac); err_free_inst: @@ -823,7 +941,8 @@ static int heh_create(struct crypto_template *tmpl, struct rtattr **tb) CRYPTO_MAX_ALG_NAME) return -ENAMETOOLONG; - return heh_create_common(tmpl, tb, full_name, cmac_name, ecb_name); + return heh_create_common(tmpl, tb, full_name, cmac_name, "poly_hash", + ecb_name); } static struct crypto_template heh_tmpl = { @@ -837,26 +956,34 @@ static int heh_base_create(struct crypto_template *tmpl, struct rtattr **tb) { char full_name[CRYPTO_MAX_ALG_NAME]; const char *cmac_name; + const char *poly_hash_name; const char *ecb_name; cmac_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(cmac_name)) return PTR_ERR(cmac_name); - ecb_name = crypto_attr_alg_name(tb[2]); + poly_hash_name = crypto_attr_alg_name(tb[2]); + if (IS_ERR(poly_hash_name)) + return PTR_ERR(poly_hash_name); + + ecb_name = crypto_attr_alg_name(tb[3]); if (IS_ERR(ecb_name)) return PTR_ERR(ecb_name); - if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "heh_base(%s,%s)", - cmac_name, ecb_name) >= CRYPTO_MAX_ALG_NAME) + if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "heh_base(%s,%s,%s)", + cmac_name, poly_hash_name, ecb_name) >= + CRYPTO_MAX_ALG_NAME) return -ENAMETOOLONG; - return heh_create_common(tmpl, tb, full_name, cmac_name, ecb_name); + return heh_create_common(tmpl, tb, full_name, cmac_name, poly_hash_name, + ecb_name); } /* * If HEH is instantiated as "heh_base" instead of "heh", then specific - * implementations of cmac and ecb can be specified instead of just the cipher + * implementations of cmac, poly_hash, and ecb can be specified instead of just + * the cipher. */ static struct crypto_template heh_base_tmpl = { .name = "heh_base", @@ -877,8 +1004,14 @@ static int __init heh_module_init(void) if (err) goto out_undo_heh; + err = crypto_register_shash(&poly_hash_alg); + if (err) + goto out_undo_heh_base; + return 0; +out_undo_heh_base: + crypto_unregister_template(&heh_base_tmpl); out_undo_heh: crypto_unregister_template(&heh_tmpl); return err; @@ -888,6 +1021,7 @@ static void __exit heh_module_exit(void) { crypto_unregister_template(&heh_tmpl); crypto_unregister_template(&heh_base_tmpl); + crypto_unregister_shash(&poly_hash_alg); } module_init(heh_module_init); From 830a526837f6b4a949559c45c311d0ade0132589 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 10 Jan 2017 18:32:19 -0800 Subject: [PATCH 225/521] ANDROID: arm64/crypto: add ARMv8-CE optimized poly_hash algorithm poly_hash is part of the HEH (Hash-Encrypt-Hash) encryption mode, proposed in Internet Draft https://tools.ietf.org/html/draft-cope-heh-01. poly_hash is very similar to GHASH; besides the swapping of the last two coefficients which we opted to handle in the HEH template, poly_hash just uses a different finite field representation. As with GHASH, poly_hash becomes much faster and more secure against timing attacks when implemented using carryless multiplication instructions instead of tables. This patch adds an ARMv8-CE optimized version of poly_hash, based roughly on the existing ARMv8-CE optimized version of GHASH. Benchmark results are shown below, but note that the resistance to timing attacks may be even more important than the performance gain. poly_hash only: poly_hash-generic: 1,000,000 setkey() takes 1185 ms hashing is 328 MB/s poly_hash-ce: 1,000,000 setkey() takes 8 ms hashing is 1756 MB/s heh(aes) with 4096-byte inputs (this is the ideal case, as the improvement is less significant with smaller inputs): encryption with "heh_base(cmac(aes-ce),poly_hash-generic,ecb-aes-ce)": 118 MB/s decryption with "heh_base(cmac(aes-ce),poly_hash-generic,ecb-aes-ce)": 120 MB/s encryption with "heh_base(cmac(aes-ce),poly_hash-ce,ecb-aes-ce)": 291 MB/s decryption with "heh_base(cmac(aes-ce),poly_hash-ce,ecb-aes-ce)": 293 MB/s Bug: 32508661 Signed-off-by: Eric Biggers Change-Id: I621ec0e1115df7e6f5cbd7e864a4a9d8d2e94cf2 --- arch/arm64/crypto/Kconfig | 5 + arch/arm64/crypto/Makefile | 3 + arch/arm64/crypto/poly-hash-ce-core.S | 163 +++++++++++++++++++++++++ arch/arm64/crypto/poly-hash-ce-glue.c | 166 ++++++++++++++++++++++++++ crypto/Kconfig | 1 + 5 files changed, 338 insertions(+) create mode 100644 arch/arm64/crypto/poly-hash-ce-core.S create mode 100644 arch/arm64/crypto/poly-hash-ce-glue.c diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index 2cf32e9887e1..de1aab4b5da8 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig @@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE depends on ARM64 && KERNEL_MODE_NEON select CRYPTO_HASH +config CRYPTO_POLY_HASH_ARM64_CE + tristate "poly_hash (for HEH encryption mode) using ARMv8 Crypto Extensions" + depends on ARM64 && KERNEL_MODE_NEON + select CRYPTO_HASH + config CRYPTO_AES_ARM64_CE tristate "AES core cipher using ARMv8 Crypto Extensions" depends on ARM64 && KERNEL_MODE_NEON diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index abb79b3cfcfe..f0a8f2475ea3 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -17,6 +17,9 @@ sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o +obj-$(CONFIG_CRYPTO_POLY_HASH_ARM64_CE) += poly-hash-ce.o +poly-hash-ce-y := poly-hash-ce-glue.o poly-hash-ce-core.o + obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto diff --git a/arch/arm64/crypto/poly-hash-ce-core.S b/arch/arm64/crypto/poly-hash-ce-core.S new file mode 100644 index 000000000000..8ccb544c5526 --- /dev/null +++ b/arch/arm64/crypto/poly-hash-ce-core.S @@ -0,0 +1,163 @@ +/* + * Accelerated poly_hash implementation with ARMv8 PMULL instructions. + * + * Based on ghash-ce-core.S. + * + * Copyright (C) 2014 Linaro Ltd. + * Copyright (C) 2017 Google, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + */ + +#include +#include + + KEY .req v0 + KEY2 .req v1 + T1 .req v2 + T2 .req v3 + GSTAR .req v4 + XL .req v5 + XM .req v6 + XH .req v7 + + .text + .arch armv8-a+crypto + + /* 16-byte aligned (2**4 = 16); not required, but might as well */ + .align 4 +.Lgstar: + .quad 0x87, 0x87 + +/* + * void pmull_poly_hash_update(le128 *digest, const le128 *key, + * const u8 *src, unsigned int blocks, + * unsigned int partial); + */ +ENTRY(pmull_poly_hash_update) + + /* Load digest into XL */ + ld1 {XL.16b}, [x0] + + /* Load key into KEY */ + ld1 {KEY.16b}, [x1] + + /* Load g*(x) = g(x) + x^128 = x^7 + x^2 + x + 1 into both halves of + * GSTAR */ + adr x1, .Lgstar + ld1 {GSTAR.2d}, [x1] + + /* Set KEY2 to (KEY[1]+KEY[0]):(KEY[1]+KEY[0]). This is needed for + * Karatsuba multiplication. */ + ext KEY2.16b, KEY.16b, KEY.16b, #8 + eor KEY2.16b, KEY2.16b, KEY.16b + + /* If 'partial' is nonzero, then we're finishing a pending block and + * should go right to the multiplication. */ + cbnz w4, 1f + +0: + /* Add the next block from 'src' to the digest */ + ld1 {T1.16b}, [x2], #16 + eor XL.16b, XL.16b, T1.16b + sub w3, w3, #1 + +1: + /* + * Multiply the current 128-bit digest (a1:a0, in XL) by the 128-bit key + * (b1:b0, in KEY) using Karatsuba multiplication. + */ + + /* T1 = (a1+a0):(a1+a0) */ + ext T1.16b, XL.16b, XL.16b, #8 + eor T1.16b, T1.16b, XL.16b + + /* XH = a1 * b1 */ + pmull2 XH.1q, XL.2d, KEY.2d + + /* XL = a0 * b0 */ + pmull XL.1q, XL.1d, KEY.1d + + /* XM = (a1+a0) * (b1+b0) */ + pmull XM.1q, T1.1d, KEY2.1d + + /* XM += (XH[0]:XL[1]) + XL + XH */ + ext T1.16b, XL.16b, XH.16b, #8 + eor T2.16b, XL.16b, XH.16b + eor XM.16b, XM.16b, T1.16b + eor XM.16b, XM.16b, T2.16b + + /* + * Now the 256-bit product is in XH[1]:XM:XL[0]. It represents a + * polynomial over GF(2) with degree as large as 255. We need to + * compute its remainder modulo g(x) = x^128+x^7+x^2+x+1. For this it + * is sufficient to compute the remainder of the high half 'c(x)x^128' + * add it to the low half. To reduce the high half we use the Barrett + * reduction method. The basic idea is that we can express the + * remainder p(x) as g(x)q(x) mod x^128, where q(x) = (c(x)x^128)/g(x). + * As detailed in [1], to avoid having to divide by g(x) at runtime the + * following equivalent expression can be derived: + * + * p(x) = [ g*(x)((c(x)q+(x))/x^128) ] mod x^128 + * + * where g*(x) = x^128+g(x) = x^7+x^2+x+1, and q+(x) = x^256/g(x) = g(x) + * in this case. This is also equivalent to: + * + * p(x) = [ g*(x)((c(x)(x^128 + g*(x)))/x^128) ] mod x^128 + * = [ g*(x)(c(x) + (c(x)g*(x))/x^128) ] mod x^128 + * + * Since deg g*(x) < 64: + * + * p(x) = [ g*(x)(c(x) + ((c(x)/x^64)g*(x))/x^64) ] mod x^128 + * = [ g*(x)((c(x)/x^64)x^64 + (c(x) mod x^64) + + * ((c(x)/x^64)g*(x))/x^64) ] mod x^128 + * + * Letting t(x) = g*(x)(c(x)/x^64): + * + * p(x) = [ t(x)x^64 + g*(x)((c(x) mod x^64) + t(x)/x^64) ] mod x^128 + * + * Therefore, to do the reduction we only need to issue two 64-bit => + * 128-bit carryless multiplications: g*(x) times c(x)/x^64, and g*(x) + * times ((c(x) mod x^64) + t(x)/x^64). (Multiplication by x^64 doesn't + * count since it is simply a shift or move.) + * + * An alternate reduction method, also based on Barrett reduction and + * described in [1], uses only shifts and XORs --- no multiplications. + * However, the method with multiplications requires fewer instructions + * and is faster on processors with fast carryless multiplication. + * + * [1] "Intel Carry-Less Multiplication Instruction and its Usage for + * Computing the GCM Mode", + * https://software.intel.com/sites/default/files/managed/72/cc/clmul-wp-rev-2.02-2014-04-20.pdf + */ + + /* 256-bit product is XH[1]:XM:XL[0], so c(x) is XH[1]:XM[1] */ + + /* T1 = t(x) = g*(x)(c(x)/x^64) */ + pmull2 T1.1q, GSTAR.2d, XH.2d + + /* T2 = g*(x)((c(x) mod x^64) + t(x)/x^64) */ + eor T2.16b, XM.16b, T1.16b + pmull2 T2.1q, GSTAR.2d, T2.2d + + /* Make XL[0] be the low half of the 128-bit result by adding the low 64 + * bits of the T2 term to what was already there. The 't(x)x^64' term + * makes no difference, so skip it. */ + eor XL.16b, XL.16b, T2.16b + + /* Make XL[1] be the high half of the 128-bit result by adding the high + * 64 bits of the 't(x)x^64' and T2 terms to what was already in XM[0], + * then moving XM[0] to XL[1]. */ + eor XM.16b, XM.16b, T1.16b + ext T2.16b, T2.16b, T2.16b, #8 + eor XM.16b, XM.16b, T2.16b + mov XL.d[1], XM.d[0] + + /* If more blocks remain, then loop back to process the next block; + * else, store the digest and return. */ + cbnz w3, 0b + st1 {XL.16b}, [x0] + ret +ENDPROC(pmull_poly_hash_update) diff --git a/arch/arm64/crypto/poly-hash-ce-glue.c b/arch/arm64/crypto/poly-hash-ce-glue.c new file mode 100644 index 000000000000..e195740c9ecf --- /dev/null +++ b/arch/arm64/crypto/poly-hash-ce-glue.c @@ -0,0 +1,166 @@ +/* + * Accelerated poly_hash implementation with ARMv8 PMULL instructions. + * + * Based on ghash-ce-glue.c. + * + * poly_hash is part of the HEH (Hash-Encrypt-Hash) encryption mode, proposed in + * Internet Draft https://tools.ietf.org/html/draft-cope-heh-01. + * + * poly_hash is very similar to GHASH: both algorithms are keyed hashes which + * interpret their input data as coefficients of a polynomial over GF(2^128), + * then calculate a hash value by evaluating that polynomial at the point given + * by the key, e.g. using Horner's rule. The difference is that poly_hash uses + * the more natural "ble" convention to represent GF(2^128) elements, whereas + * GHASH uses the less natural "lle" convention (see include/crypto/gf128mul.h). + * The ble convention makes it simpler to implement GF(2^128) multiplication. + * + * Copyright (C) 2014 Linaro Ltd. + * Copyright (C) 2017 Google Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include +#include + +/* + * Note: in this algorithm we currently use 'le128' to represent GF(2^128) + * elements, even though poly_hash-generic uses 'be128'. Both types are + * actually "wrong" because the elements are actually in 'ble' format, and there + * should be a ble type to represent this --- as well as lle, bbe, and lbe types + * for the other conventions for representing GF(2^128) elements. But + * practically it doesn't matter which type we choose here, so we just use le128 + * since it's arguably more accurate, while poly_hash-generic still has to use + * be128 because the generic GF(2^128) multiplication functions all take be128. + */ + +struct poly_hash_desc_ctx { + le128 digest; + unsigned int count; +}; + +asmlinkage void pmull_poly_hash_update(le128 *digest, const le128 *key, + const u8 *src, unsigned int blocks, + unsigned int partial); + +static int poly_hash_setkey(struct crypto_shash *tfm, + const u8 *key, unsigned int keylen) +{ + if (keylen != sizeof(le128)) { + crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN); + return -EINVAL; + } + + memcpy(crypto_shash_ctx(tfm), key, sizeof(le128)); + return 0; +} + +static int poly_hash_init(struct shash_desc *desc) +{ + struct poly_hash_desc_ctx *ctx = shash_desc_ctx(desc); + + ctx->digest = (le128) { 0 }; + ctx->count = 0; + return 0; +} + +static int poly_hash_update(struct shash_desc *desc, const u8 *src, + unsigned int len) +{ + struct poly_hash_desc_ctx *ctx = shash_desc_ctx(desc); + unsigned int partial = ctx->count % sizeof(le128); + u8 *dst = (u8 *)&ctx->digest + partial; + + ctx->count += len; + + /* Finishing at least one block? */ + if (partial + len >= sizeof(le128)) { + const le128 *key = crypto_shash_ctx(desc->tfm); + + if (partial) { + /* Finish the pending block. */ + unsigned int n = sizeof(le128) - partial; + + len -= n; + do { + *dst++ ^= *src++; + } while (--n); + } + + /* + * Do the real work. If 'partial' is nonzero, this starts by + * multiplying 'digest' by 'key'. Then for each additional full + * block it adds the block to 'digest' and multiplies by 'key'. + */ + kernel_neon_begin_partial(8); + pmull_poly_hash_update(&ctx->digest, key, src, + len / sizeof(le128), partial); + kernel_neon_end(); + + src += len - (len % sizeof(le128)); + len %= sizeof(le128); + dst = (u8 *)&ctx->digest; + } + + /* Continue adding the next block to 'digest'. */ + while (len--) + *dst++ ^= *src++; + return 0; +} + +static int poly_hash_final(struct shash_desc *desc, u8 *out) +{ + struct poly_hash_desc_ctx *ctx = shash_desc_ctx(desc); + unsigned int partial = ctx->count % sizeof(le128); + + /* Finish the last block if needed. */ + if (partial) { + const le128 *key = crypto_shash_ctx(desc->tfm); + + kernel_neon_begin_partial(8); + pmull_poly_hash_update(&ctx->digest, key, NULL, 0, partial); + kernel_neon_end(); + } + + memcpy(out, &ctx->digest, sizeof(le128)); + return 0; +} + +static struct shash_alg poly_hash_alg = { + .digestsize = sizeof(le128), + .init = poly_hash_init, + .update = poly_hash_update, + .final = poly_hash_final, + .setkey = poly_hash_setkey, + .descsize = sizeof(struct poly_hash_desc_ctx), + .base = { + .cra_name = "poly_hash", + .cra_driver_name = "poly_hash-ce", + .cra_priority = 300, + .cra_ctxsize = sizeof(le128), + .cra_module = THIS_MODULE, + }, +}; + +static int __init poly_hash_ce_mod_init(void) +{ + return crypto_register_shash(&poly_hash_alg); +} + +static void __exit poly_hash_ce_mod_exit(void) +{ + crypto_unregister_shash(&poly_hash_alg); +} + +MODULE_DESCRIPTION("Polynomial evaluation hash using ARMv8 Crypto Extensions"); +MODULE_AUTHOR("Eric Biggers "); +MODULE_LICENSE("GPL v2"); + +module_cpu_feature_match(PMULL, poly_hash_ce_mod_init); +module_exit(poly_hash_ce_mod_exit); diff --git a/crypto/Kconfig b/crypto/Kconfig index 627227f1162d..3240d394426c 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -295,6 +295,7 @@ config CRYPTO_HEH select CRYPTO_ECB select CRYPTO_GF128MUL select CRYPTO_MANAGER + select CRYPTO_POLY_HASH_ARM64_CE if ARM64 && KERNEL_MODE_NEON help HEH: Hash-Encrypt-Hash mode HEH is a proposed block cipher mode of operation which extends the From 0516ccdb1433d6579d2f52994753859d2bf9e9bc Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 10 Jan 2017 17:02:39 -0800 Subject: [PATCH 226/521] ANDROID: ext4: allow encrypting filenames using HEH algorithm Update ext4 encryption to allow filenames to be encrypted using the Hash-Encrypt-Hash (HEH) block cipher mode of operation, which is believed to be more secure than CBC, particularly within the constant initialization vector (IV) constraint of filename encryption. Notably, HEH avoids the "common prefix" problem of CBC. Both algorithms use AES-256 as the underlying block cipher and take a 256-bit key. We assign mode number 126 to HEH, just below 127 (EXT4_ENCRYPTION_MODE_PRIVATE) which in some kernels is reserved for inline encryption on MSM chipsets. Note that these modes are not yet upstream, which is why these numbers are being used; it's preferable to avoid collisions with modes that may be added upstream. Also, although HEH is not hardware-specific, we aren't currently reserving mode number 5 for HEH upstream, since for now we are tying HEH to the new key derivation method which might become an independent flag upstream, and there's also a chance that details of HEH will change after it gets wider review. Bug: 32975945 Signed-off-by: Eric Biggers Change-Id: I81418709d47da0e0ac607ae3f91088063c2d5dd4 --- fs/ext4/Kconfig | 1 + fs/ext4/crypto_fname.c | 3 ++- fs/ext4/crypto_key.c | 3 +++ fs/ext4/ext4.h | 1 + fs/ext4/ext4_crypto.h | 3 +++ 5 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig index b46e9fc64196..3c8293215603 100644 --- a/fs/ext4/Kconfig +++ b/fs/ext4/Kconfig @@ -106,6 +106,7 @@ config EXT4_ENCRYPTION select CRYPTO_ECB select CRYPTO_XTS select CRYPTO_CTS + select CRYPTO_HEH select CRYPTO_CTR select CRYPTO_SHA256 select KEYS diff --git a/fs/ext4/crypto_fname.c b/fs/ext4/crypto_fname.c index 2fbef8a14760..e2645ca9b95e 100644 --- a/fs/ext4/crypto_fname.c +++ b/fs/ext4/crypto_fname.c @@ -44,7 +44,8 @@ static void ext4_dir_crypt_complete(struct crypto_async_request *req, int res) bool ext4_valid_filenames_enc_mode(uint32_t mode) { - return (mode == EXT4_ENCRYPTION_MODE_AES_256_CTS); + return (mode == EXT4_ENCRYPTION_MODE_AES_256_CTS || + mode == EXT4_ENCRYPTION_MODE_AES_256_HEH); } static unsigned max_name_len(struct inode *inode) diff --git a/fs/ext4/crypto_key.c b/fs/ext4/crypto_key.c index 505f8afde57c..20e7f1f50283 100644 --- a/fs/ext4/crypto_key.c +++ b/fs/ext4/crypto_key.c @@ -172,6 +172,9 @@ int ext4_get_encryption_info(struct inode *inode) case EXT4_ENCRYPTION_MODE_AES_256_CTS: cipher_str = "cts(cbc(aes))"; break; + case EXT4_ENCRYPTION_MODE_AES_256_HEH: + cipher_str = "heh(aes)"; + break; default: printk_once(KERN_WARNING "ext4: unsupported key mode %d (ino %u)\n", diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index de05f161c850..ac720a31e4ff 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -589,6 +589,7 @@ enum { #define EXT4_ENCRYPTION_MODE_AES_256_GCM 2 #define EXT4_ENCRYPTION_MODE_AES_256_CBC 3 #define EXT4_ENCRYPTION_MODE_AES_256_CTS 4 +#define EXT4_ENCRYPTION_MODE_AES_256_HEH 126 #include "ext4_crypto.h" diff --git a/fs/ext4/ext4_crypto.h b/fs/ext4/ext4_crypto.h index 1b17b05b9f4d..acdeb3e032d3 100644 --- a/fs/ext4/ext4_crypto.h +++ b/fs/ext4/ext4_crypto.h @@ -60,6 +60,7 @@ struct ext4_encryption_context { #define EXT4_AES_256_GCM_KEY_SIZE 32 #define EXT4_AES_256_CBC_KEY_SIZE 32 #define EXT4_AES_256_CTS_KEY_SIZE 32 +#define EXT4_AES_256_HEH_KEY_SIZE 32 #define EXT4_AES_256_XTS_KEY_SIZE 64 #define EXT4_MAX_KEY_SIZE 64 @@ -120,6 +121,8 @@ static inline int ext4_encryption_key_size(int mode) return EXT4_AES_256_CBC_KEY_SIZE; case EXT4_ENCRYPTION_MODE_AES_256_CTS: return EXT4_AES_256_CTS_KEY_SIZE; + case EXT4_ENCRYPTION_MODE_AES_256_HEH: + return EXT4_AES_256_HEH_KEY_SIZE; default: BUG(); } From e178cb830d14322e96f3616eef127649d809babc Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 10 Jan 2017 17:02:40 -0800 Subject: [PATCH 227/521] ANDROID: ext4: add a non-reversible key derivation method Add a new per-file key derivation method to ext4 encryption defined as: derived_key[0:127] = AES-256-ENCRYPT(master_key[0:255], nonce) derived_key[128:255] = AES-256-ENCRYPT(master_key[0:255], nonce ^ 0x01) derived_key[256:383] = AES-256-ENCRYPT(master_key[256:511], nonce) derived_key[384:511] = AES-256-ENCRYPT(master_key[256:511], nonce ^ 0x01) ... where the derived key and master key are both 512 bits, the nonce is 128 bits, AES-256-ENCRYPT takes the arguments (key, plaintext), and 'nonce ^ 0x01' denotes flipping the low order bit of the last byte. The existing key derivation method is 'derived_key = AES-128-ECB-ENCRYPT(key=nonce, plaintext=master_key)'. We want to make this change because currently, given a derived key you can easily compute the master key by computing 'AES-128-ECB-DECRYPT(key=nonce, ciphertext=derived_key)'. This was formerly OK because the previous threat model assumed that the master key and derived keys are equally hard to obtain by an attacker. However, we are looking to move the master key into secure hardware in some cases, so we want to make sure that an attacker with access to a derived key cannot compute the master key. We are doing this instead of increasing the nonce to 512 bits because it's important that the per-file xattr fit in the inode itself. By default, inodes are 256 bytes, and on Android we're already pretty close to that limit. If we increase the nonce size, we end up allocating a new filesystem block for each and every encrypted file, which has a substantial performance and disk utilization impact. Another option considered was to use the HMAC-SHA512 of the nonce, keyed by the master key. However this would be a little less performant, would be less extensible to other key sizes and MAC algorithms, and would pull in a dependency (security-wise and code-wise) on SHA-512. Due to the use of "aes" rather than "ecb(aes)" in the implementation, the new key derivation method is actually about twice as fast as the old one, though the old one could be optimized similarly as well. This patch makes the new key derivation method be used whenever HEH is used to encrypt filenames. Although these two features are logically independent, it was decided to bundle them together for now. Note that neither feature is upstream yet, and it cannot be guaranteed that the on-disk format won't change if/when these features are upstreamed. For this reason, and as noted in the previous patch, the features are both behind a special mode number for now. Signed-off-by: Eric Biggers Change-Id: Iee4113f57e59dc8c0b7dc5238d7003c83defb986 --- fs/ext4/crypto_key.c | 98 +++++++++++++++++++++++++++++++++++++++---- fs/ext4/ext4_crypto.h | 1 + 2 files changed, 92 insertions(+), 7 deletions(-) diff --git a/fs/ext4/crypto_key.c b/fs/ext4/crypto_key.c index 20e7f1f50283..22096e31a720 100644 --- a/fs/ext4/crypto_key.c +++ b/fs/ext4/crypto_key.c @@ -29,16 +29,16 @@ static void derive_crypt_complete(struct crypto_async_request *req, int rc) } /** - * ext4_derive_key_aes() - Derive a key using AES-128-ECB + * ext4_derive_key_v1() - Derive a key using AES-128-ECB * @deriving_key: Encryption key used for derivation. * @source_key: Source key to which to apply derivation. * @derived_key: Derived key. * - * Return: Zero on success; non-zero otherwise. + * Return: 0 on success, -errno on failure */ -static int ext4_derive_key_aes(char deriving_key[EXT4_AES_128_ECB_KEY_SIZE], - char source_key[EXT4_AES_256_XTS_KEY_SIZE], - char derived_key[EXT4_AES_256_XTS_KEY_SIZE]) +static int ext4_derive_key_v1(const char deriving_key[EXT4_AES_128_ECB_KEY_SIZE], + const char source_key[EXT4_AES_256_XTS_KEY_SIZE], + char derived_key[EXT4_AES_256_XTS_KEY_SIZE]) { int res = 0; struct ablkcipher_request *req = NULL; @@ -83,6 +83,91 @@ static int ext4_derive_key_aes(char deriving_key[EXT4_AES_128_ECB_KEY_SIZE], return res; } +/** + * ext4_derive_key_v2() - Derive a key non-reversibly + * @nonce: the nonce associated with the file + * @master_key: the master key referenced by the file + * @derived_key: (output) the resulting derived key + * + * This function computes the following: + * derived_key[0:127] = AES-256-ENCRYPT(master_key[0:255], nonce) + * derived_key[128:255] = AES-256-ENCRYPT(master_key[0:255], nonce ^ 0x01) + * derived_key[256:383] = AES-256-ENCRYPT(master_key[256:511], nonce) + * derived_key[384:511] = AES-256-ENCRYPT(master_key[256:511], nonce ^ 0x01) + * + * 'nonce ^ 0x01' denotes flipping the low order bit of the last byte. + * + * Unlike the v1 algorithm, the v2 algorithm is "non-reversible", meaning that + * compromising a derived key does not also compromise the master key. + * + * Return: 0 on success, -errno on failure + */ +static int ext4_derive_key_v2(const char nonce[EXT4_KEY_DERIVATION_NONCE_SIZE], + const char master_key[EXT4_MAX_KEY_SIZE], + char derived_key[EXT4_MAX_KEY_SIZE]) +{ + const int noncelen = EXT4_KEY_DERIVATION_NONCE_SIZE; + struct crypto_cipher *tfm; + int err; + int i; + + /* + * Since we only use each transform for a small number of encryptions, + * requesting just "aes" turns out to be significantly faster than + * "ecb(aes)", by about a factor of two. + */ + tfm = crypto_alloc_cipher("aes", 0, 0); + if (IS_ERR(tfm)) + return PTR_ERR(tfm); + + BUILD_BUG_ON(4 * EXT4_KEY_DERIVATION_NONCE_SIZE != EXT4_MAX_KEY_SIZE); + BUILD_BUG_ON(2 * EXT4_AES_256_ECB_KEY_SIZE != EXT4_MAX_KEY_SIZE); + for (i = 0; i < 2; i++) { + memcpy(derived_key, nonce, noncelen); + memcpy(derived_key + noncelen, nonce, noncelen); + derived_key[2 * noncelen - 1] ^= 0x01; + err = crypto_cipher_setkey(tfm, master_key, + EXT4_AES_256_ECB_KEY_SIZE); + if (err) + break; + crypto_cipher_encrypt_one(tfm, derived_key, derived_key); + crypto_cipher_encrypt_one(tfm, derived_key + noncelen, + derived_key + noncelen); + master_key += EXT4_AES_256_ECB_KEY_SIZE; + derived_key += 2 * noncelen; + } + crypto_free_cipher(tfm); + return err; +} + +/** + * ext4_derive_key() - Derive a per-file key from a nonce and master key + * @ctx: the encryption context associated with the file + * @master_key: the master key referenced by the file + * @derived_key: (output) the resulting derived key + * + * Return: 0 on success, -errno on failure + */ +static int ext4_derive_key(const struct ext4_encryption_context *ctx, + const char master_key[EXT4_MAX_KEY_SIZE], + char derived_key[EXT4_MAX_KEY_SIZE]) +{ + BUILD_BUG_ON(EXT4_AES_128_ECB_KEY_SIZE != EXT4_KEY_DERIVATION_NONCE_SIZE); + BUILD_BUG_ON(EXT4_AES_256_XTS_KEY_SIZE != EXT4_MAX_KEY_SIZE); + + /* + * Although the key derivation algorithm is logically independent of the + * choice of encryption modes, in this kernel it is bundled with HEH + * encryption of filenames, which is another crypto improvement that + * requires an on-disk format change and requires userspace to specify + * different encryption policies. + */ + if (ctx->filenames_encryption_mode == EXT4_ENCRYPTION_MODE_AES_256_HEH) + return ext4_derive_key_v2(ctx->nonce, master_key, derived_key); + else + return ext4_derive_key_v1(ctx->nonce, master_key, derived_key); +} + void ext4_free_crypt_info(struct ext4_crypt_info *ci) { if (!ci) @@ -223,8 +308,7 @@ int ext4_get_encryption_info(struct inode *inode) up_read(&keyring_key->sem); goto out; } - res = ext4_derive_key_aes(ctx.nonce, master_key->raw, - raw_key); + res = ext4_derive_key(&ctx, master_key->raw, raw_key); up_read(&keyring_key->sem); if (res) goto out; diff --git a/fs/ext4/ext4_crypto.h b/fs/ext4/ext4_crypto.h index acdeb3e032d3..e52637d969db 100644 --- a/fs/ext4/ext4_crypto.h +++ b/fs/ext4/ext4_crypto.h @@ -58,6 +58,7 @@ struct ext4_encryption_context { #define EXT4_XTS_TWEAK_SIZE 16 #define EXT4_AES_128_ECB_KEY_SIZE 16 #define EXT4_AES_256_GCM_KEY_SIZE 32 +#define EXT4_AES_256_ECB_KEY_SIZE 32 #define EXT4_AES_256_CBC_KEY_SIZE 32 #define EXT4_AES_256_CTS_KEY_SIZE 32 #define EXT4_AES_256_HEH_KEY_SIZE 32 From 9068b415feec9fb4c04ee586d3d047dcfc0dc8a9 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Tue, 14 Feb 2017 20:47:17 -0800 Subject: [PATCH 228/521] ANDROID: sdcardfs: Fix incorrect hash This adds back the hash calculation removed as part of the previous patch, as it is in fact necessary. Signed-off-by: Daniel Rosenberg Bug: 35307857 Change-Id: Ie607332bcf2c5d2efdf924e4060ef3f576bf25dc --- fs/sdcardfs/lookup.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index 9135866b7766..6b595e892316 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -221,6 +221,7 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, struct dentry *lower_dentry; const struct qstr *name; struct path lower_path; + struct qstr dname; struct sdcardfs_sb_info *sbi; sbi = SDCARDFS_SB(dentry->d_sb); @@ -306,11 +307,14 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, goto out; /* instatiate a new negative dentry */ - lower_dentry = d_lookup(lower_dir_dentry, name); + dname.name = name->name; + dname.len = name->len; + dname.hash = full_name_hash(dname.name, dname.len); + lower_dentry = d_lookup(lower_dir_dentry, &dname); if (lower_dentry) goto setup_lower; - lower_dentry = d_alloc(lower_dir_dentry, name); + lower_dentry = d_alloc(lower_dir_dentry, &dname); if (!lower_dentry) { err = -ENOMEM; goto out; From ddce85b2380591d209b4083813256a816e10341f Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 16 Feb 2017 17:55:22 -0800 Subject: [PATCH 229/521] ANDROID: sdcardfs: Add missing path_put "ANDROID: sdcardfs: Add GID Derivation to sdcardfs" introduced an unbalanced pat_get, leading to storage space not being freed after deleting a file until rebooting. This adds the missing path_put. Signed-off-by: Daniel Rosenberg Bug: 34691169 Change-Id: Ia7ef97ec2eca2c555cc06b235715635afc87940e --- fs/sdcardfs/derived_perm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index 0bb442338a85..ca239a942065 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -243,6 +243,7 @@ void fixup_lower_ownership(struct dentry* dentry, const char *name) { if (error) pr_err("sdcardfs: Failed to touch up lower fs gid/uid.\n"); } + sdcardfs_put_lower_path(dentry, &path); } static int descendant_may_need_fixup(struct sdcardfs_inode_info *info, struct limit_search *limit) { From bab06182f0c8c1baeb93ae13c279f3e660bb094d Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Wed, 22 Feb 2017 14:41:58 -0800 Subject: [PATCH 230/521] ANDROID: sdcardfs: Don't bother deleting freelist There is no point deleting entries from dlist, as that is a temporary list on the stack from which contains only entries that are being deleted. Not all code paths set up dlist, so those that don't were performing invalid accesses in hash_del_rcu. As an additional means to prevent any other issue, we null out the list entries when we allocate from the cache. Signed-off-by: Daniel Rosenberg Bug: 35666680 Change-Id: Ibb1e28c08c3a600c29418d39ba1c0f3db3bf31e5 --- fs/sdcardfs/packagelist.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/sdcardfs/packagelist.c b/fs/sdcardfs/packagelist.c index d96fcde041cc..56d643f4a9ee 100644 --- a/fs/sdcardfs/packagelist.c +++ b/fs/sdcardfs/packagelist.c @@ -178,6 +178,8 @@ static struct hashtable_entry *alloc_hashtable_entry(const struct qstr *key, GFP_KERNEL); if (!ret) return NULL; + INIT_HLIST_NODE(&ret->dlist); + INIT_HLIST_NODE(&ret->hlist); if (!qstr_copy(key, &ret->key)) { kmem_cache_free(hashtable_entry_cachep, ret); @@ -326,7 +328,6 @@ static int insert_userid_exclude_entry(const struct qstr *key, userid_t value) static void free_hashtable_entry(struct hashtable_entry *entry) { kfree(entry->key.name); - hash_del_rcu(&entry->dlist); kmem_cache_free(hashtable_entry_cachep, entry); } From b20531a034b622854ff2ce5feef5c617b1d21bc6 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Fri, 24 Feb 2017 15:41:48 -0800 Subject: [PATCH 231/521] ANDROID: sdcardfs: implement vm_ops->page_mkwrite This comes from the wrapfs patch 3dfec0ffe5e2 Wrapfs: implement vm_ops->page_mkwrite Some file systems (e.g., ext4) require it. Reported by Ted Ts'o. Signed-off-by: Erez Zadok Signed-off-by: Daniel Rosenberg Bug: 34133558 Change-Id: I1a389b2422c654a6d3046bb8ec3e20511aebfa8e --- fs/sdcardfs/mmap.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/fs/sdcardfs/mmap.c b/fs/sdcardfs/mmap.c index e21f64675a80..7dd715875ef1 100644 --- a/fs/sdcardfs/mmap.c +++ b/fs/sdcardfs/mmap.c @@ -48,6 +48,39 @@ static int sdcardfs_fault(struct vm_area_struct *vma, struct vm_fault *vmf) return err; } +static int sdcardfs_page_mkwrite(struct vm_area_struct *vma, + struct vm_fault *vmf) +{ + int err = 0; + struct file *file, *lower_file; + const struct vm_operations_struct *lower_vm_ops; + struct vm_area_struct lower_vma; + + memcpy(&lower_vma, vma, sizeof(struct vm_area_struct)); + file = lower_vma.vm_file; + lower_vm_ops = SDCARDFS_F(file)->lower_vm_ops; + BUG_ON(!lower_vm_ops); + if (!lower_vm_ops->page_mkwrite) + goto out; + + lower_file = sdcardfs_lower_file(file); + /* + * XXX: vm_ops->page_mkwrite may be called in parallel. + * Because we have to resort to temporarily changing the + * vma->vm_file to point to the lower file, a concurrent + * invocation of sdcardfs_page_mkwrite could see a different + * value. In this workaround, we keep a different copy of the + * vma structure in our stack, so we never expose a different + * value of the vma->vm_file called to us, even temporarily. + * A better fix would be to change the calling semantics of + * ->page_mkwrite to take an explicit file pointer. + */ + lower_vma.vm_file = lower_file; + err = lower_vm_ops->page_mkwrite(&lower_vma, vmf); +out: + return err; +} + static ssize_t sdcardfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t pos) { @@ -78,4 +111,5 @@ const struct address_space_operations sdcardfs_aops = { const struct vm_operations_struct sdcardfs_vm_ops = { .fault = sdcardfs_fault, + .page_mkwrite = sdcardfs_page_mkwrite, }; From ba832b0760889ce6434f9895789ffcda3d5da6e6 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Fri, 24 Feb 2017 15:49:45 -0800 Subject: [PATCH 232/521] ANDROID: sdcardfs: support direct-IO (DIO) operations This comes from the wrapfs patch 2e346c83b26e Wrapfs: support direct-IO (DIO) operations Signed-off-by: Li Mengyang Signed-off-by: Erez Zadok Signed-off-by: Daniel Rosenberg Bug: 34133558 Change-Id: I3fd779c510ab70d56b1d918f99c20421b524cdc4 --- fs/sdcardfs/mmap.c | 21 ++++----------------- fs/sdcardfs/sdcardfs.h | 1 + 2 files changed, 5 insertions(+), 17 deletions(-) diff --git a/fs/sdcardfs/mmap.c b/fs/sdcardfs/mmap.c index 7dd715875ef1..0d4089c62c3a 100644 --- a/fs/sdcardfs/mmap.c +++ b/fs/sdcardfs/mmap.c @@ -85,27 +85,14 @@ static ssize_t sdcardfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t pos) { /* - * This function returns zero on purpose in order to support direct IO. - * __dentry_open checks a_ops->direct_IO and returns EINVAL if it is null. - * - * However, this function won't be called by certain file operations - * including generic fs functions. * reads and writes are delivered to - * the lower file systems and the direct IOs will be handled by them. - * - * NOTE: exceptionally, on the recent kernels (since Linux 3.8.x), - * swap_writepage invokes this function directly. + * This function should never be called directly. We need it + * to exist, to get past a check in open_check_o_direct(), + * which is called from do_last(). */ - printk(KERN_INFO "%s, operation is not supported\n", __func__); - return 0; + return -EINVAL; } -/* - * XXX: the default address_space_ops for sdcardfs is empty. We cannot set - * our inode->i_mapping->a_ops to NULL because too many code paths expect - * the a_ops vector to be non-NULL. - */ const struct address_space_operations sdcardfs_aops = { - /* empty on purpose */ .direct_IO = sdcardfs_direct_IO, }; diff --git a/fs/sdcardfs/sdcardfs.h b/fs/sdcardfs/sdcardfs.h index f3cced313108..042f989f0bea 100644 --- a/fs/sdcardfs/sdcardfs.h +++ b/fs/sdcardfs/sdcardfs.h @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include From 3894650a05f6a0c5d3fe7c4fa72ad0627f6e9f3d Mon Sep 17 00:00:00 2001 From: Chris Redpath Date: Mon, 17 Jun 2013 18:36:56 +0100 Subject: [PATCH 233/521] cpufreq: interactive governor drops bits in time calculation Keep time calculation in 64-bit throughout. If we have long times between idle calculations this can result in deltas > 32 bits which causes incorrect load percentage calculations and selecting the wrong frequencies if we truncate here. Signed-off-by: Chris Redpath --- drivers/cpufreq/cpufreq_interactive.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/cpufreq/cpufreq_interactive.c b/drivers/cpufreq/cpufreq_interactive.c index f2929e628820..889c9b8b2237 100644 --- a/drivers/cpufreq/cpufreq_interactive.c +++ b/drivers/cpufreq/cpufreq_interactive.c @@ -312,13 +312,13 @@ static u64 update_load(int cpu) pcpu->policy->governor_data; u64 now; u64 now_idle; - unsigned int delta_idle; - unsigned int delta_time; + u64 delta_idle; + u64 delta_time; u64 active_time; now_idle = get_cpu_idle_time(cpu, &now, tunables->io_is_busy); - delta_idle = (unsigned int)(now_idle - pcpu->time_in_idle); - delta_time = (unsigned int)(now - pcpu->time_in_idle_timestamp); + delta_idle = (now_idle - pcpu->time_in_idle); + delta_time = (now - pcpu->time_in_idle_timestamp); if (delta_time <= delta_idle) active_time = 0; From e5a34995d014b60826045c90d6a1027e87a6bb66 Mon Sep 17 00:00:00 2001 From: Anson Jacob Date: Thu, 17 Nov 2016 02:32:40 -0500 Subject: [PATCH 234/521] ANDROID: usb: gadget: function: Fix commenting style Fix checkpatch.pl warning: Block comments use * on subsequent lines Change-Id: I9c92f128fdb3aeeb6ab9c7039e11f857bebb9539 Signed-off-by: Anson Jacob --- drivers/usb/gadget/function/f_accessory.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/usb/gadget/function/f_accessory.c b/drivers/usb/gadget/function/f_accessory.c index 9d3ec0e37475..f2fa0c271d70 100644 --- a/drivers/usb/gadget/function/f_accessory.c +++ b/drivers/usb/gadget/function/f_accessory.c @@ -676,9 +676,10 @@ static ssize_t acc_write(struct file *fp, const char __user *buf, req->zero = 0; } else { xfer = count; - /* If the data length is a multple of the + /* + * If the data length is a multple of the * maxpacket size then send a zero length packet(ZLP). - */ + */ req->zero = ((xfer % dev->ep_in->maxpacket) == 0); } if (copy_from_user(req->buf, buf, xfer)) { @@ -820,11 +821,11 @@ int acc_ctrlrequest(struct usb_composite_dev *cdev, unsigned long flags; /* - printk(KERN_INFO "acc_ctrlrequest " - "%02x.%02x v%04x i%04x l%u\n", - b_requestType, b_request, - w_value, w_index, w_length); -*/ + * printk(KERN_INFO "acc_ctrlrequest " + * "%02x.%02x v%04x i%04x l%u\n", + * b_requestType, b_request, + * w_value, w_index, w_length); + */ if (b_requestType == (USB_DIR_OUT | USB_TYPE_VENDOR)) { if (b_request == ACCESSORY_START) { From 6d2d31fe1b905f1db047d1ea4c09999a15b32e07 Mon Sep 17 00:00:00 2001 From: Greg Hackmann Date: Mon, 14 Nov 2016 09:48:02 -0800 Subject: [PATCH 235/521] ANDROID: dm: android-verity: fix table_make_digest() error handling If table_make_digest() fails, verify_verity_signature() would try to pass the returned ERR_PTR() to kfree(). This fixes the smatch error: drivers/md/dm-android-verity.c:601 verify_verity_signature() error: 'pks' dereferencing possible ERR_PTR() Change-Id: I9b9b7764b538cb4a5f94337660e9b0f149b139be Signed-off-by: Greg Hackmann --- drivers/md/dm-android-verity.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/md/dm-android-verity.c b/drivers/md/dm-android-verity.c index bb6c1285e499..ec0a4d19ca3e 100644 --- a/drivers/md/dm-android-verity.c +++ b/drivers/md/dm-android-verity.c @@ -585,6 +585,8 @@ static int verify_verity_signature(char *key_id, if (IS_ERR(pks)) { DMERR("hashing failed"); + retval = PTR_ERR(pks); + pks = NULL; goto error; } From f2688d5b1c6844b8660218e30a5b74fffd0a117f Mon Sep 17 00:00:00 2001 From: Subash Abhinov Kasiviswanathan Date: Thu, 10 Nov 2016 19:36:15 -0700 Subject: [PATCH 236/521] nf: IDLETIMER: Fix use after free condition during work schedule_work(&timer->work) appears to be called after cancel_work_sync(&info->timer->work) is completed. Work can be scheduled from the PM_POST_SUSPEND notification event even after cancel_work_sync is called. Call stack -004|notify_netlink_uevent( | [X19] timer = 0xFFFFFFC0A5DFC780 -> ( | ... | [NSD:0xFFFFFFC0A5DFC800] kobj = 0x6B6B6B6B6B6B6B6B, | [NSD:0xFFFFFFC0A5DFC868] timeout = 0x6B6B6B6B, | [NSD:0xFFFFFFC0A5DFC86C] refcnt = 0x6B6B6B6B, | [NSD:0xFFFFFFC0A5DFC870] work_pending = 0x6B, | [NSD:0xFFFFFFC0A5DFC871] send_nl_msg = 0x6B, | [NSD:0xFFFFFFC0A5DFC872] active = 0x6B, | [NSD:0xFFFFFFC0A5DFC874] uid = 0x6B6B6B6B, | [NSD:0xFFFFFFC0A5DFC878] suspend_time_valid = 0x6B)) -005|idletimer_tg_work( -006|__read_once_size(inline) -006|static_key_count(inline) -006|static_key_false(inline) -006|trace_workqueue_execute_end(inline) -006|process_one_work( -007|worker_thread( -008|kthread( -009|ret_from_fork(asm) ---|end of frame Force any pending idletimer_tg_work() to complete before freeing the associated work struct and after unregistering to the pm_notifier callback. Change-Id: I4c5f0a1c142f7d698c092cf7bcafdb0f9fbaa9c1 Signed-off-by: Subash Abhinov Kasiviswanathan --- net/netfilter/xt_IDLETIMER.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c index 0975c993a94e..ada5a304e61e 100644 --- a/net/netfilter/xt_IDLETIMER.c +++ b/net/netfilter/xt_IDLETIMER.c @@ -456,6 +456,7 @@ static void idletimer_tg_destroy(const struct xt_tgdtor_param *par) del_timer_sync(&info->timer->timer); sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr); unregister_pm_notifier(&info->timer->pm_nb); + cancel_work_sync(&info->timer->work); kfree(info->timer->attr.attr.name); kfree(info->timer); } else { From 216682ad26c1690e66e177440cff1cc254d7debe Mon Sep 17 00:00:00 2001 From: Subash Abhinov Kasiviswanathan Date: Wed, 2 Nov 2016 11:56:40 -0600 Subject: [PATCH 237/521] nf: IDLETIMER: Use fullsock when querying uid sock_i_uid() acquires the sk_callback_lock which does not exist for sockets in TCP_NEW_SYN_RECV state. This results in errors showing up as spinlock bad magic. Fix this by looking for the full sock as suggested by Eric. Callstack for reference - -003|rwlock_bug -004|arch_read_lock -004|do_raw_read_lock -005|raw_read_lock_bh -006|sock_i_uid -007|from_kuid_munged(inline) -007|reset_timer -008|idletimer_tg_target -009|ipt_do_table -010|iptable_mangle_hook -011|nf_iterate -012|nf_hook_slow -013|NF_HOOK_COND(inline) -013|ip_output -014|ip_local_out -015|ip_build_and_send_pkt -016|tcp_v4_send_synack -017|atomic_sub_return(inline) -017|reqsk_put(inline) -017|tcp_conn_request -018|tcp_v4_conn_request -019|tcp_rcv_state_process -020|tcp_v4_do_rcv -021|tcp_v4_rcv -022|ip_local_deliver_finish -023|NF_HOOK_THRESH(inline) -023|NF_HOOK(inline) -023|ip_local_deliver -024|ip_rcv_finish -025|NF_HOOK_THRESH(inline) -025|NF_HOOK(inline) -025|ip_rcv -026|deliver_skb(inline) -026|deliver_ptype_list_skb(inline) -026|__netif_receive_skb_core -027|__netif_receive_skb -028|netif_receive_skb_internal -029|netif_receive_skb Change-Id: Ic8f3a3d2d7af31434d1163b03971994e2125d552 Signed-off-by: Subash Abhinov Kasiviswanathan Cc: Eric Dumazet --- net/netfilter/xt_IDLETIMER.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c index ada5a304e61e..f11aa28b96ce 100644 --- a/net/netfilter/xt_IDLETIMER.c +++ b/net/netfilter/xt_IDLETIMER.c @@ -49,6 +49,7 @@ #include #include #include +#include struct idletimer_tg_attr { struct attribute attr; @@ -355,7 +356,7 @@ static void reset_timer(const struct idletimer_tg_info *info, /* Stores the uid resposible for waking up the radio */ if (skb && (skb->sk)) { timer->uid = from_kuid_munged(current_user_ns(), - sock_i_uid(skb->sk)); + sock_i_uid(skb_to_full_sk(skb))); } /* checks if there is a pending inactive notification*/ From 0ffb1bdf34dfb31ff27af0fef8e4d8dea97d8ae5 Mon Sep 17 00:00:00 2001 From: Martijn Coenen Date: Tue, 7 Mar 2017 15:51:18 +0100 Subject: [PATCH 238/521] binder: use group leader instead of open thread The binder allocator assumes that the thread that called binder_open will never die for the lifetime of that proc. That thread is normally the group_leader, however it may not be. Use the group_leader instead of current. Bug: 35707103 Test: Created test case to open with temporary thread Change-Id: Id693f74b3591f3524a8c6e9508e70f3e5a80c588 Signed-off-by: Todd Kjos Signed-off-by: Martijn Coenen --- drivers/android/binder.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 6c24673990bb..24737816c4fe 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -3360,7 +3360,7 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) const char *failure_string; struct binder_buffer *buffer; - if (proc->tsk != current) + if (proc->tsk != current->group_leader) return -EINVAL; if ((vma->vm_end - vma->vm_start) > SZ_4M) @@ -3462,8 +3462,8 @@ static int binder_open(struct inode *nodp, struct file *filp) proc = kzalloc(sizeof(*proc), GFP_KERNEL); if (proc == NULL) return -ENOMEM; - get_task_struct(current); - proc->tsk = current; + get_task_struct(current->group_leader); + proc->tsk = current->group_leader; INIT_LIST_HEAD(&proc->todo); init_waitqueue_head(&proc->wait); proc->default_priority = task_nice(current); From 687c683aa63d21d39fe9f8e309d63c040569b1a5 Mon Sep 17 00:00:00 2001 From: Martijn Coenen Date: Tue, 7 Mar 2017 15:54:56 +0100 Subject: [PATCH 239/521] android: binder: add padding to binder_fd_array_object. binder_fd_array_object starts with a 4-byte header, followed by a few fields that are 8 bytes when ANDROID_BINDER_IPC_32BIT=N. This can cause alignment issues in a 64-bit kernel with a 32-bit userspace, as on x86_32 an 8-byte primitive may be aligned to a 4-byte address. Pad with a __u32 to fix this. Change-Id: I4374ed2cc3ccd3c6a1474cb7209b53ebfd91077b Signed-off-by: Martijn Coenen --- include/uapi/linux/android/binder.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h index 51f891fb1b18..7668b5791c91 100644 --- a/include/uapi/linux/android/binder.h +++ b/include/uapi/linux/android/binder.h @@ -132,6 +132,7 @@ enum { /* struct binder_fd_array_object - object describing an array of fds in a buffer * @hdr: common header structure + * @pad: padding to ensure correct alignment * @num_fds: number of file descriptors in the buffer * @parent: index in offset array to buffer holding the fd array * @parent_offset: start offset of fd array in the buffer @@ -152,6 +153,7 @@ enum { */ struct binder_fd_array_object { struct binder_object_header hdr; + __u32 pad; binder_size_t num_fds; binder_size_t parent; binder_size_t parent_offset; From 2a4a4e058799a06b45025fc28842e2df8a011e7a Mon Sep 17 00:00:00 2001 From: Martijn Coenen Date: Fri, 30 Sep 2016 16:40:04 +0200 Subject: [PATCH 240/521] android: binder: move global binder state into context struct. This change moves all global binder state into the context struct, thereby completely separating the state and the locks between two different contexts. The debugfs entries remain global, printing entries from all contexts. Change-Id: If8e3e2bece7bc6f974b66fbcf1d91d529ffa62f0 Signed-off-by: Martijn Coenen --- drivers/android/binder.c | 392 ++++++++++++++++++++++++++------------- 1 file changed, 259 insertions(+), 133 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 24737816c4fe..48cae6cb9b7b 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -18,6 +18,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include +#include #include #include #include @@ -46,19 +47,11 @@ #include #include "binder_trace.h" -static DEFINE_MUTEX(binder_main_lock); -static DEFINE_MUTEX(binder_deferred_lock); -static DEFINE_MUTEX(binder_mmap_lock); - static HLIST_HEAD(binder_devices); -static HLIST_HEAD(binder_procs); -static HLIST_HEAD(binder_deferred_list); -static HLIST_HEAD(binder_dead_nodes); static struct dentry *binder_debugfs_dir_entry_root; static struct dentry *binder_debugfs_dir_entry_proc; -static int binder_last_id; -static struct workqueue_struct *binder_deferred_workqueue; +atomic_t binder_last_id; #define BINDER_DEBUG_ENTRY(name) \ static int binder_##name##_open(struct inode *inode, struct file *file) \ @@ -173,20 +166,24 @@ enum binder_stat_types { struct binder_stats { int br[_IOC_NR(BR_FAILED_REPLY) + 1]; int bc[_IOC_NR(BC_REPLY_SG) + 1]; - int obj_created[BINDER_STAT_COUNT]; - int obj_deleted[BINDER_STAT_COUNT]; }; -static struct binder_stats binder_stats; +/* These are still global, since it's not always easy to get the context */ +struct binder_obj_stats { + atomic_t obj_created[BINDER_STAT_COUNT]; + atomic_t obj_deleted[BINDER_STAT_COUNT]; +}; + +static struct binder_obj_stats binder_obj_stats; static inline void binder_stats_deleted(enum binder_stat_types type) { - binder_stats.obj_deleted[type]++; + atomic_inc(&binder_obj_stats.obj_deleted[type]); } static inline void binder_stats_created(enum binder_stat_types type) { - binder_stats.obj_created[type]++; + atomic_inc(&binder_obj_stats.obj_created[type]); } struct binder_transaction_log_entry { @@ -207,8 +204,6 @@ struct binder_transaction_log { int full; struct binder_transaction_log_entry entry[32]; }; -static struct binder_transaction_log binder_transaction_log; -static struct binder_transaction_log binder_transaction_log_failed; static struct binder_transaction_log_entry *binder_transaction_log_add( struct binder_transaction_log *log) @@ -229,6 +224,21 @@ struct binder_context { struct binder_node *binder_context_mgr_node; kuid_t binder_context_mgr_uid; const char *name; + + struct mutex binder_main_lock; + struct mutex binder_deferred_lock; + struct mutex binder_mmap_lock; + + struct hlist_head binder_procs; + struct hlist_head binder_dead_nodes; + struct hlist_head binder_deferred_list; + + struct work_struct deferred_work; + struct workqueue_struct *binder_deferred_workqueue; + struct binder_transaction_log transaction_log; + struct binder_transaction_log transaction_log_failed; + + struct binder_stats binder_stats; }; struct binder_device { @@ -451,17 +461,18 @@ static long task_close_fd(struct binder_proc *proc, unsigned int fd) return retval; } -static inline void binder_lock(const char *tag) +static inline void binder_lock(struct binder_context *context, const char *tag) { trace_binder_lock(tag); - mutex_lock(&binder_main_lock); + mutex_lock(&context->binder_main_lock); trace_binder_locked(tag); } -static inline void binder_unlock(const char *tag) +static inline void binder_unlock(struct binder_context *context, + const char *tag) { trace_binder_unlock(tag); - mutex_unlock(&binder_main_lock); + mutex_unlock(&context->binder_main_lock); } static void binder_set_nice(long nice) @@ -946,7 +957,7 @@ static struct binder_node *binder_new_node(struct binder_proc *proc, binder_stats_created(BINDER_STAT_NODE); rb_link_node(&node->rb_node, parent, p); rb_insert_color(&node->rb_node, &proc->nodes); - node->debug_id = ++binder_last_id; + node->debug_id = atomic_inc_return(&binder_last_id); node->proc = proc; node->ptr = ptr; node->cookie = cookie; @@ -1088,7 +1099,7 @@ static struct binder_ref *binder_get_ref_for_node(struct binder_proc *proc, if (new_ref == NULL) return NULL; binder_stats_created(BINDER_STAT_REF); - new_ref->debug_id = ++binder_last_id; + new_ref->debug_id = atomic_inc_return(&binder_last_id); new_ref->proc = proc; new_ref->node = node; rb_link_node(&new_ref->rb_node_node, parent, p); @@ -1849,7 +1860,7 @@ static void binder_transaction(struct binder_proc *proc, binder_size_t last_fixup_min_off = 0; struct binder_context *context = proc->context; - e = binder_transaction_log_add(&binder_transaction_log); + e = binder_transaction_log_add(&context->transaction_log); e->call_type = reply ? 2 : !!(tr->flags & TF_ONE_WAY); e->from_proc = proc->pid; e->from_thread = thread->pid; @@ -1971,7 +1982,7 @@ static void binder_transaction(struct binder_proc *proc, } binder_stats_created(BINDER_STAT_TRANSACTION_COMPLETE); - t->debug_id = ++binder_last_id; + t->debug_id = atomic_inc_return(&binder_last_id); e->debug_id = t->debug_id; if (reply) @@ -2235,7 +2246,8 @@ static void binder_transaction(struct binder_proc *proc, { struct binder_transaction_log_entry *fe; - fe = binder_transaction_log_add(&binder_transaction_log_failed); + fe = binder_transaction_log_add( + &context->transaction_log_failed); *fe = *e; } @@ -2263,8 +2275,8 @@ static int binder_thread_write(struct binder_proc *proc, return -EFAULT; ptr += sizeof(uint32_t); trace_binder_command(cmd); - if (_IOC_NR(cmd) < ARRAY_SIZE(binder_stats.bc)) { - binder_stats.bc[_IOC_NR(cmd)]++; + if (_IOC_NR(cmd) < ARRAY_SIZE(context->binder_stats.bc)) { + context->binder_stats.bc[_IOC_NR(cmd)]++; proc->stats.bc[_IOC_NR(cmd)]++; thread->stats.bc[_IOC_NR(cmd)]++; } @@ -2629,8 +2641,8 @@ static void binder_stat_br(struct binder_proc *proc, struct binder_thread *thread, uint32_t cmd) { trace_binder_return(cmd); - if (_IOC_NR(cmd) < ARRAY_SIZE(binder_stats.br)) { - binder_stats.br[_IOC_NR(cmd)]++; + if (_IOC_NR(cmd) < ARRAY_SIZE(proc->stats.br)) { + proc->context->binder_stats.br[_IOC_NR(cmd)]++; proc->stats.br[_IOC_NR(cmd)]++; thread->stats.br[_IOC_NR(cmd)]++; } @@ -2694,7 +2706,7 @@ static int binder_thread_read(struct binder_proc *proc, if (wait_for_proc_work) proc->ready_threads++; - binder_unlock(__func__); + binder_unlock(proc->context, __func__); trace_binder_wait_for_work(wait_for_proc_work, !!thread->transaction_stack, @@ -2721,7 +2733,7 @@ static int binder_thread_read(struct binder_proc *proc, ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread)); } - binder_lock(__func__); + binder_lock(proc->context, __func__); if (wait_for_proc_work) proc->ready_threads--; @@ -3108,14 +3120,14 @@ static unsigned int binder_poll(struct file *filp, struct binder_thread *thread = NULL; int wait_for_proc_work; - binder_lock(__func__); + binder_lock(proc->context, __func__); thread = binder_get_thread(proc); wait_for_proc_work = thread->transaction_stack == NULL && list_empty(&thread->todo) && thread->return_error == BR_OK; - binder_unlock(__func__); + binder_unlock(proc->context, __func__); if (wait_for_proc_work) { if (binder_has_proc_work(proc, thread)) @@ -3242,6 +3254,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { int ret; struct binder_proc *proc = filp->private_data; + struct binder_context *context = proc->context; struct binder_thread *thread; unsigned int size = _IOC_SIZE(cmd); void __user *ubuf = (void __user *)arg; @@ -3255,7 +3268,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (ret) goto err_unlocked; - binder_lock(__func__); + binder_lock(context, __func__); thread = binder_get_thread(proc); if (thread == NULL) { ret = -ENOMEM; @@ -3307,7 +3320,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) err: if (thread) thread->looper &= ~BINDER_LOOPER_STATE_NEED_RETURN; - binder_unlock(__func__); + binder_unlock(context, __func__); wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2); if (ret && ret != -ERESTARTSYS) pr_info("%d:%d ioctl %x %lx returned %d\n", proc->pid, current->pid, cmd, arg, ret); @@ -3379,7 +3392,7 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) } vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE; - mutex_lock(&binder_mmap_lock); + mutex_lock(&proc->context->binder_mmap_lock); if (proc->buffer) { ret = -EBUSY; failure_string = "already mapped"; @@ -3394,7 +3407,7 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) } proc->buffer = area->addr; proc->user_buffer_offset = vma->vm_start - (uintptr_t)proc->buffer; - mutex_unlock(&binder_mmap_lock); + mutex_unlock(&proc->context->binder_mmap_lock); #ifdef CONFIG_CPU_CACHE_VIPT if (cache_is_vipt_aliasing()) { @@ -3439,12 +3452,12 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) kfree(proc->pages); proc->pages = NULL; err_alloc_pages_failed: - mutex_lock(&binder_mmap_lock); + mutex_lock(&proc->context->binder_mmap_lock); vfree(proc->buffer); proc->buffer = NULL; err_get_vm_area_failed: err_already_mapped: - mutex_unlock(&binder_mmap_lock); + mutex_unlock(&proc->context->binder_mmap_lock); err_bad_arg: pr_err("binder_mmap: %d %lx-%lx %s failed %d\n", proc->pid, vma->vm_start, vma->vm_end, failure_string, ret); @@ -3471,15 +3484,15 @@ static int binder_open(struct inode *nodp, struct file *filp) miscdev); proc->context = &binder_dev->context; - binder_lock(__func__); + binder_lock(proc->context, __func__); binder_stats_created(BINDER_STAT_PROC); - hlist_add_head(&proc->proc_node, &binder_procs); + hlist_add_head(&proc->proc_node, &proc->context->binder_procs); proc->pid = current->group_leader->pid; INIT_LIST_HEAD(&proc->delivered_death); filp->private_data = proc; - binder_unlock(__func__); + binder_unlock(proc->context, __func__); if (binder_debugfs_dir_entry_proc) { char strbuf[11]; @@ -3544,6 +3557,7 @@ static int binder_release(struct inode *nodp, struct file *filp) static int binder_node_release(struct binder_node *node, int refs) { struct binder_ref *ref; + struct binder_context *context = node->proc->context; int death = 0; list_del_init(&node->work.entry); @@ -3559,7 +3573,7 @@ static int binder_node_release(struct binder_node *node, int refs) node->proc = NULL; node->local_strong_refs = 0; node->local_weak_refs = 0; - hlist_add_head(&node->dead_node, &binder_dead_nodes); + hlist_add_head(&node->dead_node, &context->binder_dead_nodes); hlist_for_each_entry(ref, &node->refs, node_entry) { refs++; @@ -3624,7 +3638,8 @@ static void binder_deferred_release(struct binder_proc *proc) node = rb_entry(n, struct binder_node, rb_node); nodes++; rb_erase(&node->rb_node, &proc->nodes); - incoming_refs = binder_node_release(node, incoming_refs); + incoming_refs = binder_node_release(node, + incoming_refs); } outgoing_refs = 0; @@ -3696,14 +3711,16 @@ static void binder_deferred_func(struct work_struct *work) { struct binder_proc *proc; struct files_struct *files; + struct binder_context *context = + container_of(work, struct binder_context, deferred_work); int defer; do { - binder_lock(__func__); - mutex_lock(&binder_deferred_lock); - if (!hlist_empty(&binder_deferred_list)) { - proc = hlist_entry(binder_deferred_list.first, + binder_lock(context, __func__); + mutex_lock(&context->binder_deferred_lock); + if (!hlist_empty(&context->binder_deferred_list)) { + proc = hlist_entry(context->binder_deferred_list.first, struct binder_proc, deferred_work_node); hlist_del_init(&proc->deferred_work_node); defer = proc->deferred_work; @@ -3712,7 +3729,7 @@ static void binder_deferred_func(struct work_struct *work) proc = NULL; defer = 0; } - mutex_unlock(&binder_deferred_lock); + mutex_unlock(&context->binder_deferred_lock); files = NULL; if (defer & BINDER_DEFERRED_PUT_FILES) { @@ -3727,24 +3744,24 @@ static void binder_deferred_func(struct work_struct *work) if (defer & BINDER_DEFERRED_RELEASE) binder_deferred_release(proc); /* frees proc */ - binder_unlock(__func__); + binder_unlock(context, __func__); if (files) put_files_struct(files); } while (proc); } -static DECLARE_WORK(binder_deferred_work, binder_deferred_func); static void binder_defer_work(struct binder_proc *proc, enum binder_deferred_state defer) { - mutex_lock(&binder_deferred_lock); + mutex_lock(&proc->context->binder_deferred_lock); proc->deferred_work |= defer; if (hlist_unhashed(&proc->deferred_work_node)) { hlist_add_head(&proc->deferred_work_node, - &binder_deferred_list); - queue_work(binder_deferred_workqueue, &binder_deferred_work); + &proc->context->binder_deferred_list); + queue_work(proc->context->binder_deferred_workqueue, + &proc->context->deferred_work); } - mutex_unlock(&binder_deferred_lock); + mutex_unlock(&proc->context->binder_deferred_lock); } static void print_binder_transaction(struct seq_file *m, const char *prefix, @@ -3975,8 +3992,20 @@ static const char * const binder_objstat_strings[] = { "transaction_complete" }; +static void add_binder_stats(struct binder_stats *from, struct binder_stats *to) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(to->bc); i++) + to->bc[i] += from->bc[i]; + + for (i = 0; i < ARRAY_SIZE(to->br); i++) + to->br[i] += from->br[i]; +} + static void print_binder_stats(struct seq_file *m, const char *prefix, - struct binder_stats *stats) + struct binder_stats *stats, + struct binder_obj_stats *obj_stats) { int i; @@ -3996,16 +4025,21 @@ static void print_binder_stats(struct seq_file *m, const char *prefix, binder_return_strings[i], stats->br[i]); } - BUILD_BUG_ON(ARRAY_SIZE(stats->obj_created) != + if (!obj_stats) + return; + + BUILD_BUG_ON(ARRAY_SIZE(obj_stats->obj_created) != ARRAY_SIZE(binder_objstat_strings)); - BUILD_BUG_ON(ARRAY_SIZE(stats->obj_created) != - ARRAY_SIZE(stats->obj_deleted)); - for (i = 0; i < ARRAY_SIZE(stats->obj_created); i++) { - if (stats->obj_created[i] || stats->obj_deleted[i]) + BUILD_BUG_ON(ARRAY_SIZE(obj_stats->obj_created) != + ARRAY_SIZE(obj_stats->obj_deleted)); + for (i = 0; i < ARRAY_SIZE(obj_stats->obj_created); i++) { + int obj_created = atomic_read(&obj_stats->obj_created[i]); + int obj_deleted = atomic_read(&obj_stats->obj_deleted[i]); + + if (obj_created || obj_deleted) seq_printf(m, "%s%s: active %d total %d\n", prefix, - binder_objstat_strings[i], - stats->obj_created[i] - stats->obj_deleted[i], - stats->obj_created[i]); + binder_objstat_strings[i], + obj_created - obj_deleted, obj_created); } } @@ -4060,85 +4094,131 @@ static void print_binder_proc_stats(struct seq_file *m, } seq_printf(m, " pending transactions: %d\n", count); - print_binder_stats(m, " ", &proc->stats); + print_binder_stats(m, " ", &proc->stats, NULL); } static int binder_state_show(struct seq_file *m, void *unused) { + struct binder_device *device; + struct binder_context *context; struct binder_proc *proc; struct binder_node *node; int do_lock = !binder_debug_no_lock; - - if (do_lock) - binder_lock(__func__); + bool wrote_dead_nodes_header = false; seq_puts(m, "binder state:\n"); - if (!hlist_empty(&binder_dead_nodes)) - seq_puts(m, "dead nodes:\n"); - hlist_for_each_entry(node, &binder_dead_nodes, dead_node) - print_binder_node(m, node); + hlist_for_each_entry(device, &binder_devices, hlist) { + context = &device->context; + if (do_lock) + binder_lock(context, __func__); + if (!wrote_dead_nodes_header && + !hlist_empty(&context->binder_dead_nodes)) { + seq_puts(m, "dead nodes:\n"); + wrote_dead_nodes_header = true; + } + hlist_for_each_entry(node, &context->binder_dead_nodes, + dead_node) + print_binder_node(m, node); - hlist_for_each_entry(proc, &binder_procs, proc_node) - print_binder_proc(m, proc, 1); - if (do_lock) - binder_unlock(__func__); + if (do_lock) + binder_unlock(context, __func__); + } + + hlist_for_each_entry(device, &binder_devices, hlist) { + context = &device->context; + if (do_lock) + binder_lock(context, __func__); + + hlist_for_each_entry(proc, &context->binder_procs, proc_node) + print_binder_proc(m, proc, 1); + if (do_lock) + binder_unlock(context, __func__); + } return 0; } static int binder_stats_show(struct seq_file *m, void *unused) { + struct binder_device *device; + struct binder_context *context; struct binder_proc *proc; + struct binder_stats total_binder_stats; int do_lock = !binder_debug_no_lock; - if (do_lock) - binder_lock(__func__); + memset(&total_binder_stats, 0, sizeof(struct binder_stats)); + + hlist_for_each_entry(device, &binder_devices, hlist) { + context = &device->context; + if (do_lock) + binder_lock(context, __func__); + + add_binder_stats(&context->binder_stats, &total_binder_stats); + + if (do_lock) + binder_unlock(context, __func__); + } seq_puts(m, "binder stats:\n"); + print_binder_stats(m, "", &total_binder_stats, &binder_obj_stats); - print_binder_stats(m, "", &binder_stats); + hlist_for_each_entry(device, &binder_devices, hlist) { + context = &device->context; + if (do_lock) + binder_lock(context, __func__); - hlist_for_each_entry(proc, &binder_procs, proc_node) - print_binder_proc_stats(m, proc); - if (do_lock) - binder_unlock(__func__); + hlist_for_each_entry(proc, &context->binder_procs, proc_node) + print_binder_proc_stats(m, proc); + if (do_lock) + binder_unlock(context, __func__); + } return 0; } static int binder_transactions_show(struct seq_file *m, void *unused) { + struct binder_device *device; + struct binder_context *context; struct binder_proc *proc; int do_lock = !binder_debug_no_lock; - if (do_lock) - binder_lock(__func__); - seq_puts(m, "binder transactions:\n"); - hlist_for_each_entry(proc, &binder_procs, proc_node) - print_binder_proc(m, proc, 0); - if (do_lock) - binder_unlock(__func__); + hlist_for_each_entry(device, &binder_devices, hlist) { + context = &device->context; + if (do_lock) + binder_lock(context, __func__); + + hlist_for_each_entry(proc, &context->binder_procs, proc_node) + print_binder_proc(m, proc, 0); + if (do_lock) + binder_unlock(context, __func__); + } return 0; } static int binder_proc_show(struct seq_file *m, void *unused) { + struct binder_device *device; + struct binder_context *context; struct binder_proc *itr; int pid = (unsigned long)m->private; int do_lock = !binder_debug_no_lock; - if (do_lock) - binder_lock(__func__); + hlist_for_each_entry(device, &binder_devices, hlist) { + context = &device->context; + if (do_lock) + binder_lock(context, __func__); - hlist_for_each_entry(itr, &binder_procs, proc_node) { - if (itr->pid == pid) { - seq_puts(m, "binder proc state:\n"); - print_binder_proc(m, itr, 1); + hlist_for_each_entry(itr, &context->binder_procs, proc_node) { + if (itr->pid == pid) { + seq_puts(m, "binder proc state:\n"); + print_binder_proc(m, itr, 1); + } } + if (do_lock) + binder_unlock(context, __func__); } - if (do_lock) - binder_unlock(__func__); return 0; } @@ -4153,11 +4233,10 @@ static void print_binder_transaction_log_entry(struct seq_file *m, e->to_node, e->target_handle, e->data_size, e->offsets_size); } -static int binder_transaction_log_show(struct seq_file *m, void *unused) +static int print_binder_transaction_log(struct seq_file *m, + struct binder_transaction_log *log) { - struct binder_transaction_log *log = m->private; int i; - if (log->full) { for (i = log->next; i < ARRAY_SIZE(log->entry); i++) print_binder_transaction_log_entry(m, &log->entry[i]); @@ -4167,6 +4246,31 @@ static int binder_transaction_log_show(struct seq_file *m, void *unused) return 0; } +static int binder_transaction_log_show(struct seq_file *m, void *unused) +{ + struct binder_device *device; + struct binder_context *context; + + hlist_for_each_entry(device, &binder_devices, hlist) { + context = &device->context; + print_binder_transaction_log(m, &context->transaction_log); + } + return 0; +} + +static int binder_failed_transaction_log_show(struct seq_file *m, void *unused) +{ + struct binder_device *device; + struct binder_context *context; + + hlist_for_each_entry(device, &binder_devices, hlist) { + context = &device->context; + print_binder_transaction_log(m, + &context->transaction_log_failed); + } + return 0; +} + static const struct file_operations binder_fops = { .owner = THIS_MODULE, .poll = binder_poll, @@ -4182,11 +4286,20 @@ BINDER_DEBUG_ENTRY(state); BINDER_DEBUG_ENTRY(stats); BINDER_DEBUG_ENTRY(transactions); BINDER_DEBUG_ENTRY(transaction_log); +BINDER_DEBUG_ENTRY(failed_transaction_log); + +static void __init free_binder_device(struct binder_device *device) +{ + if (device->context.binder_deferred_workqueue) + destroy_workqueue(device->context.binder_deferred_workqueue); + kfree(device); +} static int __init init_binder_device(const char *name) { int ret; struct binder_device *binder_device; + struct binder_context *context; binder_device = kzalloc(sizeof(*binder_device), GFP_KERNEL); if (!binder_device) @@ -4196,31 +4309,65 @@ static int __init init_binder_device(const char *name) binder_device->miscdev.minor = MISC_DYNAMIC_MINOR; binder_device->miscdev.name = name; - binder_device->context.binder_context_mgr_uid = INVALID_UID; - binder_device->context.name = name; + context = &binder_device->context; + context->binder_context_mgr_uid = INVALID_UID; + context->name = name; + + mutex_init(&context->binder_main_lock); + mutex_init(&context->binder_deferred_lock); + mutex_init(&context->binder_mmap_lock); + + context->binder_deferred_workqueue = + create_singlethread_workqueue(name); + + if (!context->binder_deferred_workqueue) { + ret = -ENOMEM; + goto err_create_singlethread_workqueue_failed; + } + + INIT_HLIST_HEAD(&context->binder_procs); + INIT_HLIST_HEAD(&context->binder_dead_nodes); + INIT_HLIST_HEAD(&context->binder_deferred_list); + INIT_WORK(&context->deferred_work, binder_deferred_func); ret = misc_register(&binder_device->miscdev); if (ret < 0) { - kfree(binder_device); - return ret; + goto err_misc_register_failed; } hlist_add_head(&binder_device->hlist, &binder_devices); + return ret; + +err_create_singlethread_workqueue_failed: +err_misc_register_failed: + free_binder_device(binder_device); return ret; } static int __init binder_init(void) { - int ret; + int ret = 0; char *device_name, *device_names; struct binder_device *device; struct hlist_node *tmp; - binder_deferred_workqueue = create_singlethread_workqueue("binder"); - if (!binder_deferred_workqueue) + /* + * Copy the module_parameter string, because we don't want to + * tokenize it in-place. + */ + device_names = kzalloc(strlen(binder_devices_param) + 1, GFP_KERNEL); + if (!device_names) return -ENOMEM; + strcpy(device_names, binder_devices_param); + + while ((device_name = strsep(&device_names, ","))) { + ret = init_binder_device(device_name); + if (ret) + goto err_init_binder_device_failed; + } + binder_debugfs_dir_entry_root = debugfs_create_dir("binder", NULL); if (binder_debugfs_dir_entry_root) binder_debugfs_dir_entry_proc = debugfs_create_dir("proc", @@ -4245,30 +4392,13 @@ static int __init binder_init(void) debugfs_create_file("transaction_log", S_IRUGO, binder_debugfs_dir_entry_root, - &binder_transaction_log, + NULL, &binder_transaction_log_fops); debugfs_create_file("failed_transaction_log", S_IRUGO, binder_debugfs_dir_entry_root, - &binder_transaction_log_failed, - &binder_transaction_log_fops); - } - - /* - * Copy the module_parameter string, because we don't want to - * tokenize it in-place. - */ - device_names = kzalloc(strlen(binder_devices_param) + 1, GFP_KERNEL); - if (!device_names) { - ret = -ENOMEM; - goto err_alloc_device_names_failed; - } - strcpy(device_names, binder_devices_param); - - while ((device_name = strsep(&device_names, ","))) { - ret = init_binder_device(device_name); - if (ret) - goto err_init_binder_device_failed; + NULL, + &binder_failed_transaction_log_fops); } return ret; @@ -4277,12 +4407,8 @@ static int __init binder_init(void) hlist_for_each_entry_safe(device, tmp, &binder_devices, hlist) { misc_deregister(&device->miscdev); hlist_del(&device->hlist); - kfree(device); + free_binder_device(device); } -err_alloc_device_names_failed: - debugfs_remove_recursive(binder_debugfs_dir_entry_root); - - destroy_workqueue(binder_deferred_workqueue); return ret; } From 71fc9cea6e8b903f72c807c283ed731967a76fbc Mon Sep 17 00:00:00 2001 From: Anson Jacob Date: Fri, 12 Aug 2016 20:38:10 -0400 Subject: [PATCH 241/521] usb: gadget: f_accessory: Fix for UsbAccessory clean unbind. Reapplying fix by Darren Whobrey (Change 69674) Fixes issues: 20545, 59667 and 61390. With prior version of f_accessory.c, UsbAccessories would not unbind cleanly when application is closed or i/o stopped while the usb cable is still connected. The accessory gadget driver would be left in an invalid state which was not reset on subsequent binding or opening. A reboot was necessary to clear. In some phones this issues causes the phone to reboot upon unplugging the USB cable. Main problem was that acc_disconnect was being called on I/O error which reset disconnected and online. Minor fix required to properly track setting and unsetting of disconnected and online flags. Also added urb Q wakeup's on unbind to help unblock waiting threads. Tested on Nexus 7 grouper. Expected behaviour now observed: closing accessory causes blocked i/o to interrupt with IOException. Accessory can be restarted following closing of file handle and re-opening. This is a generic fix that applies to all devices. Change-Id: I4e08b326730dd3a2820c863124cee10f7cb5501e Signed-off-by: Darren Whobrey Signed-off-by: Anson Jacob --- drivers/usb/gadget/function/f_accessory.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/usb/gadget/function/f_accessory.c b/drivers/usb/gadget/function/f_accessory.c index f2fa0c271d70..76b8ae08a551 100644 --- a/drivers/usb/gadget/function/f_accessory.c +++ b/drivers/usb/gadget/function/f_accessory.c @@ -77,9 +77,13 @@ struct acc_dev { struct usb_ep *ep_in; struct usb_ep *ep_out; - /* set to 1 when we connect */ + /* online indicates state of function_set_alt & function_unbind + * set to 1 when we connect + */ int online:1; - /* Set to 1 when we disconnect. + + /* disconnected indicates state of open & release + * Set to 1 when we disconnect. * Not cleared until our file is closed. */ int disconnected:1; @@ -263,7 +267,6 @@ static struct usb_request *req_get(struct acc_dev *dev, struct list_head *head) static void acc_set_disconnected(struct acc_dev *dev) { - dev->online = 0; dev->disconnected = 1; } @@ -764,7 +767,10 @@ static int acc_release(struct inode *ip, struct file *fp) printk(KERN_INFO "acc_release\n"); WARN_ON(!atomic_xchg(&_acc_dev->open_excl, 0)); - _acc_dev->disconnected = 0; + /* indicate that we are disconnected + * still could be online so don't touch online flag + */ + _acc_dev->disconnected = 1; return 0; } @@ -1012,6 +1018,10 @@ acc_function_unbind(struct usb_configuration *c, struct usb_function *f) struct usb_request *req; int i; + dev->online = 0; /* clear online flag */ + wake_up(&dev->read_wq); /* unblock reads on closure */ + wake_up(&dev->write_wq); /* likewise for writes */ + while ((req = req_get(dev, &dev->tx_idle))) acc_request_free(req, dev->ep_in); for (i = 0; i < RX_REQ_MAX; i++) @@ -1143,6 +1153,7 @@ static int acc_function_set_alt(struct usb_function *f, } dev->online = 1; + dev->disconnected = 0; /* if online then not disconnected */ /* readers may be blocked waiting for us to go online */ wake_up(&dev->read_wq); @@ -1155,7 +1166,8 @@ static void acc_function_disable(struct usb_function *f) struct usb_composite_dev *cdev = dev->cdev; DBG(cdev, "acc_function_disable\n"); - acc_set_disconnected(dev); + acc_set_disconnected(dev); /* this now only sets disconnected */ + dev->online = 0; /* so now need to clear online flag here too */ usb_ep_disable(dev->ep_in); usb_ep_disable(dev->ep_out); From 9e9c17398f7e6036446d067dab3f94da8ebd8b98 Mon Sep 17 00:00:00 2001 From: Mohan Srinivasan Date: Fri, 10 Mar 2017 16:08:30 -0800 Subject: [PATCH 242/521] ANDROID: Replace spaces by '_' for some android filesystem tracepoints. Andoid files frequently have spaces in them, as do cmdline strings. Replace these spaces with '_', so tools that parse these tracepoints don't get terribly confused. Change-Id: I1cbbedf5c803aa6a58d9b8b7836e9125683c49d1 Signed-off-by: Mohan Srinivasan (cherry picked from commit 5035d5f0933758dd515327d038e5bef7e40dbaa7) --- include/trace/events/android_fs_template.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/include/trace/events/android_fs_template.h b/include/trace/events/android_fs_template.h index 4e61ffe7a814..b23d17b56c63 100644 --- a/include/trace/events/android_fs_template.h +++ b/include/trace/events/android_fs_template.h @@ -18,11 +18,18 @@ DECLARE_EVENT_CLASS(android_fs_data_start_template, ), TP_fast_assign( { + /* + * Replace the spaces in filenames and cmdlines + * because this screws up the tooling that parses + * the traces. + */ __assign_str(pathbuf, pathname); + (void)strreplace(__get_str(pathbuf), ' ', '_'); __entry->offset = offset; __entry->bytes = bytes; __entry->i_size = i_size_read(inode); __assign_str(cmdline, command); + (void)strreplace(__get_str(cmdline), ' ', '_'); __entry->pid = pid; __entry->ino = inode->i_ino; } From b898d9e2686c62f8af75420260d0a4e83440ce8e Mon Sep 17 00:00:00 2001 From: "Jon Medhurst (Tixy)" Date: Wed, 9 Dec 2015 09:40:53 +0000 Subject: [PATCH 243/521] arm64: dts: juno: Add idle-states to device tree This patch adds idle-states bindings data collected through a set of benchmarking experiments (latency and energy consumption) on Juno boards. Latencies data represents the worst case scenarios as required by the DT idle-states bindings. Change-Id: I7b2d81fa66f8ce8b229457cfefff06e9edd545c7 (cherry picked from commit 286896f43b0248960f69660159b507b23751b38a) Signed-off-by: Jon Medhurst Acked-by: Lorenzo Pieralisi Signed-off-by: Olof Johansson --- arch/arm64/boot/dts/arm/juno-r1.dts | 28 ++++++++++++++++++++++++++++ arch/arm64/boot/dts/arm/juno.dts | 28 ++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+) diff --git a/arch/arm64/boot/dts/arm/juno-r1.dts b/arch/arm64/boot/dts/arm/juno-r1.dts index 93bc3d7d51c0..8826f834f54f 100644 --- a/arch/arm64/boot/dts/arm/juno-r1.dts +++ b/arch/arm64/boot/dts/arm/juno-r1.dts @@ -60,6 +60,28 @@ core3 { }; }; + idle-states { + entry-method = "arm,psci"; + + CPU_SLEEP_0: cpu-sleep-0 { + compatible = "arm,idle-state"; + arm,psci-suspend-param = <0x0010000>; + local-timer-stop; + entry-latency-us = <300>; + exit-latency-us = <1200>; + min-residency-us = <2000>; + }; + + CLUSTER_SLEEP_0: cluster-sleep-0 { + compatible = "arm,idle-state"; + arm,psci-suspend-param = <0x1010000>; + local-timer-stop; + entry-latency-us = <300>; + exit-latency-us = <1200>; + min-residency-us = <2500>; + }; + }; + A57_0: cpu@0 { compatible = "arm,cortex-a57","arm,armv8"; reg = <0x0 0x0>; @@ -67,6 +89,7 @@ A57_0: cpu@0 { enable-method = "psci"; next-level-cache = <&A57_L2>; clocks = <&scpi_dvfs 0>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A57_1: cpu@1 { @@ -76,6 +99,7 @@ A57_1: cpu@1 { enable-method = "psci"; next-level-cache = <&A57_L2>; clocks = <&scpi_dvfs 0>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A53_0: cpu@100 { @@ -85,6 +109,7 @@ A53_0: cpu@100 { enable-method = "psci"; next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A53_1: cpu@101 { @@ -94,6 +119,7 @@ A53_1: cpu@101 { enable-method = "psci"; next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A53_2: cpu@102 { @@ -103,6 +129,7 @@ A53_2: cpu@102 { enable-method = "psci"; next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A53_3: cpu@103 { @@ -112,6 +139,7 @@ A53_3: cpu@103 { enable-method = "psci"; next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A57_L2: l2-cache0 { diff --git a/arch/arm64/boot/dts/arm/juno.dts b/arch/arm64/boot/dts/arm/juno.dts index 3e1a84b01b50..2e4f04247aa5 100644 --- a/arch/arm64/boot/dts/arm/juno.dts +++ b/arch/arm64/boot/dts/arm/juno.dts @@ -60,6 +60,28 @@ core3 { }; }; + idle-states { + entry-method = "arm,psci"; + + CPU_SLEEP_0: cpu-sleep-0 { + compatible = "arm,idle-state"; + arm,psci-suspend-param = <0x0010000>; + local-timer-stop; + entry-latency-us = <300>; + exit-latency-us = <1200>; + min-residency-us = <2000>; + }; + + CLUSTER_SLEEP_0: cluster-sleep-0 { + compatible = "arm,idle-state"; + arm,psci-suspend-param = <0x1010000>; + local-timer-stop; + entry-latency-us = <300>; + exit-latency-us = <1200>; + min-residency-us = <2500>; + }; + }; + A57_0: cpu@0 { compatible = "arm,cortex-a57","arm,armv8"; reg = <0x0 0x0>; @@ -67,6 +89,7 @@ A57_0: cpu@0 { enable-method = "psci"; next-level-cache = <&A57_L2>; clocks = <&scpi_dvfs 0>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A57_1: cpu@1 { @@ -76,6 +99,7 @@ A57_1: cpu@1 { enable-method = "psci"; next-level-cache = <&A57_L2>; clocks = <&scpi_dvfs 0>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A53_0: cpu@100 { @@ -85,6 +109,7 @@ A53_0: cpu@100 { enable-method = "psci"; next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A53_1: cpu@101 { @@ -94,6 +119,7 @@ A53_1: cpu@101 { enable-method = "psci"; next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A53_2: cpu@102 { @@ -103,6 +129,7 @@ A53_2: cpu@102 { enable-method = "psci"; next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A53_3: cpu@103 { @@ -112,6 +139,7 @@ A53_3: cpu@103 { enable-method = "psci"; next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; + cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; }; A57_L2: l2-cache0 { From 241f0efeac22f02c86243e2719435c62f082d1de Mon Sep 17 00:00:00 2001 From: Chris Redpath Date: Fri, 13 Nov 2015 10:21:39 +0000 Subject: [PATCH 244/521] DTB: Add EAS compatible Juno Energy model to 'juno.dts' EAS expects the energy model for the CPUs and cluster states to be available in the DTB. The energy model data comes from previous versions. Change-Id: I87535c8d802797361333929d809b43383bc8954b (cherry picked from commit bf137f205f312a1814ae38f908ec7bdbdddeaa3e (LSK 4.4)) Signed-off-by: Chris Redpath Signed-off-by: Punit Agrawal Signed-off-by: Jon Medhurst --- .../arm64/boot/dts/arm/juno-sched-energy.dtsi | 147 ++++++++++++++++++ arch/arm64/boot/dts/arm/juno.dts | 8 + 2 files changed, 155 insertions(+) create mode 100644 arch/arm64/boot/dts/arm/juno-sched-energy.dtsi diff --git a/arch/arm64/boot/dts/arm/juno-sched-energy.dtsi b/arch/arm64/boot/dts/arm/juno-sched-energy.dtsi new file mode 100644 index 000000000000..38207e4391ab --- /dev/null +++ b/arch/arm64/boot/dts/arm/juno-sched-energy.dtsi @@ -0,0 +1,147 @@ +/* + * ARM JUNO specific energy cost model data. There are no unit requirements for + * the data. Data can be normalized to any reference point, but the + * normalization must be consistent. That is, one bogo-joule/watt must be the + * same quantity for all data, but we don't care what it is. + */ + +/* static struct idle_state idle_states_cluster_a53[] = { */ +/* { .power = 56 }, /\* arch_cpu_idle() (active idle) = WFI *\/ */ +/* { .power = 56 }, /\* WFI *\/ */ +/* { .power = 56 }, /\* cpu-sleep-0 *\/ */ +/* { .power = 17 }, /\* cluster-sleep-0 *\/ */ +/* }; */ + +/* static struct idle_state idle_states_cluster_a57[] = { */ +/* { .power = 65 }, /\* arch_cpu_idle() (active idle) = WFI *\/ */ +/* { .power = 65 }, /\* WFI *\/ */ +/* { .power = 65 }, /\* cpu-sleep-0 *\/ */ +/* { .power = 24 }, /\* cluster-sleep-0 *\/ */ +/* }; */ + +/* static struct capacity_state cap_states_cluster_a53[] = { */ +/* /\* Power per cluster *\/ */ +/* { .cap = 235, .power = 26, }, /\* 450 MHz *\/ */ +/* { .cap = 303, .power = 30, }, /\* 575 MHz *\/ */ +/* { .cap = 368, .power = 39, }, /\* 700 MHz *\/ */ +/* { .cap = 406, .power = 47, }, /\* 775 MHz *\/ */ +/* { .cap = 447, .power = 57, }, /\* 850 Mhz *\/ */ +/* }; */ + +/* static struct capacity_state cap_states_cluster_a57[] = { */ +/* /\* Power per cluster *\/ */ +/* { .cap = 417, .power = 24, }, /\* 450 MHz *\/ */ +/* { .cap = 579, .power = 32, }, /\* 625 MHz *\/ */ +/* { .cap = 744, .power = 43, }, /\* 800 MHz *\/ */ +/* { .cap = 883, .power = 49, }, /\* 950 MHz *\/ */ +/* { .cap = 1024, .power = 64, }, /\* 1100 MHz *\/ */ +/* }; */ + +/* static struct sched_group_energy energy_cluster_a53 = { */ +/* .nr_idle_states = ARRAY_SIZE(idle_states_cluster_a53), */ +/* .idle_states = idle_states_cluster_a53, */ +/* .nr_cap_states = ARRAY_SIZE(cap_states_cluster_a53), */ +/* .cap_states = cap_states_cluster_a53, */ +/* }; */ + +/* static struct sched_group_energy energy_cluster_a57 = { */ +/* .nr_idle_states = ARRAY_SIZE(idle_states_cluster_a57), */ +/* .idle_states = idle_states_cluster_a57, */ +/* .nr_cap_states = ARRAY_SIZE(cap_states_cluster_a57), */ +/* .cap_states = cap_states_cluster_a57, */ +/* }; */ + +/* static struct idle_state idle_states_core_a53[] = { */ +/* { .power = 6 }, /\* arch_cpu_idle() (active idle) = WFI *\/ */ +/* { .power = 6 }, /\* WFI *\/ */ +/* { .power = 0 }, /\* cpu-sleep-0 *\/ */ +/* { .power = 0 }, /\* cluster-sleep-0 *\/ */ +/* }; */ + +/* static struct idle_state idle_states_core_a57[] = { */ +/* { .power = 15 }, /\* arch_cpu_idle() (active idle) = WFI *\/ */ +/* { .power = 15 }, /\* WFI *\/ */ +/* { .power = 0 }, /\* cpu-sleep-0 *\/ */ +/* { .power = 0 }, /\* cluster-sleep-0 *\/ */ +/* }; */ + +/* static struct capacity_state cap_states_core_a53[] = { */ +/* /\* Power per cpu *\/ */ +/* { .cap = 235, .power = 33, }, /\* 450 MHz *\/ */ +/* { .cap = 302, .power = 46, }, /\* 575 MHz *\/ */ +/* { .cap = 368, .power = 61, }, /\* 700 MHz *\/ */ +/* { .cap = 406, .power = 76, }, /\* 775 MHz *\/ */ +/* { .cap = 447, .power = 93, }, /\* 850 Mhz *\/ */ +/* }; */ + +/* static struct capacity_state cap_states_core_a57[] = { */ +/* /\* Power per cpu *\/ */ +/* { .cap = 417, .power = 168, }, /\* 450 MHz *\/ */ +/* { .cap = 579, .power = 251, }, /\* 625 MHz *\/ */ +/* { .cap = 744, .power = 359, }, /\* 800 MHz *\/ */ +/* { .cap = 883, .power = 479, }, /\* 950 MHz *\/ */ +/* { .cap = 1024, .power = 616, }, /\* 1100 MHz *\/ */ +/* }; */ + +energy-costs { + CPU_COST_A57: core-cost0 { + busy-cost-data = < + 417 168 + 579 251 + 744 359 + 883 479 + 1023 616 + >; + idle-cost-data = < + 15 + 15 + 0 + 0 + >; + }; + CPU_COST_A53: core-cost1 { + busy-cost-data = < + 235 33 + 302 46 + 368 61 + 406 76 + 447 93 + >; + idle-cost-data = < + 6 + 6 + 0 + 0 + >; + }; + CLUSTER_COST_A57: cluster-cost0 { + busy-cost-data = < + 417 24 + 579 32 + 744 43 + 883 49 + 1024 64 + >; + idle-cost-data = < + 65 + 65 + 65 + 24 + >; + }; + CLUSTER_COST_A53: cluster-cost1 { + busy-cost-data = < + 235 26 + 303 30 + 368 39 + 406 47 + 447 57 + >; + idle-cost-data = < + 56 + 56 + 56 + 17 + >; + }; +}; diff --git a/arch/arm64/boot/dts/arm/juno.dts b/arch/arm64/boot/dts/arm/juno.dts index 2e4f04247aa5..8830f12a8289 100644 --- a/arch/arm64/boot/dts/arm/juno.dts +++ b/arch/arm64/boot/dts/arm/juno.dts @@ -90,6 +90,7 @@ A57_0: cpu@0 { next-level-cache = <&A57_L2>; clocks = <&scpi_dvfs 0>; cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; + sched-energy-costs = <&CPU_COST_A57 &CLUSTER_COST_A57>; }; A57_1: cpu@1 { @@ -100,6 +101,7 @@ A57_1: cpu@1 { next-level-cache = <&A57_L2>; clocks = <&scpi_dvfs 0>; cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; + sched-energy-costs = <&CPU_COST_A57 &CLUSTER_COST_A57>; }; A53_0: cpu@100 { @@ -110,6 +112,7 @@ A53_0: cpu@100 { next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; + sched-energy-costs = <&CPU_COST_A53 &CLUSTER_COST_A53>; }; A53_1: cpu@101 { @@ -120,6 +123,7 @@ A53_1: cpu@101 { next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; + sched-energy-costs = <&CPU_COST_A53 &CLUSTER_COST_A53>; }; A53_2: cpu@102 { @@ -130,6 +134,7 @@ A53_2: cpu@102 { next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; + sched-energy-costs = <&CPU_COST_A53 &CLUSTER_COST_A53>; }; A53_3: cpu@103 { @@ -140,6 +145,7 @@ A53_3: cpu@103 { next-level-cache = <&A53_L2>; clocks = <&scpi_dvfs 1>; cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>; + sched-energy-costs = <&CPU_COST_A53 &CLUSTER_COST_A53>; }; A57_L2: l2-cache0 { @@ -149,6 +155,8 @@ A57_L2: l2-cache0 { A53_L2: l2-cache1 { compatible = "cache"; }; + + /include/ "juno-sched-energy.dtsi" }; pmu_a57 { From cc52d2bc7a54e5431f0734886574bedb7cce4ed4 Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Tue, 10 Jan 2017 16:10:35 -0800 Subject: [PATCH 245/521] ANDROID: uid_cputime: add per-uid IO usage accounting IO usages are accounted in foreground and background buckets. For each uid, io usage is calculated in two steps. delta = current total of all uid tasks - previus total current bucket += delta Bucket is determined by current uid stat. Userspace writes to /proc/uid_procstat/set when uid stat is updated. /proc/uid_io/stats shows IO usage in this format. Signed-off-by: Jin Qian Bug: 34198239 Change-Id: Ib8bebda53e7a56f45ea3eb0ec9a3153d44188102 --- drivers/misc/uid_cputime.c | 236 ++++++++++++++++++++++++++++++++++--- 1 file changed, 220 insertions(+), 16 deletions(-) diff --git a/drivers/misc/uid_cputime.c b/drivers/misc/uid_cputime.c index c1ad5246f564..f5135bb01a7a 100644 --- a/drivers/misc/uid_cputime.c +++ b/drivers/misc/uid_cputime.c @@ -30,7 +30,24 @@ DECLARE_HASHTABLE(hash_table, UID_HASH_BITS); static DEFINE_MUTEX(uid_lock); -static struct proc_dir_entry *parent; +static struct proc_dir_entry *cpu_parent; +static struct proc_dir_entry *io_parent; +static struct proc_dir_entry *proc_parent; + +struct io_stats { + u64 read_bytes; + u64 write_bytes; + u64 rchar; + u64 wchar; +}; + +#define UID_STATE_FOREGROUND 0 +#define UID_STATE_BACKGROUND 1 +#define UID_STATE_BUCKET_SIZE 2 + +#define UID_STATE_TOTAL_CURR 2 +#define UID_STATE_TOTAL_LAST 3 +#define UID_STATE_SIZE 4 struct uid_entry { uid_t uid; @@ -38,6 +55,8 @@ struct uid_entry { cputime_t stime; cputime_t active_utime; cputime_t active_stime; + int state; + struct io_stats io[UID_STATE_SIZE]; struct hlist_node hash; }; @@ -70,7 +89,7 @@ static struct uid_entry *find_or_register_uid(uid_t uid) return uid_entry; } -static int uid_stat_show(struct seq_file *m, void *v) +static int uid_cputime_show(struct seq_file *m, void *v) { struct uid_entry *uid_entry; struct task_struct *task, *temp; @@ -119,13 +138,13 @@ static int uid_stat_show(struct seq_file *m, void *v) return 0; } -static int uid_stat_open(struct inode *inode, struct file *file) +static int uid_cputime_open(struct inode *inode, struct file *file) { - return single_open(file, uid_stat_show, PDE_DATA(inode)); + return single_open(file, uid_cputime_show, PDE_DATA(inode)); } -static const struct file_operations uid_stat_fops = { - .open = uid_stat_open, +static const struct file_operations uid_cputime_fops = { + .open = uid_cputime_open, .read = seq_read, .llseek = seq_lseek, .release = single_release, @@ -184,6 +203,162 @@ static const struct file_operations uid_remove_fops = { .write = uid_remove_write, }; +static void add_uid_io_curr_stats(struct uid_entry *uid_entry, + struct task_struct *task) +{ + struct io_stats *io_curr = &uid_entry->io[UID_STATE_TOTAL_CURR]; + + io_curr->read_bytes += task->ioac.read_bytes; + io_curr->write_bytes += + task->ioac.write_bytes - task->ioac.cancelled_write_bytes; + io_curr->rchar += task->ioac.rchar; + io_curr->wchar += task->ioac.wchar; +} + +static void clean_uid_io_last_stats(struct uid_entry *uid_entry, + struct task_struct *task) +{ + struct io_stats *io_last = &uid_entry->io[UID_STATE_TOTAL_LAST]; + + io_last->read_bytes -= task->ioac.read_bytes; + io_last->write_bytes -= + task->ioac.write_bytes - task->ioac.cancelled_write_bytes; + io_last->rchar -= task->ioac.rchar; + io_last->wchar -= task->ioac.wchar; +} + +static void update_io_stats_locked(void) +{ + struct uid_entry *uid_entry; + struct task_struct *task, *temp; + struct io_stats *io_bucket, *io_curr, *io_last; + unsigned long bkt; + + BUG_ON(!mutex_is_locked(&uid_lock)); + + hash_for_each(hash_table, bkt, uid_entry, hash) + memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0, + sizeof(struct io_stats)); + + read_lock(&tasklist_lock); + do_each_thread(temp, task) { + uid_entry = find_or_register_uid(from_kuid_munged( + current_user_ns(), task_uid(task))); + if (!uid_entry) + continue; + add_uid_io_curr_stats(uid_entry, task); + } while_each_thread(temp, task); + read_unlock(&tasklist_lock); + + hash_for_each(hash_table, bkt, uid_entry, hash) { + io_bucket = &uid_entry->io[uid_entry->state]; + io_curr = &uid_entry->io[UID_STATE_TOTAL_CURR]; + io_last = &uid_entry->io[UID_STATE_TOTAL_LAST]; + + io_bucket->read_bytes += + io_curr->read_bytes - io_last->read_bytes; + io_bucket->write_bytes += + io_curr->write_bytes - io_last->write_bytes; + io_bucket->rchar += io_curr->rchar - io_last->rchar; + io_bucket->wchar += io_curr->wchar - io_last->wchar; + + io_last->read_bytes = io_curr->read_bytes; + io_last->write_bytes = io_curr->write_bytes; + io_last->rchar = io_curr->rchar; + io_last->wchar = io_curr->wchar; + } +} + +static int uid_io_show(struct seq_file *m, void *v) +{ + struct uid_entry *uid_entry; + unsigned long bkt; + + mutex_lock(&uid_lock); + + update_io_stats_locked(); + + hash_for_each(hash_table, bkt, uid_entry, hash) { + seq_printf(m, "%d %llu %llu %llu %llu %llu %llu %llu %llu\n", + uid_entry->uid, + uid_entry->io[UID_STATE_FOREGROUND].rchar, + uid_entry->io[UID_STATE_FOREGROUND].wchar, + uid_entry->io[UID_STATE_FOREGROUND].read_bytes, + uid_entry->io[UID_STATE_FOREGROUND].write_bytes, + uid_entry->io[UID_STATE_BACKGROUND].rchar, + uid_entry->io[UID_STATE_BACKGROUND].wchar, + uid_entry->io[UID_STATE_BACKGROUND].read_bytes, + uid_entry->io[UID_STATE_BACKGROUND].write_bytes); + } + + mutex_unlock(&uid_lock); + + return 0; +} + +static int uid_io_open(struct inode *inode, struct file *file) +{ + return single_open(file, uid_io_show, PDE_DATA(inode)); +} + +static const struct file_operations uid_io_fops = { + .open = uid_io_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int uid_procstat_open(struct inode *inode, struct file *file) +{ + return single_open(file, NULL, NULL); +} + +static ssize_t uid_procstat_write(struct file *file, + const char __user *buffer, size_t count, loff_t *ppos) +{ + struct uid_entry *uid_entry; + uid_t uid; + int argc, state; + char input[128]; + + if (count >= sizeof(input)) + return -EINVAL; + + if (copy_from_user(input, buffer, count)) + return -EFAULT; + + input[count] = '\0'; + + argc = sscanf(input, "%u %d", &uid, &state); + if (argc != 2) + return -EINVAL; + + if (state != UID_STATE_BACKGROUND && state != UID_STATE_FOREGROUND) + return -EINVAL; + + mutex_lock(&uid_lock); + + uid_entry = find_or_register_uid(uid); + if (!uid_entry || uid_entry->state == state) { + mutex_unlock(&uid_lock); + return -EINVAL; + } + + update_io_stats_locked(); + + uid_entry->state = state; + + mutex_unlock(&uid_lock); + + return count; +} + +static const struct file_operations uid_procstat_fops = { + .open = uid_procstat_open, + .release = single_release, + .write = uid_procstat_write, +}; + static int process_notifier(struct notifier_block *self, unsigned long cmd, void *v) { @@ -207,6 +382,9 @@ static int process_notifier(struct notifier_block *self, uid_entry->utime += utime; uid_entry->stime += stime; + update_io_stats_locked(); + clean_uid_io_last_stats(uid_entry, task); + exit: mutex_unlock(&uid_lock); return NOTIFY_OK; @@ -216,25 +394,51 @@ static struct notifier_block process_notifier_block = { .notifier_call = process_notifier, }; -static int __init proc_uid_cputime_init(void) +static int __init proc_uid_sys_stats_init(void) { hash_init(hash_table); - parent = proc_mkdir("uid_cputime", NULL); - if (!parent) { - pr_err("%s: failed to create proc entry\n", __func__); - return -ENOMEM; + cpu_parent = proc_mkdir("uid_cputime", NULL); + if (!cpu_parent) { + pr_err("%s: failed to create uid_cputime proc entry\n", + __func__); + goto err; } - proc_create_data("remove_uid_range", S_IWUGO, parent, &uid_remove_fops, - NULL); + proc_create_data("remove_uid_range", 0222, cpu_parent, + &uid_remove_fops, NULL); + proc_create_data("show_uid_stat", 0444, cpu_parent, + &uid_cputime_fops, NULL); - proc_create_data("show_uid_stat", S_IRUGO, parent, &uid_stat_fops, - NULL); + io_parent = proc_mkdir("uid_io", NULL); + if (!io_parent) { + pr_err("%s: failed to create uid_io proc entry\n", + __func__); + goto err; + } + + proc_create_data("stats", 0444, io_parent, + &uid_io_fops, NULL); + + proc_parent = proc_mkdir("uid_procstat", NULL); + if (!proc_parent) { + pr_err("%s: failed to create uid_procstat proc entry\n", + __func__); + goto err; + } + + proc_create_data("set", 0222, proc_parent, + &uid_procstat_fops, NULL); profile_event_register(PROFILE_TASK_EXIT, &process_notifier_block); return 0; + +err: + remove_proc_subtree("uid_cputime", NULL); + remove_proc_subtree("uid_io", NULL); + remove_proc_subtree("uid_procstat", NULL); + return -ENOMEM; } -early_initcall(proc_uid_cputime_init); +early_initcall(proc_uid_sys_stats_init); From 2ea16502ca0049e5c199d18fef7dc23dccb1b35b Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Tue, 10 Jan 2017 16:11:07 -0800 Subject: [PATCH 246/521] ANDROID: uid_sys_stats: rename uid_cputime.c to uid_sys_stats.c This module tracks cputime and io stats. Signed-off-by: Jin Qian Bug: 34198239 Change-Id: I9ee7d9e915431e0bb714b36b5a2282e1fdcc7342 --- android/configs/android-base.cfg | 2 +- drivers/misc/Kconfig | 6 ++++-- drivers/misc/Makefile | 2 +- drivers/misc/{uid_cputime.c => uid_sys_stats.c} | 0 4 files changed, 6 insertions(+), 4 deletions(-) rename drivers/misc/{uid_cputime.c => uid_sys_stats.c} (100%) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index f10371a981b7..2098fe97198c 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -157,7 +157,7 @@ CONFIG_STAGING=y CONFIG_SWP_EMULATION=y CONFIG_SYNC=y CONFIG_TUN=y -CONFIG_UID_CPUTIME=y +CONFIG_UID_SYS_STATS=y CONFIG_UNIX=y CONFIG_USB_GADGET=y CONFIG_USB_CONFIGFS=y diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 06eddc0cb24f..5847d3be0835 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -525,11 +525,13 @@ config VEXPRESS_SYSCFG bus. System Configuration interface is one of the possible means of generating transactions on this bus. -config UID_CPUTIME - bool "Per-UID cpu time statistics" +config UID_SYS_STATS + bool "Per-UID statistics" depends on PROFILING help Per UID based cpu time statistics exported to /proc/uid_cputime + Per UID based io statistics exported to /proc/uid_io + Per UID based procstat control in /proc/uid_procstat config MEMORY_STATE_TIME tristate "Memory freq/bandwidth time statistics" diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index b76b4c9fe104..9a3b402921b2 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -56,5 +56,5 @@ obj-$(CONFIG_GENWQE) += genwqe/ obj-$(CONFIG_ECHO) += echo/ obj-$(CONFIG_VEXPRESS_SYSCFG) += vexpress-syscfg.o obj-$(CONFIG_CXL_BASE) += cxl/ -obj-$(CONFIG_UID_CPUTIME) += uid_cputime.o +obj-$(CONFIG_UID_SYS_STATS) += uid_sys_stats.o obj-$(CONFIG_MEMORY_STATE_TIME) += memory_state_time.o diff --git a/drivers/misc/uid_cputime.c b/drivers/misc/uid_sys_stats.c similarity index 100% rename from drivers/misc/uid_cputime.c rename to drivers/misc/uid_sys_stats.c From 6309385d125045325813495274a0965fc353fd9b Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Tue, 17 Jan 2017 17:26:07 -0800 Subject: [PATCH 247/521] ANDROID: uid_sys_stats: allow writing same state Signed-off-by: Jin Qian Bug: 34360629 Change-Id: Ia748351e07910b1febe54f0484ca1be58c4eb9c7 --- drivers/misc/uid_sys_stats.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c index f5135bb01a7a..5f82d17e7ad9 100644 --- a/drivers/misc/uid_sys_stats.c +++ b/drivers/misc/uid_sys_stats.c @@ -339,11 +339,16 @@ static ssize_t uid_procstat_write(struct file *file, mutex_lock(&uid_lock); uid_entry = find_or_register_uid(uid); - if (!uid_entry || uid_entry->state == state) { + if (!uid_entry) { mutex_unlock(&uid_lock); return -EINVAL; } + if (uid_entry->state == state) { + mutex_unlock(&uid_lock); + return count; + } + update_io_stats_locked(); uid_entry->state = state; From 1d7eb6de57af39750c5196512109c8309defa6c5 Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Tue, 28 Feb 2017 15:09:42 -0800 Subject: [PATCH 248/521] ANDROID: uid_sys_stats: fix negative write bytes. A task can cancel writes made by other tasks. In rare cases, cancelled_write_bytes is larger than write_bytes if the task itself didn't make any write. This doesn't affect total size but may cause confusion when looking at IO usage on individual tasks. Bug: 35851986 Change-Id: If6cb549aeef9e248e18d804293401bb2b91918ca Signed-off-by: Jin Qian --- drivers/misc/uid_sys_stats.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c index 5f82d17e7ad9..7b746acf416c 100644 --- a/drivers/misc/uid_sys_stats.c +++ b/drivers/misc/uid_sys_stats.c @@ -203,14 +203,21 @@ static const struct file_operations uid_remove_fops = { .write = uid_remove_write, }; +static u64 compute_write_bytes(struct task_struct *task) +{ + if (task->ioac.write_bytes <= task->ioac.cancelled_write_bytes) + return 0; + + return task->ioac.write_bytes - task->ioac.cancelled_write_bytes; +} + static void add_uid_io_curr_stats(struct uid_entry *uid_entry, struct task_struct *task) { struct io_stats *io_curr = &uid_entry->io[UID_STATE_TOTAL_CURR]; io_curr->read_bytes += task->ioac.read_bytes; - io_curr->write_bytes += - task->ioac.write_bytes - task->ioac.cancelled_write_bytes; + io_curr->write_bytes += compute_write_bytes(task); io_curr->rchar += task->ioac.rchar; io_curr->wchar += task->ioac.wchar; } @@ -221,8 +228,7 @@ static void clean_uid_io_last_stats(struct uid_entry *uid_entry, struct io_stats *io_last = &uid_entry->io[UID_STATE_TOTAL_LAST]; io_last->read_bytes -= task->ioac.read_bytes; - io_last->write_bytes -= - task->ioac.write_bytes - task->ioac.cancelled_write_bytes; + io_last->write_bytes -= compute_write_bytes(task); io_last->rchar -= task->ioac.rchar; io_last->wchar -= task->ioac.wchar; } From 6a0a471069e6ebdc8ed660c57f7a72bb933b722b Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Thu, 2 Mar 2017 13:32:59 -0800 Subject: [PATCH 249/521] ANDROID: sched: add a counter to track fsync Change-Id: I6c138de5b2332eea70f57e098134d1d141247b3f Signed-off-by: Jin Qian --- fs/sync.c | 1 + include/linux/sched.h | 8 ++++++++ include/linux/task_io_accounting.h | 2 ++ include/linux/task_io_accounting_ops.h | 1 + 4 files changed, 12 insertions(+) diff --git a/fs/sync.c b/fs/sync.c index dd5d1711c7ac..452179e31c39 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -218,6 +218,7 @@ static int do_fsync(unsigned int fd, int datasync) if (f.file) { ret = vfs_fsync(f.file, datasync); fdput(f); + inc_syscfs(current); } return ret; } diff --git a/include/linux/sched.h b/include/linux/sched.h index 8be9f0dbdd0c..5b250c9f7718 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -3227,6 +3227,11 @@ static inline void inc_syscw(struct task_struct *tsk) { tsk->ioac.syscw++; } + +static inline void inc_syscfs(struct task_struct *tsk) +{ + tsk->ioac.syscfs++; +} #else static inline void add_rchar(struct task_struct *tsk, ssize_t amt) { @@ -3243,6 +3248,9 @@ static inline void inc_syscr(struct task_struct *tsk) static inline void inc_syscw(struct task_struct *tsk) { } +static inline void inc_syscfs(struct task_struct *tsk) +{ +} #endif #ifndef TASK_SIZE_OF diff --git a/include/linux/task_io_accounting.h b/include/linux/task_io_accounting.h index bdf855c2856f..2dd338fdf881 100644 --- a/include/linux/task_io_accounting.h +++ b/include/linux/task_io_accounting.h @@ -18,6 +18,8 @@ struct task_io_accounting { u64 syscr; /* # of write syscalls */ u64 syscw; + /* # of fsync syscalls */ + u64 syscfs; #endif /* CONFIG_TASK_XACCT */ #ifdef CONFIG_TASK_IO_ACCOUNTING diff --git a/include/linux/task_io_accounting_ops.h b/include/linux/task_io_accounting_ops.h index 4d090f9ee608..1b505c804af3 100644 --- a/include/linux/task_io_accounting_ops.h +++ b/include/linux/task_io_accounting_ops.h @@ -96,6 +96,7 @@ static inline void task_chr_io_accounting_add(struct task_io_accounting *dst, dst->wchar += src->wchar; dst->syscr += src->syscr; dst->syscw += src->syscw; + dst->syscfs += src->syscfs; } #else static inline void task_chr_io_accounting_add(struct task_io_accounting *dst, From 529da1fb6b441a530360b8026a675560bbd870f4 Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Thu, 2 Mar 2017 13:39:43 -0800 Subject: [PATCH 250/521] ANDROID: uid_sys_stats: account for fsync syscalls Change-Id: Ie888d8a0f4ec7a27dea86dc4afba8e6fd4203488 Signed-off-by: Jin Qian --- drivers/misc/uid_sys_stats.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c index 7b746acf416c..4988e323cf02 100644 --- a/drivers/misc/uid_sys_stats.c +++ b/drivers/misc/uid_sys_stats.c @@ -39,6 +39,7 @@ struct io_stats { u64 write_bytes; u64 rchar; u64 wchar; + u64 fsync; }; #define UID_STATE_FOREGROUND 0 @@ -220,6 +221,7 @@ static void add_uid_io_curr_stats(struct uid_entry *uid_entry, io_curr->write_bytes += compute_write_bytes(task); io_curr->rchar += task->ioac.rchar; io_curr->wchar += task->ioac.wchar; + io_curr->fsync += task->ioac.syscfs; } static void clean_uid_io_last_stats(struct uid_entry *uid_entry, @@ -231,6 +233,7 @@ static void clean_uid_io_last_stats(struct uid_entry *uid_entry, io_last->write_bytes -= compute_write_bytes(task); io_last->rchar -= task->ioac.rchar; io_last->wchar -= task->ioac.wchar; + io_last->fsync -= task->ioac.syscfs; } static void update_io_stats_locked(void) @@ -267,11 +270,13 @@ static void update_io_stats_locked(void) io_curr->write_bytes - io_last->write_bytes; io_bucket->rchar += io_curr->rchar - io_last->rchar; io_bucket->wchar += io_curr->wchar - io_last->wchar; + io_bucket->fsync += io_curr->fsync - io_last->fsync; io_last->read_bytes = io_curr->read_bytes; io_last->write_bytes = io_curr->write_bytes; io_last->rchar = io_curr->rchar; io_last->wchar = io_curr->wchar; + io_last->fsync = io_curr->fsync; } } @@ -285,7 +290,7 @@ static int uid_io_show(struct seq_file *m, void *v) update_io_stats_locked(); hash_for_each(hash_table, bkt, uid_entry, hash) { - seq_printf(m, "%d %llu %llu %llu %llu %llu %llu %llu %llu\n", + seq_printf(m, "%d %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu\n", uid_entry->uid, uid_entry->io[UID_STATE_FOREGROUND].rchar, uid_entry->io[UID_STATE_FOREGROUND].wchar, @@ -294,7 +299,9 @@ static int uid_io_show(struct seq_file *m, void *v) uid_entry->io[UID_STATE_BACKGROUND].rchar, uid_entry->io[UID_STATE_BACKGROUND].wchar, uid_entry->io[UID_STATE_BACKGROUND].read_bytes, - uid_entry->io[UID_STATE_BACKGROUND].write_bytes); + uid_entry->io[UID_STATE_BACKGROUND].write_bytes, + uid_entry->io[UID_STATE_FOREGROUND].fsync, + uid_entry->io[UID_STATE_BACKGROUND].fsync); } mutex_unlock(&uid_lock); From 8a11d0a178e20f5d6e17c724ac2fe72eabb4238e Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Wed, 1 Mar 2017 17:04:41 -0800 Subject: [PATCH 251/521] ANDROID: sdcardfs: Fix case insensitive lookup The previous case insensitive lookup relied on the entry being present in the dcache. This instead uses iterate_dir to find the correct case. Signed-off-by: Daniel Rosenberg bug: 35633782 Change-Id: I556f7090773468c1943c89a5e2aa07f746ba49c5 --- fs/sdcardfs/lookup.c | 68 +++++++++++++++++++++++++++++++++----------- 1 file changed, 51 insertions(+), 17 deletions(-) diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index 6b595e892316..bbfb0775c700 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -206,6 +206,28 @@ int sdcardfs_interpose(struct dentry *dentry, struct super_block *sb, return err; } +struct sdcardfs_name_data { + struct dir_context ctx; + const struct qstr *to_find; + char *name; + bool found; +}; + +static int sdcardfs_name_match(struct dir_context *ctx, const char *name, int namelen, + loff_t offset, u64 ino, unsigned int d_type) +{ + struct sdcardfs_name_data *buf = container_of(ctx, struct sdcardfs_name_data, ctx); + struct qstr candidate = QSTR_INIT(name, namelen); + + if (qstr_case_eq(buf->to_find, &candidate)) { + memcpy(buf->name, name, namelen); + buf->name[namelen] = 0; + buf->found = true; + return 1; + } + return 0; +} + /* * Main driver function for sdcardfs's lookup. * @@ -242,27 +264,39 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, &lower_path); /* check for other cases */ if (err == -ENOENT) { - struct dentry *child; - struct dentry *match = NULL; - mutex_lock(&d_inode(lower_dir_dentry)->i_mutex); - spin_lock(&lower_dir_dentry->d_lock); - list_for_each_entry(child, &lower_dir_dentry->d_subdirs, d_child) { - if (child && d_inode(child)) { - if (qstr_case_eq(&child->d_name, name)) { - match = dget(child); - break; - } - } + struct file *file; + const struct cred *cred = current_cred(); + + struct sdcardfs_name_data buffer = { + .ctx.actor = sdcardfs_name_match, + .to_find = name, + .name = __getname(), + .found = false, + }; + + if (!buffer.name) { + err = -ENOMEM; + goto out; } - spin_unlock(&lower_dir_dentry->d_lock); - mutex_unlock(&d_inode(lower_dir_dentry)->i_mutex); - if (match) { + file = dentry_open(lower_parent_path, O_RDONLY, cred); + if (IS_ERR(file)) { + err = PTR_ERR(file); + goto put_name; + } + err = iterate_dir(file, &buffer.ctx); + fput(file); + if (err) + goto put_name; + + if (buffer.found) err = vfs_path_lookup(lower_dir_dentry, lower_dir_mnt, - match->d_name.name, 0, + buffer.name, 0, &lower_path); - dput(match); - } + else + err = -ENOENT; +put_name: + __putname(buffer.name); } /* no error: handle positive dentries */ From 776f9ef84a9a8ee56ff0687fb60da77327ad7605 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 2 Mar 2017 18:07:21 -0800 Subject: [PATCH 252/521] ANDROID: sdcardfs: rate limit warning print Signed-off-by: Daniel Rosenberg Bug: 35848445 Change-Id: Ida72ea0ece191b2ae4a8babae096b2451eb563f6 --- fs/sdcardfs/inode.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/sdcardfs/inode.c b/fs/sdcardfs/inode.c index 68e615045616..aa85fcc8ec1a 100644 --- a/fs/sdcardfs/inode.c +++ b/fs/sdcardfs/inode.c @@ -20,6 +20,7 @@ #include "sdcardfs.h" #include +#include /* Do not directly use this function. Use OVERRIDE_CRED() instead. */ const struct cred * override_fsids(struct sdcardfs_sb_info* sbi, struct sdcardfs_inode_info *info) @@ -599,7 +600,7 @@ static const char *sdcardfs_follow_link(struct dentry *dentry, void **cookie) static int sdcardfs_permission_wrn(struct inode *inode, int mask) { - WARN(1, "sdcardfs does not support permission. Use permission2.\n"); + WARN_RATELIMIT(1, "sdcardfs does not support permission. Use permission2.\n"); return -EINVAL; } @@ -684,7 +685,7 @@ static int sdcardfs_permission(struct vfsmount *mnt, struct inode *inode, int ma static int sdcardfs_setattr_wrn(struct dentry *dentry, struct iattr *ia) { - WARN(1, "sdcardfs does not support setattr. User setattr2.\n"); + WARN_RATELIMIT(1, "sdcardfs does not support setattr. User setattr2.\n"); return -EINVAL; } From b9c93046dc7335bc2f5a6eb18c2c5b37d8e7db0d Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 2 Mar 2017 15:11:27 -0800 Subject: [PATCH 253/521] ANDROID: sdcardfs: Replace get/put with d_lock dput cannot be called with a spin_lock. Instead, we protect our accesses by holding the d_lock. Signed-off-by: Daniel Rosenberg Bug: 35643557 Change-Id: I22cf30856d75b5616cbb0c223724f5ab866b5114 --- fs/sdcardfs/derived_perm.c | 34 +++++++++++++++++++++------------- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index ca239a942065..c0d5a9d72bde 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -261,40 +261,48 @@ static int needs_fixup(perm_t perm) { return 0; } -void fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit) { +static void __fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit, int depth) +{ struct dentry *child; struct sdcardfs_inode_info *info; - if (!dget(dentry)) - return; + + /* + * All paths will terminate their recursion on hitting PERM_ANDROID_OBB, + * PERM_ANDROID_MEDIA, or PERM_ANDROID_DATA. This happens at a depth of + * at most 3. + */ + WARN(depth > 3, "%s: Max expected depth exceeded!\n", __func__); + spin_lock_nested(&dentry->d_lock, depth); if (!d_inode(dentry)) { - dput(dentry); + spin_unlock(&dentry->d_lock); return; } info = SDCARDFS_I(d_inode(dentry)); if (needs_fixup(info->perm)) { - spin_lock(&dentry->d_lock); list_for_each_entry(child, &dentry->d_subdirs, d_child) { - dget(child); + spin_lock_nested(&child->d_lock, depth + 1); if (!(limit->flags & BY_NAME) || !strncasecmp(child->d_name.name, limit->name, limit->length)) { if (d_inode(child)) { get_derived_permission(dentry, child); fixup_tmp_permissions(d_inode(child)); - dput(child); + spin_unlock(&child->d_lock); break; } } - dput(child); + spin_unlock(&child->d_lock); } - spin_unlock(&dentry->d_lock); } else if (descendant_may_need_fixup(info, limit)) { - spin_lock(&dentry->d_lock); list_for_each_entry(child, &dentry->d_subdirs, d_child) { - fixup_perms_recursive(child, limit); + __fixup_perms_recursive(child, limit, depth + 1); } - spin_unlock(&dentry->d_lock); } - dput(dentry); + spin_unlock(&dentry->d_lock); +} + +void fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit) +{ + __fixup_perms_recursive(dentry, limit, 0); } void drop_recursive(struct dentry *parent) { From 2a9fca6e53c1969941d29f41931f24dd7a73926f Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Wed, 8 Mar 2017 17:11:51 -0800 Subject: [PATCH 254/521] ANDROID: sdcardfs: Use spin_lock_nested Signed-off-by: Daniel Rosenberg Bug: 36007653 Change-Id: I805d5afec797669679853fb2bb993ee38e6276e4 --- fs/sdcardfs/dentry.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/sdcardfs/dentry.c b/fs/sdcardfs/dentry.c index 971928ab6c21..e6f8e9edf8a9 100644 --- a/fs/sdcardfs/dentry.c +++ b/fs/sdcardfs/dentry.c @@ -76,10 +76,10 @@ static int sdcardfs_d_revalidate(struct dentry *dentry, unsigned int flags) if (dentry < lower_dentry) { spin_lock(&dentry->d_lock); - spin_lock(&lower_dentry->d_lock); + spin_lock_nested(&lower_dentry->d_lock, DENTRY_D_LOCK_NESTED); } else { spin_lock(&lower_dentry->d_lock); - spin_lock(&dentry->d_lock); + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); } if (dentry->d_name.len != lower_dentry->d_name.len) { From d66d1e328be9c5156e2af93bf3bcf76aea0ab7aa Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Wed, 8 Mar 2017 17:20:02 -0800 Subject: [PATCH 255/521] ANDROID: sdcardfs: Switch to internal case insensitive compare There were still a few places where we called into a case insensitive lookup that was not defined by sdcardfs. Moving them all to the same place will allow us to switch the implementation in the future. Additionally, the check in fixup_perms_recursive did not take into account the length of both strings, causing extraneous matches when the name we were looking for was a prefix of the child name. Signed-off-by: Daniel Rosenberg Change-Id: I45ce768cd782cb4ea1ae183772781387c590ecc2 --- fs/sdcardfs/dentry.c | 8 ++------ fs/sdcardfs/derived_perm.c | 2 +- fs/sdcardfs/packagelist.c | 6 ++---- fs/sdcardfs/sdcardfs.h | 8 ++++++-- 4 files changed, 11 insertions(+), 13 deletions(-) diff --git a/fs/sdcardfs/dentry.c b/fs/sdcardfs/dentry.c index e6f8e9edf8a9..64494a50e250 100644 --- a/fs/sdcardfs/dentry.c +++ b/fs/sdcardfs/dentry.c @@ -82,11 +82,7 @@ static int sdcardfs_d_revalidate(struct dentry *dentry, unsigned int flags) spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); } - if (dentry->d_name.len != lower_dentry->d_name.len) { - __d_drop(dentry); - err = 0; - } else if (strncasecmp(dentry->d_name.name, lower_dentry->d_name.name, - dentry->d_name.len) != 0) { + if (!qstr_case_eq(&dentry->d_name, &lower_dentry->d_name)) { __d_drop(dentry); err = 0; } @@ -166,7 +162,7 @@ static int sdcardfs_cmp_ci(const struct dentry *parent, } */ if (name->len == len) { - if (strncasecmp(name->name, str, len) == 0) + if (str_n_case_eq(name->name, str, len)) return 0; } return 1; diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index c0d5a9d72bde..925692f0d20c 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -282,7 +282,7 @@ static void __fixup_perms_recursive(struct dentry *dentry, struct limit_search * if (needs_fixup(info->perm)) { list_for_each_entry(child, &dentry->d_subdirs, d_child) { spin_lock_nested(&child->d_lock, depth + 1); - if (!(limit->flags & BY_NAME) || !strncasecmp(child->d_name.name, limit->name, limit->length)) { + if (!(limit->flags & BY_NAME) || qstr_case_eq(&child->d_name, &limit->name)) { if (d_inode(child)) { get_derived_permission(dentry, child); fixup_tmp_permissions(d_inode(child)); diff --git a/fs/sdcardfs/packagelist.c b/fs/sdcardfs/packagelist.c index 56d643f4a9ee..68f8f4571615 100644 --- a/fs/sdcardfs/packagelist.c +++ b/fs/sdcardfs/packagelist.c @@ -251,8 +251,7 @@ static void fixup_all_perms_name(const struct qstr *key) struct sdcardfs_sb_info *sbinfo; struct limit_search limit = { .flags = BY_NAME, - .name = key->name, - .length = key->len, + .name = QSTR_INIT(key->name, key->len), }; list_for_each_entry(sbinfo, &sdcardfs_super_list, list) { if (sbinfo_has_sdcard_magic(sbinfo)) @@ -265,8 +264,7 @@ static void fixup_all_perms_name_userid(const struct qstr *key, userid_t userid) struct sdcardfs_sb_info *sbinfo; struct limit_search limit = { .flags = BY_NAME | BY_USERID, - .name = key->name, - .length = key->len, + .name = QSTR_INIT(key->name, key->len), .userid = userid, }; list_for_each_entry(sbinfo, &sdcardfs_super_list, list) { diff --git a/fs/sdcardfs/sdcardfs.h b/fs/sdcardfs/sdcardfs.h index 042f989f0bea..0778eb063b63 100644 --- a/fs/sdcardfs/sdcardfs.h +++ b/fs/sdcardfs/sdcardfs.h @@ -470,8 +470,7 @@ extern void packagelist_exit(void); #define BY_USERID (1 << 1) struct limit_search { unsigned int flags; - const char *name; - size_t length; + struct qstr name; userid_t userid; }; @@ -612,6 +611,11 @@ static inline bool str_case_eq(const char *s1, const char *s2) return !strcasecmp(s1, s2); } +static inline bool str_n_case_eq(const char *s1, const char *s2, size_t len) +{ + return !strncasecmp(s1, s2, len); +} + static inline bool qstr_case_eq(const struct qstr *q1, const struct qstr *q2) { return q1->len == q2->len && str_case_eq(q1->name, q2->name); From 9dea441687d6fa0cc853a5d9b242a9e57ba51c34 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Wed, 8 Mar 2017 17:45:46 -0800 Subject: [PATCH 256/521] ANDROID: sdcardfs: Use d_invalidate instead of drop_recurisve drop_recursive did not properly remove stale dentries. Instead, we use the vfs's d_invalidate, which does the proper cleanup. Additionally, remove the no longer used drop_recursive, and fixup_top_recursive that that are no longer used. Signed-off-by: Daniel Rosenberg Change-Id: Ibff61b0c34b725b024a050169047a415bc90f0d8 --- fs/sdcardfs/derived_perm.c | 38 -------------------------------------- fs/sdcardfs/inode.c | 2 +- fs/sdcardfs/sdcardfs.h | 2 -- 3 files changed, 1 insertion(+), 41 deletions(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index 925692f0d20c..4b365b95b437 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -305,44 +305,6 @@ void fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit) __fixup_perms_recursive(dentry, limit, 0); } -void drop_recursive(struct dentry *parent) { - struct dentry *dentry; - struct sdcardfs_inode_info *info; - if (!d_inode(parent)) - return; - info = SDCARDFS_I(d_inode(parent)); - spin_lock(&parent->d_lock); - list_for_each_entry(dentry, &parent->d_subdirs, d_child) { - if (d_inode(dentry)) { - if (SDCARDFS_I(d_inode(parent))->top != SDCARDFS_I(d_inode(dentry))->top) { - drop_recursive(dentry); - d_drop(dentry); - } - } - } - spin_unlock(&parent->d_lock); -} - -void fixup_top_recursive(struct dentry *parent) { - struct dentry *dentry; - struct sdcardfs_inode_info *info; - - if (!d_inode(parent)) - return; - info = SDCARDFS_I(d_inode(parent)); - spin_lock(&parent->d_lock); - list_for_each_entry(dentry, &parent->d_subdirs, d_child) { - if (d_inode(dentry)) { - if (SDCARDFS_I(d_inode(parent))->top != SDCARDFS_I(d_inode(dentry))->top) { - get_derived_permission(parent, dentry); - fixup_tmp_permissions(d_inode(dentry)); - fixup_top_recursive(dentry); - } - } - } - spin_unlock(&parent->d_lock); -} - /* main function for updating derived permission */ inline void update_derived_permission_lock(struct dentry *dentry) { diff --git a/fs/sdcardfs/inode.c b/fs/sdcardfs/inode.c index aa85fcc8ec1a..8d0875ff5710 100644 --- a/fs/sdcardfs/inode.c +++ b/fs/sdcardfs/inode.c @@ -529,7 +529,7 @@ static int sdcardfs_rename(struct inode *old_dir, struct dentry *old_dentry, get_derived_permission_new(new_dentry->d_parent, old_dentry, &new_dentry->d_name); fixup_tmp_permissions(d_inode(old_dentry)); fixup_lower_ownership(old_dentry, new_dentry->d_name.name); - drop_recursive(old_dentry); /* Can't fixup ownership recursively :( */ + d_invalidate(old_dentry); /* Can't fixup ownership recursively :( */ out: unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry); dput(lower_old_dir_dentry); diff --git a/fs/sdcardfs/sdcardfs.h b/fs/sdcardfs/sdcardfs.h index 0778eb063b63..e28a7dd47ebb 100644 --- a/fs/sdcardfs/sdcardfs.h +++ b/fs/sdcardfs/sdcardfs.h @@ -478,8 +478,6 @@ extern void setup_derived_state(struct inode *inode, perm_t perm, userid_t useri uid_t uid, bool under_android, struct inode *top); extern void get_derived_permission(struct dentry *parent, struct dentry *dentry); extern void get_derived_permission_new(struct dentry *parent, struct dentry *dentry, const struct qstr *name); -extern void drop_recursive(struct dentry *parent); -extern void fixup_top_recursive(struct dentry *parent); extern void fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit); extern void update_derived_permission_lock(struct dentry *dentry); From 078a4b13d8253ec7471273a8ff3d02d902c3c1a7 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 9 Mar 2017 18:12:16 -0800 Subject: [PATCH 257/521] ANDROID: sdcardfs: Get the blocksize from the lower fs This changes sdcardfs to be more in line with the getattr in wrapfs, which calls the lower fs's getattr to get the block size Signed-off-by: Daniel Rosenberg Bug: 34723223 Change-Id: I1c9e16604ba580a8cdefa17f02dcc489d7351aed --- fs/sdcardfs/inode.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/fs/sdcardfs/inode.c b/fs/sdcardfs/inode.c index 8d0875ff5710..f713fad909d8 100644 --- a/fs/sdcardfs/inode.c +++ b/fs/sdcardfs/inode.c @@ -856,9 +856,7 @@ static int sdcardfs_fillattr(struct vfsmount *mnt, struct inode *inode, struct k static int sdcardfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) { - struct dentry *lower_dentry; - struct inode *inode; - struct inode *lower_inode; + struct kstat lower_stat; struct path lower_path; struct dentry *parent; int err; @@ -873,16 +871,15 @@ static int sdcardfs_getattr(struct vfsmount *mnt, struct dentry *dentry, } dput(parent); - inode = d_inode(dentry); - sdcardfs_get_lower_path(dentry, &lower_path); - lower_dentry = lower_path.dentry; - lower_inode = sdcardfs_lower_inode(inode); - - sdcardfs_copy_and_fix_attrs(inode, lower_inode); - fsstack_copy_inode_size(inode, lower_inode); - - err = sdcardfs_fillattr(mnt, inode, stat); + err = vfs_getattr(&lower_path, &lower_stat); + if (err) + goto out; + sdcardfs_copy_and_fix_attrs(d_inode(dentry), + d_inode(lower_path.dentry)); + err = sdcardfs_fillattr(mnt, d_inode(dentry), stat); + stat->blocks = lower_stat.blocks; +out: sdcardfs_put_lower_path(dentry, &lower_path); return err; } From 52d522b50584201ccc26a22bb588f805a2e6fa58 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 9 Mar 2017 20:59:18 -0800 Subject: [PATCH 258/521] ANDROID: sdcardfs: declare MODULE_ALIAS_FS From commit ee616b78aa87 ("Wrapfs: declare MODULE_ALIAS_FS") Signed-off-by: Daniel Rosenberg bug: 35766959 Change-Id: Ia4728ab49d065b1d2eb27825046f14b97c328cba --- fs/sdcardfs/main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/sdcardfs/main.c b/fs/sdcardfs/main.c index 7a8eae29e44d..4e2aded8d1d9 100644 --- a/fs/sdcardfs/main.c +++ b/fs/sdcardfs/main.c @@ -432,6 +432,7 @@ static struct file_system_type sdcardfs_fs_type = { .kill_sb = sdcardfs_kill_sb, .fs_flags = 0, }; +MODULE_ALIAS_FS(SDCARDFS_NAME); static int __init init_sdcardfs_fs(void) { From 989624c14f9b69dfa67007886661b16d8aad05f9 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Fri, 10 Mar 2017 12:39:42 -0800 Subject: [PATCH 259/521] ANDROID: sdcardfs: Use case insensitive hash function Case insensitive comparisons don't help us much if we hash to different buckets... Signed-off-by: Daniel Rosenberg bug: 36004503 Change-Id: I91e00dbcd860a709cbd4f7fd7fc6d855779f3285 --- fs/sdcardfs/packagelist.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/sdcardfs/packagelist.c b/fs/sdcardfs/packagelist.c index 68f8f4571615..e72fe83f7837 100644 --- a/fs/sdcardfs/packagelist.c +++ b/fs/sdcardfs/packagelist.c @@ -20,6 +20,7 @@ #include "sdcardfs.h" #include +#include #include #include #include @@ -44,10 +45,18 @@ static DEFINE_HASHTABLE(ext_to_groupid, 8); static struct kmem_cache *hashtable_entry_cachep; +static unsigned int full_name_case_hash(const unsigned char *name, unsigned int len) +{ + unsigned long hash = init_name_hash(); + while (len--) + hash = partial_name_hash(tolower(*name++), hash); + return end_name_hash(hash); +} + static void inline qstr_init(struct qstr *q, const char *name) { q->name = name; q->len = strlen(q->name); - q->hash = full_name_hash(q->name, q->len); + q->hash = full_name_case_hash(q->name, q->len); } static inline int qstr_copy(const struct qstr *src, struct qstr *dest) { From 73b25e7521514bc277d705988447ad0fe1ef1eff Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Fri, 10 Mar 2017 13:54:30 -0800 Subject: [PATCH 260/521] ANDROID: sdcardfs: move path_put outside of spinlock Signed-off-by: Daniel Rosenberg Bug: 35643557 Change-Id: Ib279ebd7dd4e5884d184d67696a93e34993bc1ef --- fs/sdcardfs/derived_perm.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index 4b365b95b437..dba58d8c9d60 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -357,6 +357,8 @@ int is_obbpath_invalid(struct dentry *dent) struct sdcardfs_dentry_info *di = SDCARDFS_D(dent); struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dent->d_sb); char *path_buf, *obbpath_s; + int need_put = 0; + struct path lower_path; /* check the base obbpath has been changed. * this routine can check an uninitialized obb dentry as well. @@ -383,10 +385,13 @@ int is_obbpath_invalid(struct dentry *dent) } //unlock_dir(lower_parent); - path_put(&di->lower_path); + pathcpy(&lower_path, &di->lower_path); + need_put = 1; } } spin_unlock(&di->lock); + if (need_put) + path_put(&lower_path); return ret; } From cc52b54b85c9b401f0c4bf8e7d6bad3dbbaab974 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Fri, 10 Mar 2017 18:58:25 -0800 Subject: [PATCH 261/521] ANDROID: sdcardfs: Remove uninformative prints At best these prints do not provide useful information, and at worst, some allow userspace to abuse the kernel log. Signed-off-by: Daniel Rosenberg Bug: 36138424 Change-Id: I812c57cc6a22b37262935ab77f48f3af4c36827e --- fs/sdcardfs/derived_perm.c | 1 - fs/sdcardfs/file.c | 3 --- fs/sdcardfs/inode.c | 24 +----------------------- fs/sdcardfs/lookup.c | 3 --- 4 files changed, 1 insertion(+), 30 deletions(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index dba58d8c9d60..763f10340487 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -437,7 +437,6 @@ int setup_obb_dentry(struct dentry *dentry, struct path *lower_path) if(!err) { /* the obbpath base has been found */ - printk(KERN_INFO "sdcardfs: the sbi->obbpath is found\n"); pathcpy(lower_path, &obbpath); } else { /* if the sbi->obbpath is not available, we can optionally diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index 23f8cd7f8877..0592facef704 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -217,9 +217,6 @@ static int sdcardfs_open(struct inode *inode, struct file *file) } if(!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " dentry: %s, task:%s\n", - __func__, dentry->d_name.name, current->comm); err = -EACCES; goto out_err; } diff --git a/fs/sdcardfs/inode.c b/fs/sdcardfs/inode.c index f713fad909d8..f19f11ea19fc 100644 --- a/fs/sdcardfs/inode.c +++ b/fs/sdcardfs/inode.c @@ -68,9 +68,6 @@ static int sdcardfs_create(struct inode *dir, struct dentry *dentry, struct fs_struct *copied_fs; if(!check_caller_access_to_name(dir, &dentry->d_name)) { - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " dentry: %s, task:%s\n", - __func__, dentry->d_name.name, current->comm); err = -EACCES; goto out_eacces; } @@ -170,9 +167,6 @@ static int sdcardfs_unlink(struct inode *dir, struct dentry *dentry) const struct cred *saved_cred = NULL; if(!check_caller_access_to_name(dir, &dentry->d_name)) { - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " dentry: %s, task:%s\n", - __func__, dentry->d_name.name, current->comm); err = -EACCES; goto out_eacces; } @@ -280,9 +274,6 @@ static int sdcardfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode struct qstr q_data = QSTR_LITERAL("data"); if(!check_caller_access_to_name(dir, &dentry->d_name)) { - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " dentry: %s, task:%s\n", - __func__, dentry->d_name.name, current->comm); err = -EACCES; goto out_eacces; } @@ -392,9 +383,6 @@ static int sdcardfs_rmdir(struct inode *dir, struct dentry *dentry) const struct cred *saved_cred = NULL; if(!check_caller_access_to_name(dir, &dentry->d_name)) { - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " dentry: %s, task:%s\n", - __func__, dentry->d_name.name, current->comm); err = -EACCES; goto out_eacces; } @@ -481,9 +469,6 @@ static int sdcardfs_rename(struct inode *old_dir, struct dentry *old_dentry, if(!check_caller_access_to_name(old_dir, &old_dentry->d_name) || !check_caller_access_to_name(new_dir, &new_dentry->d_name)) { - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " new_dentry: %s, task:%s\n", - __func__, new_dentry->d_name.name, current->comm); err = -EACCES; goto out_eacces; } @@ -746,12 +731,8 @@ static int sdcardfs_setattr(struct vfsmount *mnt, struct dentry *dentry, struct if (!err) { /* check the Android group ID */ parent = dget_parent(dentry); - if(!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " dentry: %s, task:%s\n", - __func__, dentry->d_name.name, current->comm); + if (!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) err = -EACCES; - } dput(parent); } @@ -863,9 +844,6 @@ static int sdcardfs_getattr(struct vfsmount *mnt, struct dentry *dentry, parent = dget_parent(dentry); if(!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " dentry: %s, task:%s\n", - __func__, dentry->d_name.name, current->comm); dput(parent); return -EACCES; } diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index bbfb0775c700..fffb94c923c4 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -395,9 +395,6 @@ struct dentry *sdcardfs_lookup(struct inode *dir, struct dentry *dentry, if(!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { ret = ERR_PTR(-EACCES); - printk(KERN_INFO "%s: need to check the caller's gid in packages.list\n" - " dentry: %s, task:%s\n", - __func__, dentry->d_name.name, current->comm); goto out_err; } From 77de52e631bbb767d22c872a454b08727394032f Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Mon, 13 Mar 2017 13:53:54 -0700 Subject: [PATCH 262/521] ANDROID: sdcardfs: Use tabs instead of spaces in multiuser.h Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: Ic7801914a7dd377e270647f81070020e1f0bab9b --- fs/sdcardfs/multiuser.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/sdcardfs/multiuser.h b/fs/sdcardfs/multiuser.h index 52bc20080904..ca141ff40b49 100644 --- a/fs/sdcardfs/multiuser.h +++ b/fs/sdcardfs/multiuser.h @@ -29,21 +29,21 @@ typedef uid_t userid_t; typedef uid_t appid_t; static inline uid_t multiuser_get_uid(userid_t user_id, appid_t app_id) { - return (user_id * AID_USER_OFFSET) + (app_id % AID_USER_OFFSET); + return (user_id * AID_USER_OFFSET) + (app_id % AID_USER_OFFSET); } static inline gid_t multiuser_get_cache_gid(userid_t user_id, appid_t app_id) { - if (app_id >= AID_APP_START && app_id <= AID_APP_END) { - return multiuser_get_uid(user_id, (app_id - AID_APP_START) + AID_CACHE_GID_START); - } else { - return -1; - } + if (app_id >= AID_APP_START && app_id <= AID_APP_END) { + return multiuser_get_uid(user_id, (app_id - AID_APP_START) + AID_CACHE_GID_START); + } else { + return -1; + } } static inline gid_t multiuser_get_ext_gid(userid_t user_id, appid_t app_id) { - if (app_id >= AID_APP_START && app_id <= AID_APP_END) { - return multiuser_get_uid(user_id, (app_id - AID_APP_START) + AID_EXT_GID_START); - } else { - return -1; - } + if (app_id >= AID_APP_START && app_id <= AID_APP_END) { + return multiuser_get_uid(user_id, (app_id - AID_APP_START) + AID_EXT_GID_START); + } else { + return -1; + } } From 1d2509402f0531a082c81e963372d5d8e3368190 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Mon, 13 Mar 2017 15:34:03 -0700 Subject: [PATCH 263/521] ANDROID: sdcardfs: Fix gid issue We were already calculating most of these values, and erroring out because the check was confused by this. Instead of recalculating, adjust it as needed. Signed-off-by: Daniel Rosenberg Bug: 36160015 Change-Id: I9caf3e2fd32ca2e37ff8ed71b1d392f1761bc9a9 --- fs/sdcardfs/derived_perm.c | 4 ++-- fs/sdcardfs/multiuser.h | 21 ++++++++------------- 2 files changed, 10 insertions(+), 15 deletions(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index 763f10340487..28e9b8d42f9e 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -205,13 +205,13 @@ void fixup_lower_ownership(struct dentry* dentry, const char *name) { break; case PERM_ANDROID_PACKAGE: if (info->d_uid != 0) - gid = multiuser_get_ext_gid(info->userid, info->d_uid); + gid = multiuser_get_ext_gid(info->d_uid); else gid = multiuser_get_uid(info->userid, uid); break; case PERM_ANDROID_PACKAGE_CACHE: if (info->d_uid != 0) - gid = multiuser_get_cache_gid(info->userid, info->d_uid); + gid = multiuser_get_cache_gid(info->d_uid); else gid = multiuser_get_uid(info->userid, uid); break; diff --git a/fs/sdcardfs/multiuser.h b/fs/sdcardfs/multiuser.h index ca141ff40b49..2e89b5872314 100644 --- a/fs/sdcardfs/multiuser.h +++ b/fs/sdcardfs/multiuser.h @@ -28,22 +28,17 @@ typedef uid_t userid_t; typedef uid_t appid_t; -static inline uid_t multiuser_get_uid(userid_t user_id, appid_t app_id) { +static inline uid_t multiuser_get_uid(userid_t user_id, appid_t app_id) +{ return (user_id * AID_USER_OFFSET) + (app_id % AID_USER_OFFSET); } -static inline gid_t multiuser_get_cache_gid(userid_t user_id, appid_t app_id) { - if (app_id >= AID_APP_START && app_id <= AID_APP_END) { - return multiuser_get_uid(user_id, (app_id - AID_APP_START) + AID_CACHE_GID_START); - } else { - return -1; - } +static inline gid_t multiuser_get_cache_gid(uid_t uid) +{ + return uid - AID_APP_START + AID_CACHE_GID_START; } -static inline gid_t multiuser_get_ext_gid(userid_t user_id, appid_t app_id) { - if (app_id >= AID_APP_START && app_id <= AID_APP_END) { - return multiuser_get_uid(user_id, (app_id - AID_APP_START) + AID_EXT_GID_START); - } else { - return -1; - } +static inline gid_t multiuser_get_ext_gid(uid_t uid) +{ + return uid - AID_APP_START + AID_EXT_GID_START; } From dabcf1d17a04db801ac0d9068b3b20520369b8be Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Tue, 14 Mar 2017 15:39:05 -0700 Subject: [PATCH 264/521] ANDROID: vfs: user permission2 in notify_change2 This allows filesystems to use their mount private data to influence the permissions they use when attempting to touch. Signed-off-by: Daniel Rosenberg Bug: 36228261 Change-Id: I1052319ba1c3ce5d5e586aa7f8a80c08851a5c7f --- fs/attr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/attr.c b/fs/attr.c index 11be2265a2d5..c86b37c38fb7 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -211,7 +211,7 @@ int notify_change2(struct vfsmount *mnt, struct dentry * dentry, struct iattr * return -EPERM; if (!inode_owner_or_capable(inode)) { - error = inode_permission(inode, MAY_WRITE); + error = inode_permission2(mnt, inode, MAY_WRITE); if (error) return error; } From a870c9b4c9404ecb38449e22699ab4e85f996599 Mon Sep 17 00:00:00 2001 From: Wei Wang Date: Mon, 13 Mar 2017 12:22:21 -0700 Subject: [PATCH 265/521] uid_sys_stats: change to use rt_mutex We see this happens multiple times in heavy workload in systrace and AMS stuck in uid_lock. Running process: Process 953 Running thread: android.ui State: Uninterruptible Sleep Start: 1,025.628 ms Duration: 27,955.949 ms On CPU: Running instead: system_server Args: {kernel callsite when blocked:: "uid_procstat_write+0xb8/0x144"} Changing to rt_mutex can mitigate the priority inversion Bug: 34991231 Bug: 34193533 Test: on marlin Change-Id: I28eb3971331cea60b1075740c792ab87d103262c Signed-off-by: Wei Wang --- drivers/misc/uid_sys_stats.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c index 4988e323cf02..204b23484266 100644 --- a/drivers/misc/uid_sys_stats.c +++ b/drivers/misc/uid_sys_stats.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -29,7 +30,7 @@ #define UID_HASH_BITS 10 DECLARE_HASHTABLE(hash_table, UID_HASH_BITS); -static DEFINE_MUTEX(uid_lock); +static DEFINE_RT_MUTEX(uid_lock); static struct proc_dir_entry *cpu_parent; static struct proc_dir_entry *io_parent; static struct proc_dir_entry *proc_parent; @@ -98,7 +99,7 @@ static int uid_cputime_show(struct seq_file *m, void *v) cputime_t stime; unsigned long bkt; - mutex_lock(&uid_lock); + rt_mutex_lock(&uid_lock); hash_for_each(hash_table, bkt, uid_entry, hash) { uid_entry->active_stime = 0; @@ -111,7 +112,7 @@ static int uid_cputime_show(struct seq_file *m, void *v) current_user_ns(), task_uid(task))); if (!uid_entry) { read_unlock(&tasklist_lock); - mutex_unlock(&uid_lock); + rt_mutex_unlock(&uid_lock); pr_err("%s: failed to find the uid_entry for uid %d\n", __func__, from_kuid_munged(current_user_ns(), task_uid(task))); @@ -135,7 +136,7 @@ static int uid_cputime_show(struct seq_file *m, void *v) cputime_to_jiffies(total_stime)) * USEC_PER_MSEC); } - mutex_unlock(&uid_lock); + rt_mutex_unlock(&uid_lock); return 0; } @@ -182,7 +183,7 @@ static ssize_t uid_remove_write(struct file *file, kstrtol(end_uid, 10, &uid_end) != 0) { return -EINVAL; } - mutex_lock(&uid_lock); + rt_mutex_lock(&uid_lock); for (; uid_start <= uid_end; uid_start++) { hash_for_each_possible_safe(hash_table, uid_entry, tmp, @@ -194,7 +195,7 @@ static ssize_t uid_remove_write(struct file *file, } } - mutex_unlock(&uid_lock); + rt_mutex_unlock(&uid_lock); return count; } @@ -243,7 +244,7 @@ static void update_io_stats_locked(void) struct io_stats *io_bucket, *io_curr, *io_last; unsigned long bkt; - BUG_ON(!mutex_is_locked(&uid_lock)); + BUG_ON(!rt_mutex_is_locked(&uid_lock)); hash_for_each(hash_table, bkt, uid_entry, hash) memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0, @@ -285,7 +286,7 @@ static int uid_io_show(struct seq_file *m, void *v) struct uid_entry *uid_entry; unsigned long bkt; - mutex_lock(&uid_lock); + rt_mutex_lock(&uid_lock); update_io_stats_locked(); @@ -304,7 +305,7 @@ static int uid_io_show(struct seq_file *m, void *v) uid_entry->io[UID_STATE_BACKGROUND].fsync); } - mutex_unlock(&uid_lock); + rt_mutex_unlock(&uid_lock); return 0; } @@ -349,16 +350,16 @@ static ssize_t uid_procstat_write(struct file *file, if (state != UID_STATE_BACKGROUND && state != UID_STATE_FOREGROUND) return -EINVAL; - mutex_lock(&uid_lock); + rt_mutex_lock(&uid_lock); uid_entry = find_or_register_uid(uid); if (!uid_entry) { - mutex_unlock(&uid_lock); + rt_mutex_unlock(&uid_lock); return -EINVAL; } if (uid_entry->state == state) { - mutex_unlock(&uid_lock); + rt_mutex_unlock(&uid_lock); return count; } @@ -366,7 +367,7 @@ static ssize_t uid_procstat_write(struct file *file, uid_entry->state = state; - mutex_unlock(&uid_lock); + rt_mutex_unlock(&uid_lock); return count; } @@ -388,7 +389,7 @@ static int process_notifier(struct notifier_block *self, if (!task) return NOTIFY_OK; - mutex_lock(&uid_lock); + rt_mutex_lock(&uid_lock); uid = from_kuid_munged(current_user_ns(), task_uid(task)); uid_entry = find_or_register_uid(uid); if (!uid_entry) { @@ -404,7 +405,7 @@ static int process_notifier(struct notifier_block *self, clean_uid_io_last_stats(uid_entry, task); exit: - mutex_unlock(&uid_lock); + rt_mutex_unlock(&uid_lock); return NOTIFY_OK; } From f9d703b1b89943a6944dcdc53f64879d572cc197 Mon Sep 17 00:00:00 2001 From: Bowgo Tsai Date: Thu, 2 Mar 2017 18:54:15 +0800 Subject: [PATCH 266/521] ANDROID: dm: android-verity: allow disable dm-verity for Treble VTS To start Treble VTS test, a single AOSP system.img will be flashed onto the device. The size of AOSP system.img might be different than the system partition size on device, making locating verity metadata fail (at the last fixed size of the partition). This change allows disabling dm-verity on system partition when the device is unlocked (orange device state) with invalid metadata. BUG: 35603549 Test: boot device with a different-sized system.img, checks verity is not enabled via: "adb shell getprop | grep partition.system.verified" Change-Id: Ide78dca4eefde4ab019e4b202d3f590dcb1bb506 Signed-off-by: Bowgo Tsai --- drivers/md/dm-android-verity.c | 53 ++++++++++++++++++++++------------ 1 file changed, 35 insertions(+), 18 deletions(-) diff --git a/drivers/md/dm-android-verity.c b/drivers/md/dm-android-verity.c index ec0a4d19ca3e..c3c9502baf18 100644 --- a/drivers/md/dm-android-verity.c +++ b/drivers/md/dm-android-verity.c @@ -115,6 +115,12 @@ static inline bool is_userdebug(void) return !strncmp(buildvariant, typeuserdebug, sizeof(typeuserdebug)); } +static inline bool is_unlocked(void) +{ + static const char unlocked[] = "orange"; + + return !strncmp(verifiedbootstate, unlocked, sizeof(unlocked)); +} static int table_extract_mpi_array(struct public_key_signature *pks, const void *data, size_t len) @@ -650,6 +656,28 @@ static int add_as_linear_device(struct dm_target *ti, char *dev) return err; } +static int create_linear_device(struct dm_target *ti, dev_t dev, + char *target_device) +{ + u64 device_size = 0; + int err = find_size(dev, &device_size); + + if (err) { + DMERR("error finding bdev size"); + handle_error(); + return err; + } + + ti->len = device_size; + err = add_as_linear_device(ti, target_device); + if (err) { + handle_error(); + return err; + } + verity_enabled = false; + return 0; +} + /* * Target parameters: * Key id of the public key in the system keyring. @@ -673,7 +701,6 @@ static int android_verity_ctr(struct dm_target *ti, unsigned argc, char **argv) struct fec_ecc_metadata uninitialized_var(ecc); char buf[FEC_ARG_LENGTH], *buf_ptr; unsigned long long tmpll; - u64 uninitialized_var(device_size); if (argc == 1) { /* Use the default keyid */ @@ -701,23 +728,8 @@ static int android_verity_ctr(struct dm_target *ti, unsigned argc, char **argv) return -EINVAL; } - if (is_eng()) { - err = find_size(dev, &device_size); - if (err) { - DMERR("error finding bdev size"); - handle_error(); - return err; - } - - ti->len = device_size; - err = add_as_linear_device(ti, target_device); - if (err) { - handle_error(); - return err; - } - verity_enabled = false; - return 0; - } + if (is_eng()) + return create_linear_device(ti, dev, target_device); strreplace(key_id, '#', ' '); @@ -732,6 +744,11 @@ static int android_verity_ctr(struct dm_target *ti, unsigned argc, char **argv) err = extract_metadata(dev, &fec, &metadata, &verity_enabled); if (err) { + /* Allow invalid metadata when the device is unlocked */ + if (is_unlocked()) { + DMWARN("Allow invalid metadata when unlocked"); + return create_linear_device(ti, dev, target_device); + } DMERR("Error while extracting metadata"); handle_error(); goto free_metadata; From d48031aaa19658710aaa22e155fe6055ff1ea9ca Mon Sep 17 00:00:00 2001 From: yangdongdong Date: Sat, 8 Aug 2015 11:59:59 +0800 Subject: [PATCH 267/521] ANDROID: power: align wakeup_sources format This aligns every column of elements in wakeup_sources to conveniently check any specific column for suspicious power consumption wakeup source or for other easily human readable purpose. Change-Id: Iac8b0538170fcc0cca9f6857c15d9a4c62c8865e Signed-off-by: yangdongdong --- drivers/base/power/wakeup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index 09c07f519952..0e494108c20c 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -1042,7 +1042,7 @@ static int print_wakeup_source_stats(struct seq_file *m, active_time = ktime_set(0, 0); } - seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n", + seq_printf(m, "%-32s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n", ws->name, active_count, ws->event_count, ws->wakeup_count, ws->expire_count, ktime_to_ms(active_time), ktime_to_ms(total_time), @@ -1062,7 +1062,7 @@ static int wakeup_sources_stats_show(struct seq_file *m, void *unused) { struct wakeup_source *ws; - seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t" + seq_puts(m, "name\t\t\t\t\tactive_count\tevent_count\twakeup_count\t" "expire_count\tactive_since\ttotal_time\tmax_time\t" "last_change\tprevent_suspend_time\n"); From 1c634ee26be129b8f35dee320d5f77e2fe6e029a Mon Sep 17 00:00:00 2001 From: Max Shi Date: Fri, 26 Aug 2016 15:00:16 -0700 Subject: [PATCH 268/521] config: disable CONFIG_USELIB and CONFIG_FHANDLE turn off the two kernel configs to disable related system ABI. Bug: 30903194 Change-Id: I32e2ff3323135ce4b67a86f106fa9327a71fe309 Signed-off-by: Max Shi --- android/configs/android-base.cfg | 2 ++ 1 file changed, 2 insertions(+) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index 2098fe97198c..d1f9628e4377 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -1,10 +1,12 @@ # KEEP ALPHABETICALLY SORTED # CONFIG_DEVKMEM is not set # CONFIG_DEVMEM is not set +# CONFIG_FHANDLE is not set # CONFIG_INET_LRO is not set # CONFIG_MODULES is not set # CONFIG_OABI_COMPAT is not set # CONFIG_SYSVIPC is not set +# CONFIG_USELIB is not set CONFIG_ANDROID=y CONFIG_ANDROID_BINDER_IPC=y CONFIG_ANDROID_LOW_MEMORY_KILLER=y From eddcde3031dd40ec6f881a4b87a80f17b7239dd5 Mon Sep 17 00:00:00 2001 From: Greg Hackmann Date: Tue, 7 Mar 2017 10:37:56 -0800 Subject: [PATCH 269/521] ANDROID: sched: fix duplicate sched_group_energy const specifiers EAS uses "const struct sched_group_energy * const" fairly consistently. But a couple of places swap the "*" and second "const", making the pointer mutable. In the case of struct sched_group, "* const" would have been an error, since init_sched_energy() writes to sd->groups->sge. Change-Id: Ic6a8fcf99e65c0f25d9cc55c32625ef3ca5c9aca Signed-off-by: Greg Hackmann --- kernel/sched/fair.c | 2 +- kernel/sched/sched.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3331f453a17f..83cfb72b2d95 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4919,7 +4919,7 @@ long group_norm_util(struct energy_env *eenv, struct sched_group *sg) } static int find_new_capacity(struct energy_env *eenv, - const struct sched_group_energy const *sge) + const struct sched_group_energy * const sge) { int idx; unsigned long util = group_max_util(eenv); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 2f2b959ad244..780522c65cea 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -915,7 +915,7 @@ struct sched_group { unsigned int group_weight; struct sched_group_capacity *sgc; - const struct sched_group_energy const *sge; + const struct sched_group_energy *sge; /* * The CPUs this group covers. From 7ff27b01370c4901258140ccfc2e6464ab69a82c Mon Sep 17 00:00:00 2001 From: Badhri Jagan Sridharan Date: Mon, 20 Mar 2017 14:06:27 -0700 Subject: [PATCH 270/521] ANDROID: android-verity: do not compile as independent module dm-android-verity depends on optional kernel command line parameters. When compiled as module the __setup macro ends up being a no-op resulting in the following warnings: /work/build/batch/drivers/md/dm-android-verity.c:91:19: warning: 'verity_buildvariant' defined but not used [-Wunused-function] static int __init verity_buildvariant(char *line) ^~~~~~~~~~~~~~~~~~~ /work/build/batch/drivers/md/dm-android-verity.c:83:19: warning: 'verity_keyid_param' defined but not used [-Wunused-function] static int __init verity_keyid_param(char *line) ^~~~~~~~~~~~~~~~~~ /work/build/batch/drivers/md/dm-android-verity.c:75:19: warning: 'verity_mode_param' defined but not used [-Wunused-function] static int __init verity_mode_param(char *line) ^~~~~~~~~~~~~~~~~ /work/build/batch/drivers/md/dm-android-verity.c:67:19: warning: 'verified_boot_state_param' defined but not used [-Wunused-function] static int __init verified_boot_state_param(char *line) ^~~~~~~~~~~~~~~~~~~~~~~~~ Tested with allmodconfig. Change-Id: Idfe0c97b216bb620cc7264e968b494eb3a765157 Signed-off-by: Badhri Jagan Sridharan --- drivers/md/Kconfig | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index 3d237a03dab3..9eb08a43cd27 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -516,15 +516,15 @@ config DM_LOG_WRITES If unsure, say N. config DM_ANDROID_VERITY - tristate "Android verity target support" - depends on DM_VERITY + bool "Android verity target support" + depends on DM_VERITY=y depends on X509_CERTIFICATE_PARSER depends on SYSTEM_TRUSTED_KEYRING depends on PUBLIC_KEY_ALGO_RSA depends on KEYS depends on ASYMMETRIC_KEY_TYPE depends on ASYMMETRIC_PUBLIC_KEY_SUBTYPE - depends on MD_LINEAR + depends on MD_LINEAR=y select DM_VERITY_HASH_PREFETCH_MIN_SIZE_128 ---help--- This device-mapper target is virtually a VERITY target. This From f118138a6c78fe21ae46f68063a0799ef16b17e5 Mon Sep 17 00:00:00 2001 From: Jungseung Lee Date: Thu, 22 Dec 2016 12:37:34 +0900 Subject: [PATCH 271/521] BACKPORT: mmc: core: Export device lifetime information through sysfs In the eMMC 5.0 version of the spec, several EXT_CSD fields about device lifetime are added. - Two types of estimated indications reflected by averaged wear out of memory - An indication reflected by average reserved blocks Export the information through sysfs. Signed-off-by: Jungseung Lee Reviewed-by: Jaehoon Chung Reviewed-by: Shawn Lin Signed-off-by: Ulf Hansson --- drivers/mmc/core/mmc.c | 12 ++++++++++++ include/linux/mmc/card.h | 3 +++ include/linux/mmc/mmc.h | 3 +++ 3 files changed, 18 insertions(+) diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index 79a0c26e1419..0e3d7152c8af 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -592,6 +592,12 @@ static int mmc_decode_ext_csd(struct mmc_card *card, u8 *ext_csd) card->ext_csd.ffu_capable = (ext_csd[EXT_CSD_SUPPORTED_MODE] & 0x1) && !(ext_csd[EXT_CSD_FW_CONFIG] & 0x1); + + card->ext_csd.pre_eol_info = ext_csd[EXT_CSD_PRE_EOL_INFO]; + card->ext_csd.device_life_time_est_typ_a = + ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]; + card->ext_csd.device_life_time_est_typ_b = + ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]; } out: return err; @@ -721,6 +727,10 @@ MMC_DEV_ATTR(manfid, "0x%06x\n", card->cid.manfid); MMC_DEV_ATTR(name, "%s\n", card->cid.prod_name); MMC_DEV_ATTR(oemid, "0x%04x\n", card->cid.oemid); MMC_DEV_ATTR(prv, "0x%x\n", card->cid.prv); +MMC_DEV_ATTR(pre_eol_info, "%02x\n", card->ext_csd.pre_eol_info); +MMC_DEV_ATTR(life_time, "0x%02x 0x%02x\n", + card->ext_csd.device_life_time_est_typ_a, + card->ext_csd.device_life_time_est_typ_b); MMC_DEV_ATTR(serial, "0x%08x\n", card->cid.serial); MMC_DEV_ATTR(enhanced_area_offset, "%llu\n", card->ext_csd.enhanced_area_offset); @@ -757,6 +767,8 @@ static struct attribute *mmc_std_attrs[] = { &dev_attr_name.attr, &dev_attr_oemid.attr, &dev_attr_prv.attr, + &dev_attr_pre_eol_info.attr, + &dev_attr_life_time.attr, &dev_attr_serial.attr, &dev_attr_enhanced_area_offset.attr, &dev_attr_enhanced_area_size.attr, diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h index eb0151bac50c..8f23fb2c5ed2 100644 --- a/include/linux/mmc/card.h +++ b/include/linux/mmc/card.h @@ -118,6 +118,9 @@ struct mmc_ext_csd { u8 raw_pwr_cl_ddr_200_360; /* 253 */ u8 raw_bkops_status; /* 246 */ u8 raw_sectors[4]; /* 212 - 4 bytes */ + u8 pre_eol_info; /* 267 */ + u8 device_life_time_est_typ_a; /* 268 */ + u8 device_life_time_est_typ_b; /* 269 */ unsigned int feature_support; #define MMC_DISCARD_FEATURE BIT(0) /* CMD38 feature */ diff --git a/include/linux/mmc/mmc.h b/include/linux/mmc/mmc.h index 15f2c4a0a62c..2c6b1d45626e 100644 --- a/include/linux/mmc/mmc.h +++ b/include/linux/mmc/mmc.h @@ -330,6 +330,9 @@ struct _mmc_csd { #define EXT_CSD_CACHE_SIZE 249 /* RO, 4 bytes */ #define EXT_CSD_PWR_CL_DDR_200_360 253 /* RO */ #define EXT_CSD_FIRMWARE_VERSION 254 /* RO, 8 bytes */ +#define EXT_CSD_PRE_EOL_INFO 267 /* RO */ +#define EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A 268 /* RO */ +#define EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B 269 /* RO */ #define EXT_CSD_SUPPORTED_MODE 493 /* RO */ #define EXT_CSD_TAG_UNIT_SIZE 498 /* RO */ #define EXT_CSD_DATA_TAG_SUPPORT 499 /* RO */ From 6b3788fb9405844bb67004bc6aa0a27073903eff Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Tue, 21 Mar 2017 11:43:02 -0700 Subject: [PATCH 272/521] ANDROID: mmc: core: export emmc revision Change-Id: If23bc838327ef751ee65baadc429e218eaa2a848 Signed-off-by: Jin Qian --- drivers/mmc/core/mmc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index 0e3d7152c8af..83b9bf880834 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -727,6 +727,7 @@ MMC_DEV_ATTR(manfid, "0x%06x\n", card->cid.manfid); MMC_DEV_ATTR(name, "%s\n", card->cid.prod_name); MMC_DEV_ATTR(oemid, "0x%04x\n", card->cid.oemid); MMC_DEV_ATTR(prv, "0x%x\n", card->cid.prv); +MMC_DEV_ATTR(rev, "0x%x\n", card->ext_csd.rev); MMC_DEV_ATTR(pre_eol_info, "%02x\n", card->ext_csd.pre_eol_info); MMC_DEV_ATTR(life_time, "0x%02x 0x%02x\n", card->ext_csd.device_life_time_est_typ_a, @@ -767,6 +768,7 @@ static struct attribute *mmc_std_attrs[] = { &dev_attr_name.attr, &dev_attr_oemid.attr, &dev_attr_prv.attr, + &dev_attr_rev.attr, &dev_attr_pre_eol_info.attr, &dev_attr_life_time.attr, &dev_attr_serial.attr, From f2a4a926f814066843d083bc3555cc1c86c03b3b Mon Sep 17 00:00:00 2001 From: Joel Scherpelz Date: Wed, 22 Mar 2017 18:19:04 +0900 Subject: [PATCH 273/521] net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs. This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). [backport of net-next bbea124bc99df968011e76eba105fe964a4eceab] Bug: 33333670 Test: net_test passes Signed-off-by: Joel Scherpelz Acked-by: Lorenzo Colitti Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 13 +++++++++++-- include/linux/ipv6.h | 1 + include/uapi/linux/ipv6.h | 10 ++++++++++ include/uapi/linux/sysctl.h | 1 + net/ipv6/addrconf.c | 10 ++++++++++ net/ipv6/ndisc.c | 2 ++ 6 files changed, 35 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 2042261408b9..5f1ea84ed72b 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1413,11 +1413,20 @@ accept_ra_pinfo - BOOLEAN Functional default: enabled if accept_ra is enabled. disabled if accept_ra is disabled. +accept_ra_rt_info_min_plen - INTEGER + Minimum prefix length of Route Information in RA. + + Route Information w/ prefix smaller than this variable shall + be ignored. + + Functional default: 0 if accept_ra_rtr_pref is enabled. + -1 if accept_ra_rtr_pref is disabled. + accept_ra_rt_info_max_plen - INTEGER Maximum prefix length of Route Information in RA. - Route Information w/ prefix larger than or equal to this - variable shall be ignored. + Route Information w/ prefix larger than this variable shall + be ignored. Functional default: 0 if accept_ra_rtr_pref is enabled. -1 if accept_ra_rtr_pref is disabled. diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index ce777260e9ea..1182f0e21697 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -36,6 +36,7 @@ struct ipv6_devconf { __s32 accept_ra_rtr_pref; __s32 rtr_probe_interval; #ifdef CONFIG_IPV6_ROUTE_INFO + __s32 accept_ra_rt_info_min_plen; __s32 accept_ra_rt_info_max_plen; #endif #endif diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h index 2b1533859749..c462f1dc175e 100644 --- a/include/uapi/linux/ipv6.h +++ b/include/uapi/linux/ipv6.h @@ -175,6 +175,16 @@ enum { DEVCONF_USE_OIF_ADDRS_ONLY, DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT, DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN, + DEVCONF_DROP_UNICAST_IN_L2_MULTICAST, + DEVCONF_DROP_UNSOLICITED_NA, + DEVCONF_KEEP_ADDR_ON_DOWN, + DEVCONF_RTR_SOLICIT_MAX_INTERVAL, + DEVCONF_SEG6_ENABLED, + DEVCONF_SEG6_REQUIRE_HMAC, + DEVCONF_ENHANCED_DAD, + DEVCONF_ADDR_GEN_MODE, + DEVCONF_DISABLE_POLICY, + DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN, DEVCONF_MAX }; diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h index 0956373b56db..d18980e74534 100644 --- a/include/uapi/linux/sysctl.h +++ b/include/uapi/linux/sysctl.h @@ -570,6 +570,7 @@ enum { NET_IPV6_PROXY_NDP=23, NET_IPV6_ACCEPT_SOURCE_ROUTE=25, NET_IPV6_ACCEPT_RA_FROM_LOCAL=26, + NET_IPV6_ACCEPT_RA_RT_INFO_MIN_PLEN=27, __NET_IPV6_MAX }; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 498a664b8dc9..860ffbe778cf 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -202,6 +202,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = { .accept_ra_rtr_pref = 1, .rtr_probe_interval = 60 * HZ, #ifdef CONFIG_IPV6_ROUTE_INFO + .accept_ra_rt_info_min_plen = 0, .accept_ra_rt_info_max_plen = 0, #endif #endif @@ -247,6 +248,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .accept_ra_rtr_pref = 1, .rtr_probe_interval = 60 * HZ, #ifdef CONFIG_IPV6_ROUTE_INFO + .accept_ra_rt_info_min_plen = 0, .accept_ra_rt_info_max_plen = 0, #endif #endif @@ -4689,6 +4691,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, array[DEVCONF_RTR_PROBE_INTERVAL] = jiffies_to_msecs(cnf->rtr_probe_interval); #ifdef CONFIG_IPV6_ROUTE_INFO + array[DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN] = cnf->accept_ra_rt_info_min_plen; array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] = cnf->accept_ra_rt_info_max_plen; #endif #endif @@ -5648,6 +5651,13 @@ static struct addrconf_sysctl_table .proc_handler = proc_dointvec_jiffies, }, #ifdef CONFIG_IPV6_ROUTE_INFO + { + .procname = "accept_ra_rt_info_min_plen", + .data = &ipv6_devconf.accept_ra_rt_info_min_plen, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, { .procname = "accept_ra_rt_info_max_plen", .data = &ipv6_devconf.accept_ra_rt_info_max_plen, diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 84afb9a77278..3452f9037ad4 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1358,6 +1358,8 @@ static void ndisc_router_discovery(struct sk_buff *skb) if (ri->prefix_len == 0 && !in6_dev->cnf.accept_ra_defrtr) continue; + if (ri->prefix_len < in6_dev->cnf.accept_ra_rt_info_min_plen) + continue; if (ri->prefix_len > in6_dev->cnf.accept_ra_rt_info_max_plen) continue; rt6_route_rcv(skb->dev, (u8 *)p, (p->nd_opt_len) << 3, From 82b6efc31b3571b46673f0b2267895b8a7892f73 Mon Sep 17 00:00:00 2001 From: Chenbo Feng Date: Thu, 23 Mar 2017 13:51:24 -0700 Subject: [PATCH 274/521] fix the deadlock in xt_qtaguid when enable DDEBUG When DDEBUG is enabled, the prdebug_full_state() function will try to recursively aquire the spinlock of sock_tag_list and causing deadlock. A check statement is added before it aquire the spinlock to differentiate the behavior depend on the caller of the function. Bug: 36559739 Test: Compile and run test under system/extra/test/iptables/ Change-Id: Ie3397fbaa207e14fe214d47aaf5e8ca1f4a712ee Signed-off-by: Chenbo Feng --- net/netfilter/xt_qtaguid.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/net/netfilter/xt_qtaguid.c b/net/netfilter/xt_qtaguid.c index 3bf0c59dab2f..0f5628a59917 100644 --- a/net/netfilter/xt_qtaguid.c +++ b/net/netfilter/xt_qtaguid.c @@ -1814,8 +1814,11 @@ static bool qtaguid_mt(const struct sk_buff *skb, struct xt_action_param *par) } #ifdef DDEBUG -/* This function is not in xt_qtaguid_print.c because of locks visibility */ -static void prdebug_full_state(int indent_level, const char *fmt, ...) +/* + * This function is not in xt_qtaguid_print.c because of locks visibility. + * The lock of sock_tag_list must be aquired before calling this function + */ +static void prdebug_full_state_locked(int indent_level, const char *fmt, ...) { va_list args; char *fmt_buff; @@ -1836,16 +1839,12 @@ static void prdebug_full_state(int indent_level, const char *fmt, ...) kfree(buff); va_end(args); - spin_lock_bh(&sock_tag_list_lock); prdebug_sock_tag_tree(indent_level, &sock_tag_tree); - spin_unlock_bh(&sock_tag_list_lock); - spin_lock_bh(&sock_tag_list_lock); spin_lock_bh(&uid_tag_data_tree_lock); prdebug_uid_tag_data_tree(indent_level, &uid_tag_data_tree); prdebug_proc_qtu_data_tree(indent_level, &proc_qtu_data_tree); spin_unlock_bh(&uid_tag_data_tree_lock); - spin_unlock_bh(&sock_tag_list_lock); spin_lock_bh(&iface_stat_list_lock); prdebug_iface_stat_list(indent_level, &iface_stat_list); @@ -1854,7 +1853,7 @@ static void prdebug_full_state(int indent_level, const char *fmt, ...) pr_debug("qtaguid: %s(): }\n", __func__); } #else -static void prdebug_full_state(int indent_level, const char *fmt, ...) {} +static void prdebug_full_state_locked(int indent_level, const char *fmt, ...) {} #endif struct proc_ctrl_print_info { @@ -1977,8 +1976,11 @@ static int qtaguid_ctrl_proc_show(struct seq_file *m, void *v) (u64)atomic64_read(&qtu_events.match_no_sk), (u64)atomic64_read(&qtu_events.match_no_sk_file)); - /* Count the following as part of the last item_index */ - prdebug_full_state(0, "proc ctrl"); + /* Count the following as part of the last item_index. No need + * to lock the sock_tag_list here since it is already locked when + * starting the seq_file operation + */ + prdebug_full_state_locked(0, "proc ctrl"); } return 0; @@ -2887,8 +2889,10 @@ static int qtudev_release(struct inode *inode, struct file *file) sock_tag_tree_erase(&st_to_free_tree); - prdebug_full_state(0, "%s(): pid=%u tgid=%u", __func__, + spin_lock_bh(&sock_tag_list_lock); + prdebug_full_state_locked(0, "%s(): pid=%u tgid=%u", __func__, current->pid, current->tgid); + spin_unlock_bh(&sock_tag_list_lock); return 0; } From 0084d2b319df47847516b7aab2d814c35100ade1 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Tue, 21 Mar 2017 16:28:27 -0700 Subject: [PATCH 275/521] ANDROID: sdcardfs: correct order of descriptors Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: Ia6d16b19c8c911f41231d2a12be0740057edfacf --- fs/sdcardfs/packagelist.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/sdcardfs/packagelist.c b/fs/sdcardfs/packagelist.c index e72fe83f7837..5398ebf8a34b 100644 --- a/fs/sdcardfs/packagelist.c +++ b/fs/sdcardfs/packagelist.c @@ -48,12 +48,14 @@ static struct kmem_cache *hashtable_entry_cachep; static unsigned int full_name_case_hash(const unsigned char *name, unsigned int len) { unsigned long hash = init_name_hash(); + while (len--) hash = partial_name_hash(tolower(*name++), hash); return end_name_hash(hash); } -static void inline qstr_init(struct qstr *q, const char *name) { +static inline void qstr_init(struct qstr *q, const char *name) +{ q->name = name; q->len = strlen(q->name); q->hash = full_name_case_hash(q->name, q->len); From 8fed3ac5ace5f07067b1877c7bdc33db6034fbf8 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 16 Mar 2017 17:42:58 -0700 Subject: [PATCH 276/521] ANDROID: sdcardfs: Fix formatting This fixes various spacing and bracket related issues pointed out by checkpatch. Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: I6e248833a7a04e3899f3ae9462d765cfcaa70c96 --- fs/sdcardfs/dentry.c | 13 +- fs/sdcardfs/derived_perm.c | 242 +++++++++++++++++++------------------ fs/sdcardfs/file.c | 5 +- fs/sdcardfs/inode.c | 50 ++++---- fs/sdcardfs/lookup.c | 32 ++--- fs/sdcardfs/main.c | 34 ++++-- fs/sdcardfs/packagelist.c | 43 ++++--- fs/sdcardfs/sdcardfs.h | 106 ++++++++-------- fs/sdcardfs/super.c | 29 +++-- 9 files changed, 300 insertions(+), 254 deletions(-) diff --git a/fs/sdcardfs/dentry.c b/fs/sdcardfs/dentry.c index 64494a50e250..aaa15dff6f51 100644 --- a/fs/sdcardfs/dentry.c +++ b/fs/sdcardfs/dentry.c @@ -46,7 +46,8 @@ static int sdcardfs_d_revalidate(struct dentry *dentry, unsigned int flags) spin_unlock(&dentry->d_lock); /* check uninitialized obb_dentry and - * whether the base obbpath has been changed or not */ + * whether the base obbpath has been changed or not + */ if (is_obbpath_invalid(dentry)) { d_drop(dentry); return 0; @@ -106,12 +107,10 @@ static int sdcardfs_d_revalidate(struct dentry *dentry, unsigned int flags) static void sdcardfs_d_release(struct dentry *dentry) { /* release and reset the lower paths */ - if(has_graft_path(dentry)) { + if (has_graft_path(dentry)) sdcardfs_put_reset_orig_path(dentry); - } sdcardfs_put_reset_lower_path(dentry); free_dentry_private_data(dentry); - return; } static int sdcardfs_hash_ci(const struct dentry *dentry, @@ -168,14 +167,16 @@ static int sdcardfs_cmp_ci(const struct dentry *parent, return 1; } -static void sdcardfs_canonical_path(const struct path *path, struct path *actual_path) { +static void sdcardfs_canonical_path(const struct path *path, + struct path *actual_path) +{ sdcardfs_get_real_lower(path->dentry, actual_path); } const struct dentry_operations sdcardfs_ci_dops = { .d_revalidate = sdcardfs_d_revalidate, .d_release = sdcardfs_d_release, - .d_hash = sdcardfs_hash_ci, + .d_hash = sdcardfs_hash_ci, .d_compare = sdcardfs_cmp_ci, .d_canonical_path = sdcardfs_canonical_path, }; diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index 28e9b8d42f9e..51fa565742bd 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -37,7 +37,8 @@ static void inherit_derived_state(struct inode *parent, struct inode *child) /* helper function for derived state */ void setup_derived_state(struct inode *inode, perm_t perm, userid_t userid, - uid_t uid, bool under_android, struct inode *top) + uid_t uid, bool under_android, + struct inode *top) { struct sdcardfs_inode_info *info = SDCARDFS_I(inode); @@ -50,11 +51,14 @@ void setup_derived_state(struct inode *inode, perm_t perm, userid_t userid, set_top(info, top); } -/* While renaming, there is a point where we want the path from dentry, but the name from newdentry */ -void get_derived_permission_new(struct dentry *parent, struct dentry *dentry, const struct qstr *name) +/* While renaming, there is a point where we want the path from dentry, + * but the name from newdentry + */ +void get_derived_permission_new(struct dentry *parent, struct dentry *dentry, + const struct qstr *name) { struct sdcardfs_inode_info *info = SDCARDFS_I(d_inode(dentry)); - struct sdcardfs_inode_info *parent_info= SDCARDFS_I(d_inode(parent)); + struct sdcardfs_inode_info *parent_info = SDCARDFS_I(d_inode(parent)); appid_t appid; struct qstr q_Android = QSTR_LITERAL("Android"); struct qstr q_data = QSTR_LITERAL("data"); @@ -77,58 +81,57 @@ void get_derived_permission_new(struct dentry *parent, struct dentry *dentry, co return; /* Derive custom permissions based on parent and current node */ switch (parent_info->perm) { - case PERM_INHERIT: - case PERM_ANDROID_PACKAGE_CACHE: - /* Already inherited above */ - break; - case PERM_PRE_ROOT: - /* Legacy internal layout places users at top level */ - info->perm = PERM_ROOT; - info->userid = simple_strtoul(name->name, NULL, 10); + case PERM_INHERIT: + case PERM_ANDROID_PACKAGE_CACHE: + /* Already inherited above */ + break; + case PERM_PRE_ROOT: + /* Legacy internal layout places users at top level */ + info->perm = PERM_ROOT; + info->userid = simple_strtoul(name->name, NULL, 10); + set_top(info, &info->vfs_inode); + break; + case PERM_ROOT: + /* Assume masked off by default. */ + if (qstr_case_eq(name, &q_Android)) { + /* App-specific directories inside; let anyone traverse */ + info->perm = PERM_ANDROID; + info->under_android = true; set_top(info, &info->vfs_inode); - break; - case PERM_ROOT: - /* Assume masked off by default. */ - if (qstr_case_eq(name, &q_Android)) { - /* App-specific directories inside; let anyone traverse */ - info->perm = PERM_ANDROID; - info->under_android = true; - set_top(info, &info->vfs_inode); - } - break; - case PERM_ANDROID: - if (qstr_case_eq(name, &q_data)) { - /* App-specific directories inside; let anyone traverse */ - info->perm = PERM_ANDROID_DATA; - set_top(info, &info->vfs_inode); - } else if (qstr_case_eq(name, &q_obb)) { - /* App-specific directories inside; let anyone traverse */ - info->perm = PERM_ANDROID_OBB; - info->under_obb = true; - set_top(info, &info->vfs_inode); - /* Single OBB directory is always shared */ - } else if (qstr_case_eq(name, &q_media)) { - /* App-specific directories inside; let anyone traverse */ - info->perm = PERM_ANDROID_MEDIA; - set_top(info, &info->vfs_inode); - } - break; - case PERM_ANDROID_OBB: - case PERM_ANDROID_DATA: - case PERM_ANDROID_MEDIA: - info->perm = PERM_ANDROID_PACKAGE; - appid = get_appid(name->name); - if (appid != 0 && !is_excluded(name->name, parent_info->userid)) { - info->d_uid = multiuser_get_uid(parent_info->userid, appid); - } + } + break; + case PERM_ANDROID: + if (qstr_case_eq(name, &q_data)) { + /* App-specific directories inside; let anyone traverse */ + info->perm = PERM_ANDROID_DATA; set_top(info, &info->vfs_inode); - break; - case PERM_ANDROID_PACKAGE: - if (qstr_case_eq(name, &q_cache)) { - info->perm = PERM_ANDROID_PACKAGE_CACHE; - info->under_cache = true; - } - break; + } else if (qstr_case_eq(name, &q_obb)) { + /* App-specific directories inside; let anyone traverse */ + info->perm = PERM_ANDROID_OBB; + info->under_obb = true; + set_top(info, &info->vfs_inode); + /* Single OBB directory is always shared */ + } else if (qstr_case_eq(name, &q_media)) { + /* App-specific directories inside; let anyone traverse */ + info->perm = PERM_ANDROID_MEDIA; + set_top(info, &info->vfs_inode); + } + break; + case PERM_ANDROID_OBB: + case PERM_ANDROID_DATA: + case PERM_ANDROID_MEDIA: + info->perm = PERM_ANDROID_PACKAGE; + appid = get_appid(name->name); + if (appid != 0 && !is_excluded(name->name, parent_info->userid)) + info->d_uid = multiuser_get_uid(parent_info->userid, appid); + set_top(info, &info->vfs_inode); + break; + case PERM_ANDROID_PACKAGE: + if (qstr_case_eq(name, &q_cache)) { + info->perm = PERM_ANDROID_PACKAGE_CACHE; + info->under_cache = true; + } + break; } } @@ -137,7 +140,8 @@ void get_derived_permission(struct dentry *parent, struct dentry *dentry) get_derived_permission_new(parent, dentry, &dentry->d_name); } -static appid_t get_type(const char *name) { +static appid_t get_type(const char *name) +{ const char *ext = strrchr(name, '.'); appid_t id; @@ -149,7 +153,8 @@ static appid_t get_type(const char *name) { return AID_MEDIA_RW; } -void fixup_lower_ownership(struct dentry* dentry, const char *name) { +void fixup_lower_ownership(struct dentry *dentry, const char *name) +{ struct path path; struct inode *inode; struct inode *delegated_inode = NULL; @@ -175,49 +180,49 @@ void fixup_lower_ownership(struct dentry* dentry, const char *name) { } switch (perm) { - case PERM_ROOT: - case PERM_ANDROID: - case PERM_ANDROID_DATA: - case PERM_ANDROID_MEDIA: - case PERM_ANDROID_PACKAGE: - case PERM_ANDROID_PACKAGE_CACHE: - uid = multiuser_get_uid(info->userid, uid); - break; - case PERM_ANDROID_OBB: - uid = AID_MEDIA_OBB; - break; - case PERM_PRE_ROOT: - default: - break; + case PERM_ROOT: + case PERM_ANDROID: + case PERM_ANDROID_DATA: + case PERM_ANDROID_MEDIA: + case PERM_ANDROID_PACKAGE: + case PERM_ANDROID_PACKAGE_CACHE: + uid = multiuser_get_uid(info->userid, uid); + break; + case PERM_ANDROID_OBB: + uid = AID_MEDIA_OBB; + break; + case PERM_PRE_ROOT: + default: + break; } switch (perm) { - case PERM_ROOT: - case PERM_ANDROID: - case PERM_ANDROID_DATA: - case PERM_ANDROID_MEDIA: - if (S_ISDIR(d_inode(dentry)->i_mode)) - gid = multiuser_get_uid(info->userid, AID_MEDIA_RW); - else - gid = multiuser_get_uid(info->userid, get_type(name)); - break; - case PERM_ANDROID_OBB: - gid = AID_MEDIA_OBB; - break; - case PERM_ANDROID_PACKAGE: - if (info->d_uid != 0) - gid = multiuser_get_ext_gid(info->d_uid); - else - gid = multiuser_get_uid(info->userid, uid); - break; - case PERM_ANDROID_PACKAGE_CACHE: - if (info->d_uid != 0) - gid = multiuser_get_cache_gid(info->d_uid); - else - gid = multiuser_get_uid(info->userid, uid); - break; - case PERM_PRE_ROOT: - default: - break; + case PERM_ROOT: + case PERM_ANDROID: + case PERM_ANDROID_DATA: + case PERM_ANDROID_MEDIA: + if (S_ISDIR(d_inode(dentry)->i_mode)) + gid = multiuser_get_uid(info->userid, AID_MEDIA_RW); + else + gid = multiuser_get_uid(info->userid, get_type(name)); + break; + case PERM_ANDROID_OBB: + gid = AID_MEDIA_OBB; + break; + case PERM_ANDROID_PACKAGE: + if (info->d_uid != 0) + gid = multiuser_get_ext_gid(info->d_uid); + else + gid = multiuser_get_uid(info->userid, uid); + break; + case PERM_ANDROID_PACKAGE_CACHE: + if (info->d_uid != 0) + gid = multiuser_get_cache_gid(info->d_uid); + else + gid = multiuser_get_uid(info->userid, uid); + break; + case PERM_PRE_ROOT: + default: + break; } sdcardfs_get_lower_path(dentry, &path); @@ -246,7 +251,8 @@ void fixup_lower_ownership(struct dentry* dentry, const char *name) { sdcardfs_put_lower_path(dentry, &path); } -static int descendant_may_need_fixup(struct sdcardfs_inode_info *info, struct limit_search *limit) { +static int descendant_may_need_fixup(struct sdcardfs_inode_info *info, struct limit_search *limit) +{ if (info->perm == PERM_ROOT) return (limit->flags & BY_USERID)?info->userid == limit->userid:1; if (info->perm == PERM_PRE_ROOT || info->perm == PERM_ANDROID) @@ -254,7 +260,8 @@ static int descendant_may_need_fixup(struct sdcardfs_inode_info *info, struct li return 0; } -static int needs_fixup(perm_t perm) { +static int needs_fixup(perm_t perm) +{ if (perm == PERM_ANDROID_DATA || perm == PERM_ANDROID_OBB || perm == PERM_ANDROID_MEDIA) return 1; @@ -292,9 +299,9 @@ static void __fixup_perms_recursive(struct dentry *dentry, struct limit_search * } spin_unlock(&child->d_lock); } - } else if (descendant_may_need_fixup(info, limit)) { + } else if (descendant_may_need_fixup(info, limit)) { list_for_each_entry(child, &dentry->d_subdirs, d_child) { - __fixup_perms_recursive(child, limit, depth + 1); + __fixup_perms_recursive(child, limit, depth + 1); } } spin_unlock(&dentry->d_lock); @@ -310,7 +317,7 @@ inline void update_derived_permission_lock(struct dentry *dentry) { struct dentry *parent; - if(!dentry || !d_inode(dentry)) { + if (!dentry || !d_inode(dentry)) { printk(KERN_ERR "sdcardfs: %s: invalid dentry\n", __func__); return; } @@ -318,11 +325,11 @@ inline void update_derived_permission_lock(struct dentry *dentry) * 1. need to check whether the dentry is updated or not * 2. remove the root dentry update */ - if(IS_ROOT(dentry)) { + if (IS_ROOT(dentry)) { //setup_default_pre_root_state(d_inode(dentry)); } else { parent = dget_parent(dentry); - if(parent) { + if (parent) { get_derived_permission(parent, dentry); dput(parent); } @@ -334,15 +341,15 @@ int need_graft_path(struct dentry *dentry) { int ret = 0; struct dentry *parent = dget_parent(dentry); - struct sdcardfs_inode_info *parent_info= SDCARDFS_I(d_inode(parent)); + struct sdcardfs_inode_info *parent_info = SDCARDFS_I(d_inode(parent)); struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb); struct qstr obb = QSTR_LITERAL("obb"); - if(parent_info->perm == PERM_ANDROID && + if (parent_info->perm == PERM_ANDROID && qstr_case_eq(&dentry->d_name, &obb)) { /* /Android/obb is the base obbpath of DERIVED_UNIFIED */ - if(!(sbi->options.multiuser == false + if (!(sbi->options.multiuser == false && parent_info->userid == 0)) { ret = 1; } @@ -362,17 +369,18 @@ int is_obbpath_invalid(struct dentry *dent) /* check the base obbpath has been changed. * this routine can check an uninitialized obb dentry as well. - * regarding the uninitialized obb, refer to the sdcardfs_mkdir() */ + * regarding the uninitialized obb, refer to the sdcardfs_mkdir() + */ spin_lock(&di->lock); - if(di->orig_path.dentry) { - if(!di->lower_path.dentry) { + if (di->orig_path.dentry) { + if (!di->lower_path.dentry) { ret = 1; } else { path_get(&di->lower_path); //lower_parent = lock_parent(lower_path->dentry); path_buf = kmalloc(PATH_MAX, GFP_ATOMIC); - if(!path_buf) { + if (!path_buf) { ret = 1; printk(KERN_ERR "sdcardfs: fail to allocate path_buf in %s.\n", __func__); } else { @@ -399,13 +407,13 @@ int is_base_obbpath(struct dentry *dentry) { int ret = 0; struct dentry *parent = dget_parent(dentry); - struct sdcardfs_inode_info *parent_info= SDCARDFS_I(d_inode(parent)); + struct sdcardfs_inode_info *parent_info = SDCARDFS_I(d_inode(parent)); struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb); struct qstr q_obb = QSTR_LITERAL("obb"); spin_lock(&SDCARDFS_D(dentry)->lock); if (sbi->options.multiuser) { - if(parent_info->perm == PERM_PRE_ROOT && + if (parent_info->perm == PERM_PRE_ROOT && qstr_case_eq(&dentry->d_name, &q_obb)) { ret = 1; } @@ -420,7 +428,8 @@ int is_base_obbpath(struct dentry *dentry) /* The lower_path will be stored to the dentry's orig_path * and the base obbpath will be copyed to the lower_path variable. * if an error returned, there's no change in the lower_path - * returns: -ERRNO if error (0: no error) */ + * returns: -ERRNO if error (0: no error) + */ int setup_obb_dentry(struct dentry *dentry, struct path *lower_path) { int err = 0; @@ -435,7 +444,7 @@ int setup_obb_dentry(struct dentry *dentry, struct path *lower_path) err = kern_path(sbi->obbpath_s, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &obbpath); - if(!err) { + if (!err) { /* the obbpath base has been found */ pathcpy(lower_path, &obbpath); } else { @@ -443,7 +452,8 @@ int setup_obb_dentry(struct dentry *dentry, struct path *lower_path) * setup the lower_path with its orig_path. * but, the current implementation just returns an error * because the sdcard daemon also regards this case as - * a lookup fail. */ + * a lookup fail. + */ printk(KERN_INFO "sdcardfs: the sbi->obbpath is not available\n"); } return err; diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index 0592facef704..4c9726606c0f 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -216,7 +216,7 @@ static int sdcardfs_open(struct inode *inode, struct file *file) goto out_err; } - if(!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { + if (!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { err = -EACCES; goto out_err; } @@ -248,9 +248,8 @@ static int sdcardfs_open(struct inode *inode, struct file *file) if (err) kfree(SDCARDFS_F(file)); - else { + else sdcardfs_copy_and_fix_attrs(inode, sdcardfs_lower_inode(inode)); - } out_revert_cred: REVERT_CRED(saved_cred); diff --git a/fs/sdcardfs/inode.c b/fs/sdcardfs/inode.c index f19f11ea19fc..ce187d865f29 100644 --- a/fs/sdcardfs/inode.c +++ b/fs/sdcardfs/inode.c @@ -23,10 +23,10 @@ #include /* Do not directly use this function. Use OVERRIDE_CRED() instead. */ -const struct cred * override_fsids(struct sdcardfs_sb_info* sbi, struct sdcardfs_inode_info *info) +const struct cred *override_fsids(struct sdcardfs_sb_info *sbi, struct sdcardfs_inode_info *info) { - struct cred * cred; - const struct cred * old_cred; + struct cred *cred; + const struct cred *old_cred; uid_t uid; cred = prepare_creds(); @@ -46,9 +46,9 @@ const struct cred * override_fsids(struct sdcardfs_sb_info* sbi, struct sdcardfs } /* Do not directly use this function, use REVERT_CRED() instead. */ -void revert_fsids(const struct cred * old_cred) +void revert_fsids(const struct cred *old_cred) { - const struct cred * cur_cred; + const struct cred *cur_cred; cur_cred = current->cred; revert_creds(old_cred); @@ -67,7 +67,7 @@ static int sdcardfs_create(struct inode *dir, struct dentry *dentry, struct fs_struct *saved_fs; struct fs_struct *copied_fs; - if(!check_caller_access_to_name(dir, &dentry->d_name)) { + if (!check_caller_access_to_name(dir, &dentry->d_name)) { err = -EACCES; goto out_eacces; } @@ -166,7 +166,7 @@ static int sdcardfs_unlink(struct inode *dir, struct dentry *dentry) struct path lower_path; const struct cred *saved_cred = NULL; - if(!check_caller_access_to_name(dir, &dentry->d_name)) { + if (!check_caller_access_to_name(dir, &dentry->d_name)) { err = -EACCES; goto out_eacces; } @@ -240,13 +240,14 @@ static int sdcardfs_symlink(struct inode *dir, struct dentry *dentry, } #endif -static int touch(char *abs_path, mode_t mode) { +static int touch(char *abs_path, mode_t mode) +{ struct file *filp = filp_open(abs_path, O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW, mode); + if (IS_ERR(filp)) { if (PTR_ERR(filp) == -EEXIST) { return 0; - } - else { + } else { printk(KERN_ERR "sdcardfs: failed to open(%s): %ld\n", abs_path, PTR_ERR(filp)); return PTR_ERR(filp); @@ -273,7 +274,7 @@ static int sdcardfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode struct qstr q_obb = QSTR_LITERAL("obb"); struct qstr q_data = QSTR_LITERAL("data"); - if(!check_caller_access_to_name(dir, &dentry->d_name)) { + if (!check_caller_access_to_name(dir, &dentry->d_name)) { err = -EACCES; goto out_eacces; } @@ -315,19 +316,21 @@ static int sdcardfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode } /* if it is a local obb dentry, setup it with the base obbpath */ - if(need_graft_path(dentry)) { + if (need_graft_path(dentry)) { err = setup_obb_dentry(dentry, &lower_path); - if(err) { + if (err) { /* if the sbi->obbpath is not available, the lower_path won't be * changed by setup_obb_dentry() but the lower path is saved to * its orig_path. this dentry will be revalidated later. - * but now, the lower_path should be NULL */ + * but now, the lower_path should be NULL + */ sdcardfs_put_reset_lower_path(dentry); /* the newly created lower path which saved to its orig_path or * the lower_path is the base obbpath. - * therefore, an additional path_get is required */ + * therefore, an additional path_get is required + */ path_get(&lower_path); } else make_nomedia_in_obb = 1; @@ -382,7 +385,7 @@ static int sdcardfs_rmdir(struct inode *dir, struct dentry *dentry) struct path lower_path; const struct cred *saved_cred = NULL; - if(!check_caller_access_to_name(dir, &dentry->d_name)) { + if (!check_caller_access_to_name(dir, &dentry->d_name)) { err = -EACCES; goto out_eacces; } @@ -391,7 +394,8 @@ static int sdcardfs_rmdir(struct inode *dir, struct dentry *dentry) OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb), saved_cred, SDCARDFS_I(dir)); /* sdcardfs_get_real_lower(): in case of remove an user's obb dentry - * the dentry on the original path should be deleted. */ + * the dentry on the original path should be deleted. + */ sdcardfs_get_real_lower(dentry, &lower_path); lower_dentry = lower_path.dentry; @@ -467,7 +471,7 @@ static int sdcardfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct path lower_old_path, lower_new_path; const struct cred *saved_cred = NULL; - if(!check_caller_access_to_name(old_dir, &old_dentry->d_name) || + if (!check_caller_access_to_name(old_dir, &old_dentry->d_name) || !check_caller_access_to_name(new_dir, &new_dentry->d_name)) { err = -EACCES; goto out_eacces; @@ -656,6 +660,7 @@ static int sdcardfs_permission(struct vfsmount *mnt, struct inode *inode, int ma * we check it with AID_MEDIA_RW permission */ struct inode *lower_inode; + OVERRIDE_CRED(SDCARDFS_SB(inode->sb)); lower_inode = sdcardfs_lower_inode(inode); @@ -724,7 +729,8 @@ static int sdcardfs_setattr(struct vfsmount *mnt, struct dentry *dentry, struct /* prepare our own lower struct iattr (with the lower file) */ memcpy(&lower_ia, ia, sizeof(lower_ia)); /* Allow touch updating timestamps. A previous permission check ensures - * we have write access. Changes to mode, owner, and group are ignored*/ + * we have write access. Changes to mode, owner, and group are ignored + */ ia->ia_valid |= ATTR_FORCE; err = inode_change_ok(&tmp, ia); @@ -810,10 +816,12 @@ static int sdcardfs_setattr(struct vfsmount *mnt, struct dentry *dentry, struct return err; } -static int sdcardfs_fillattr(struct vfsmount *mnt, struct inode *inode, struct kstat *stat) +static int sdcardfs_fillattr(struct vfsmount *mnt, + struct inode *inode, struct kstat *stat) { struct sdcardfs_inode_info *info = SDCARDFS_I(inode); struct inode *top = grab_top(info); + if (!top) return -EINVAL; @@ -843,7 +851,7 @@ static int sdcardfs_getattr(struct vfsmount *mnt, struct dentry *dentry, int err; parent = dget_parent(dentry); - if(!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { + if (!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { dput(parent); return -EACCES; } diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index fffb94c923c4..fab9fd9d0105 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -73,6 +73,7 @@ static int sdcardfs_inode_test(struct inode *inode, void *candidate_data/*void * { struct inode *current_lower_inode = sdcardfs_lower_inode(inode); userid_t current_userid = SDCARDFS_I(inode)->userid; + if (current_lower_inode == ((struct inode_data *)candidate_data)->lower_inode && current_userid == ((struct inode_data *)candidate_data)->id) return 1; /* found a match */ @@ -102,7 +103,7 @@ struct inode *sdcardfs_iget(struct super_block *sb, struct inode *lower_inode, u * instead. */ lower_inode->i_ino, /* hashval */ - sdcardfs_inode_test, /* inode comparison function */ + sdcardfs_inode_test, /* inode comparison function */ sdcardfs_inode_set, /* inode init function */ &data); /* data passed to test+set fxns */ if (!inode) { @@ -213,8 +214,8 @@ struct sdcardfs_name_data { bool found; }; -static int sdcardfs_name_match(struct dir_context *ctx, const char *name, int namelen, - loff_t offset, u64 ino, unsigned int d_type) +static int sdcardfs_name_match(struct dir_context *ctx, const char *name, + int namelen, loff_t offset, u64 ino, unsigned int d_type) { struct sdcardfs_name_data *buf = container_of(ctx, struct sdcardfs_name_data, ctx); struct qstr candidate = QSTR_INIT(name, namelen); @@ -303,23 +304,26 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, if (!err) { /* check if the dentry is an obb dentry * if true, the lower_inode must be replaced with - * the inode of the graft path */ + * the inode of the graft path + */ - if(need_graft_path(dentry)) { + if (need_graft_path(dentry)) { /* setup_obb_dentry() - * The lower_path will be stored to the dentry's orig_path + * The lower_path will be stored to the dentry's orig_path * and the base obbpath will be copyed to the lower_path variable. * if an error returned, there's no change in the lower_path - * returns: -ERRNO if error (0: no error) */ + * returns: -ERRNO if error (0: no error) + */ err = setup_obb_dentry(dentry, &lower_path); - if(err) { + if (err) { /* if the sbi->obbpath is not available, we can optionally * setup the lower_path with its orig_path. * but, the current implementation just returns an error * because the sdcard daemon also regards this case as - * a lookup fail. */ + * a lookup fail. + */ printk(KERN_INFO "sdcardfs: base obbpath is not available\n"); sdcardfs_put_reset_orig_path(dentry); goto out; @@ -374,9 +378,9 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, /* * On success: - * fills dentry object appropriate values and returns NULL. + * fills dentry object appropriate values and returns NULL. * On fail (== error) - * returns error ptr + * returns error ptr * * @dir : Parent inode. It is locked (dir->i_mutex) * @dentry : Target dentry to lookup. we should set each of fields. @@ -393,10 +397,10 @@ struct dentry *sdcardfs_lookup(struct inode *dir, struct dentry *dentry, parent = dget_parent(dentry); - if(!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { + if (!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) { ret = ERR_PTR(-EACCES); goto out_err; - } + } /* save current_cred and override it */ OVERRIDE_CRED_PTR(SDCARDFS_SB(dir->i_sb), saved_cred, SDCARDFS_I(dir)); @@ -412,9 +416,7 @@ struct dentry *sdcardfs_lookup(struct inode *dir, struct dentry *dentry, ret = __sdcardfs_lookup(dentry, flags, &lower_parent_path, SDCARDFS_I(dir)->userid); if (IS_ERR(ret)) - { goto out; - } if (ret) dentry = ret; if (d_inode(dentry)) { diff --git a/fs/sdcardfs/main.c b/fs/sdcardfs/main.c index 4e2aded8d1d9..c249bd73b121 100644 --- a/fs/sdcardfs/main.c +++ b/fs/sdcardfs/main.c @@ -72,6 +72,7 @@ static int parse_options(struct super_block *sb, char *options, int silent, while ((p = strsep(&options, ",")) != NULL) { int token; + if (!*p) continue; @@ -148,6 +149,7 @@ int parse_options_remount(struct super_block *sb, char *options, int silent, while ((p = strsep(&options, ",")) != NULL) { int token; + if (!*p) continue; @@ -223,8 +225,8 @@ static struct dentry *sdcardfs_d_alloc_root(struct super_block *sb) #endif DEFINE_MUTEX(sdcardfs_super_list_lock); -LIST_HEAD(sdcardfs_super_list); EXPORT_SYMBOL_GPL(sdcardfs_super_list_lock); +LIST_HEAD(sdcardfs_super_list); EXPORT_SYMBOL_GPL(sdcardfs_super_list); /* @@ -328,14 +330,15 @@ static int sdcardfs_read_super(struct vfsmount *mnt, struct super_block *sb, /* setup permission policy */ sb_info->obbpath_s = kzalloc(PATH_MAX, GFP_KERNEL); mutex_lock(&sdcardfs_super_list_lock); - if(sb_info->options.multiuser) { - setup_derived_state(d_inode(sb->s_root), PERM_PRE_ROOT, sb_info->options.fs_user_id, AID_ROOT, false, d_inode(sb->s_root)); + if (sb_info->options.multiuser) { + setup_derived_state(d_inode(sb->s_root), PERM_PRE_ROOT, + sb_info->options.fs_user_id, AID_ROOT, + false, d_inode(sb->s_root)); snprintf(sb_info->obbpath_s, PATH_MAX, "%s/obb", dev_name); - /*err = prepare_dir(sb_info->obbpath_s, - sb_info->options.fs_low_uid, - sb_info->options.fs_low_gid, 00755);*/ } else { - setup_derived_state(d_inode(sb->s_root), PERM_ROOT, sb_info->options.fs_user_id, AID_ROOT, false, d_inode(sb->s_root)); + setup_derived_state(d_inode(sb->s_root), PERM_ROOT, + sb_info->options.fs_user_id, AID_ROOT, + false, d_inode(sb->s_root)); snprintf(sb_info->obbpath_s, PATH_MAX, "%s/Android/obb", dev_name); } fixup_tmp_permissions(d_inode(sb->s_root)); @@ -368,8 +371,10 @@ static int sdcardfs_read_super(struct vfsmount *mnt, struct super_block *sb, /* A feature which supports mount_nodev() with options */ static struct dentry *mount_nodev_with_options(struct vfsmount *mnt, - struct file_system_type *fs_type, int flags, const char *dev_name, void *data, - int (*fill_super)(struct vfsmount *, struct super_block *, const char *, void *, int)) + struct file_system_type *fs_type, int flags, + const char *dev_name, void *data, + int (*fill_super)(struct vfsmount *, struct super_block *, + const char *, void *, int)) { int error; @@ -401,19 +406,22 @@ static struct dentry *sdcardfs_mount(struct vfsmount *mnt, raw_data, sdcardfs_read_super); } -static struct dentry *sdcardfs_mount_wrn(struct file_system_type *fs_type, int flags, - const char *dev_name, void *raw_data) +static struct dentry *sdcardfs_mount_wrn(struct file_system_type *fs_type, + int flags, const char *dev_name, void *raw_data) { WARN(1, "sdcardfs does not support mount. Use mount2.\n"); return ERR_PTR(-EINVAL); } -void *sdcardfs_alloc_mnt_data(void) { +void *sdcardfs_alloc_mnt_data(void) +{ return kmalloc(sizeof(struct sdcardfs_vfsmount_options), GFP_KERNEL); } -void sdcardfs_kill_sb(struct super_block *sb) { +void sdcardfs_kill_sb(struct super_block *sb) +{ struct sdcardfs_sb_info *sbi; + if (sb->s_magic == SDCARDFS_SUPER_MAGIC) { sbi = SDCARDFS_SB(sb); mutex_lock(&sdcardfs_super_list_lock); diff --git a/fs/sdcardfs/packagelist.c b/fs/sdcardfs/packagelist.c index 5398ebf8a34b..58218fed4083 100644 --- a/fs/sdcardfs/packagelist.c +++ b/fs/sdcardfs/packagelist.c @@ -61,7 +61,8 @@ static inline void qstr_init(struct qstr *q, const char *name) q->hash = full_name_case_hash(q->name, q->len); } -static inline int qstr_copy(const struct qstr *src, struct qstr *dest) { +static inline int qstr_copy(const struct qstr *src, struct qstr *dest) +{ dest->name = kstrdup(src->name, GFP_KERNEL); dest->hash_len = src->hash_len; return !!dest->name; @@ -89,6 +90,7 @@ static appid_t __get_appid(const struct qstr *key) appid_t get_appid(const char *key) { struct qstr q; + qstr_init(&q, key); return __get_appid(&q); } @@ -114,6 +116,7 @@ static appid_t __get_ext_gid(const struct qstr *key) appid_t get_ext_gid(const char *key) { struct qstr q; + qstr_init(&q, key); return __get_ext_gid(&q); } @@ -144,8 +147,10 @@ appid_t is_excluded(const char *key, userid_t user) /* Kernel has already enforced everything we returned through * derive_permissions_locked(), so this is used to lock down access - * even further, such as enforcing that apps hold sdcard_rw. */ -int check_caller_access_to_name(struct inode *parent_node, const struct qstr *name) { + * even further, such as enforcing that apps hold sdcard_rw. + */ +int check_caller_access_to_name(struct inode *parent_node, const struct qstr *name) +{ struct qstr q_autorun = QSTR_LITERAL("autorun.inf"); struct qstr q__android_secure = QSTR_LITERAL(".android_secure"); struct qstr q_android_secure = QSTR_LITERAL("android_secure"); @@ -160,26 +165,26 @@ int check_caller_access_to_name(struct inode *parent_node, const struct qstr *na } /* Root always has access; access for any other UIDs should always - * be controlled through packages.list. */ - if (from_kuid(&init_user_ns, current_fsuid()) == 0) { + * be controlled through packages.list. + */ + if (from_kuid(&init_user_ns, current_fsuid()) == 0) return 1; - } /* No extra permissions to enforce */ return 1; } /* This function is used when file opening. The open flags must be - * checked before calling check_caller_access_to_name() */ -int open_flags_to_access_mode(int open_flags) { - if((open_flags & O_ACCMODE) == O_RDONLY) { + * checked before calling check_caller_access_to_name() + */ +int open_flags_to_access_mode(int open_flags) +{ + if ((open_flags & O_ACCMODE) == O_RDONLY) return 0; /* R_OK */ - } else if ((open_flags & O_ACCMODE) == O_WRONLY) { + if ((open_flags & O_ACCMODE) == O_WRONLY) return 1; /* W_OK */ - } else { - /* Probably O_RDRW, but treat as default to be safe */ + /* Probably O_RDRW, but treat as default to be safe */ return 1; /* R_OK | W_OK */ - } } static struct hashtable_entry *alloc_hashtable_entry(const struct qstr *key, @@ -371,7 +376,6 @@ static void remove_packagelist_entry(const struct qstr *key) remove_packagelist_entry_locked(key); fixup_all_perms_name(key); mutex_unlock(&sdcardfs_super_list_lock); - return; } static void remove_ext_gid_entry_locked(const struct qstr *key, gid_t group) @@ -394,7 +398,6 @@ static void remove_ext_gid_entry(const struct qstr *key, gid_t group) mutex_lock(&sdcardfs_super_list_lock); remove_ext_gid_entry_locked(key, group); mutex_unlock(&sdcardfs_super_list_lock); - return; } static void remove_userid_all_entry_locked(userid_t userid) @@ -422,7 +425,6 @@ static void remove_userid_all_entry(userid_t userid) remove_userid_all_entry_locked(userid); fixup_all_perms_userid(userid); mutex_unlock(&sdcardfs_super_list_lock); - return; } static void remove_userid_exclude_entry_locked(const struct qstr *key, userid_t userid) @@ -447,7 +449,6 @@ static void remove_userid_exclude_entry(const struct qstr *key, userid_t userid) remove_userid_exclude_entry_locked(key, userid); fixup_all_perms_name_userid(key, userid); mutex_unlock(&sdcardfs_super_list_lock); - return; } static void packagelist_destroy(void) @@ -456,6 +457,7 @@ static void packagelist_destroy(void) struct hlist_node *h_t; HLIST_HEAD(free_list); int i; + mutex_lock(&sdcardfs_super_list_lock); hash_for_each_rcu(package_to_appid, i, hash_cur, hlist) { hash_del_rcu(&hash_cur->hlist); @@ -603,7 +605,7 @@ static struct configfs_attribute *package_details_attrs[] = { }; static struct configfs_item_operations package_details_item_ops = { - .release = package_details_release, + .release = package_details_release, }; static struct config_item_type package_appid_type = { @@ -659,6 +661,7 @@ static struct config_item *extension_details_make_item(struct config_group *grou struct extension_details *extension_details = kzalloc(sizeof(struct extension_details), GFP_KERNEL); const char *tmp; int ret; + if (!extension_details) return ERR_PTR(-ENOMEM); @@ -713,6 +716,7 @@ static struct config_group *extensions_make_group(struct config_group *group, co static void extensions_drop_group(struct config_group *group, struct config_item *item) { struct extensions_value *value = to_extensions_value(item); + printk(KERN_INFO "sdcardfs: No longer mapping any files to gid %d\n", value->num); kfree(value); } @@ -847,6 +851,7 @@ static int configfs_sdcardfs_init(void) { int ret, i; struct configfs_subsystem *subsys = &sdcardfs_packages; + for (i = 0; sd_default_groups[i]; i++) { config_group_init(sd_default_groups[i]); } @@ -877,7 +882,7 @@ int packagelist_init(void) } configfs_sdcardfs_init(); - return 0; + return 0; } void packagelist_exit(void) diff --git a/fs/sdcardfs/sdcardfs.h b/fs/sdcardfs/sdcardfs.h index e28a7dd47ebb..d1b86fff1b70 100644 --- a/fs/sdcardfs/sdcardfs.h +++ b/fs/sdcardfs/sdcardfs.h @@ -87,11 +87,11 @@ } while (0) /* OVERRIDE_CRED() and REVERT_CRED() - * OVERRID_CRED() - * backup original task->cred - * and modifies task->cred->fsuid/fsgid to specified value. + * OVERRIDE_CRED() + * backup original task->cred + * and modifies task->cred->fsuid/fsgid to specified value. * REVERT_CRED() - * restore original task->cred->fsuid/fsgid. + * restore original task->cred->fsuid/fsgid. * These two macro should be used in pair, and OVERRIDE_CRED() should be * placed at the beginning of a function, right after variable declaration. */ @@ -114,27 +114,29 @@ /* Android 5.0 support */ /* Permission mode for a specific node. Controls how file permissions - * are derived for children nodes. */ + * are derived for children nodes. + */ typedef enum { - /* Nothing special; this node should just inherit from its parent. */ - PERM_INHERIT, - /* This node is one level above a normal root; used for legacy layouts - * which use the first level to represent user_id. */ - PERM_PRE_ROOT, - /* This node is "/" */ - PERM_ROOT, - /* This node is "/Android" */ - PERM_ANDROID, - /* This node is "/Android/data" */ - PERM_ANDROID_DATA, - /* This node is "/Android/obb" */ - PERM_ANDROID_OBB, - /* This node is "/Android/media" */ - PERM_ANDROID_MEDIA, - /* This node is "/Android/[data|media|obb]/[package]" */ - PERM_ANDROID_PACKAGE, - /* This node is "/Android/[data|media|obb]/[package]/cache" */ - PERM_ANDROID_PACKAGE_CACHE, + /* Nothing special; this node should just inherit from its parent. */ + PERM_INHERIT, + /* This node is one level above a normal root; used for legacy layouts + * which use the first level to represent user_id. + */ + PERM_PRE_ROOT, + /* This node is "/" */ + PERM_ROOT, + /* This node is "/Android" */ + PERM_ANDROID, + /* This node is "/Android/data" */ + PERM_ANDROID_DATA, + /* This node is "/Android/obb" */ + PERM_ANDROID_OBB, + /* This node is "/Android/media" */ + PERM_ANDROID_MEDIA, + /* This node is "/Android/[data|media|obb]/[package]" */ + PERM_ANDROID_PACKAGE, + /* This node is "/Android/[data|media|obb]/[package]/cache" */ + PERM_ANDROID_PACKAGE_CACHE, } perm_t; struct sdcardfs_sb_info; @@ -142,9 +144,9 @@ struct sdcardfs_mount_options; struct sdcardfs_inode_info; /* Do not directly use this function. Use OVERRIDE_CRED() instead. */ -const struct cred * override_fsids(struct sdcardfs_sb_info* sbi, struct sdcardfs_inode_info *info); +const struct cred *override_fsids(struct sdcardfs_sb_info *sbi, struct sdcardfs_inode_info *info); /* Do not directly use this function, use REVERT_CRED() instead. */ -void revert_fsids(const struct cred * old_cred); +void revert_fsids(const struct cred *old_cred); /* operations vectors defined in specific files */ extern const struct file_operations sdcardfs_main_fops; @@ -221,7 +223,8 @@ struct sdcardfs_sb_info { struct super_block *sb; struct super_block *lower_sb; /* derived perm policy : some of options have been added - * to sdcardfs_mount_options (Android 4.4 support) */ + * to sdcardfs_mount_options (Android 4.4 support) + */ struct sdcardfs_mount_options options; spinlock_t lock; /* protects obbpath */ char *obbpath_s; @@ -332,7 +335,7 @@ static inline void sdcardfs_put_reset_##pname(const struct dentry *dent) \ { \ struct path pname; \ spin_lock(&SDCARDFS_D(dent)->lock); \ - if(SDCARDFS_D(dent)->pname.dentry) { \ + if (SDCARDFS_D(dent)->pname.dentry) { \ pathcpy(&pname, &SDCARDFS_D(dent)->pname); \ SDCARDFS_D(dent)->pname.dentry = NULL; \ SDCARDFS_D(dent)->pname.mnt = NULL; \ @@ -348,17 +351,17 @@ SDCARDFS_DENT_FUNC(orig_path) static inline bool sbinfo_has_sdcard_magic(struct sdcardfs_sb_info *sbinfo) { - return sbinfo && sbinfo->sb && sbinfo->sb->s_magic == SDCARDFS_SUPER_MAGIC; + return sbinfo && sbinfo->sb && sbinfo->sb->s_magic == SDCARDFS_SUPER_MAGIC; } /* grab a refererence if we aren't linking to ourself */ static inline void set_top(struct sdcardfs_inode_info *info, struct inode *top) { struct inode *old_top = NULL; + BUG_ON(IS_ERR_OR_NULL(top)); - if (info->top && info->top != &info->vfs_inode) { + if (info->top && info->top != &info->vfs_inode) old_top = info->top; - } if (top != &info->vfs_inode) igrab(top); info->top = top; @@ -368,11 +371,11 @@ static inline void set_top(struct sdcardfs_inode_info *info, struct inode *top) static inline struct inode *grab_top(struct sdcardfs_inode_info *info) { struct inode *top = info->top; - if (top) { + + if (top) return igrab(top); - } else { + else return NULL; - } } static inline void release_top(struct sdcardfs_inode_info *info) @@ -380,21 +383,24 @@ static inline void release_top(struct sdcardfs_inode_info *info) iput(info->top); } -static inline int get_gid(struct vfsmount *mnt, struct sdcardfs_inode_info *info) { +static inline int get_gid(struct vfsmount *mnt, struct sdcardfs_inode_info *info) +{ struct sdcardfs_vfsmount_options *opts = mnt->data; - if (opts->gid == AID_SDCARD_RW) { + if (opts->gid == AID_SDCARD_RW) /* As an optimization, certain trusted system components only run * as owner but operate across all users. Since we're now handing * out the sdcard_rw GID only to trusted apps, we're okay relaxing * the user boundary enforcement for the default view. The UIDs - * assigned to app directories are still multiuser aware. */ + * assigned to app directories are still multiuser aware. + */ return AID_SDCARD_RW; - } else { + else return multiuser_get_uid(info->userid, opts->gid); - } } -static inline int get_mode(struct vfsmount *mnt, struct sdcardfs_inode_info *info) { + +static inline int get_mode(struct vfsmount *mnt, struct sdcardfs_inode_info *info) +{ int owner_mode; int filtered_mode; struct sdcardfs_vfsmount_options *opts = mnt->data; @@ -403,17 +409,18 @@ static inline int get_mode(struct vfsmount *mnt, struct sdcardfs_inode_info *inf if (info->perm == PERM_PRE_ROOT) { /* Top of multi-user view should always be visible to ensure - * secondary users can traverse inside. */ + * secondary users can traverse inside. + */ visible_mode = 0711; } else if (info->under_android) { /* Block "other" access to Android directories, since only apps * belonging to a specific user should be in there; we still - * leave +x open for the default view. */ - if (opts->gid == AID_SDCARD_RW) { + * leave +x open for the default view. + */ + if (opts->gid == AID_SDCARD_RW) visible_mode = visible_mode & ~0006; - } else { + else visible_mode = visible_mode & ~0007; - } } owner_mode = info->lower_inode->i_mode & 0700; filtered_mode = visible_mode & (owner_mode | (owner_mode >> 3) | (owner_mode >> 6)); @@ -438,7 +445,7 @@ static inline void sdcardfs_get_real_lower(const struct dentry *dent, /* in case of a local obb dentry * the orig_path should be returned */ - if(has_graft_path(dent)) + if (has_graft_path(dent)) sdcardfs_get_orig_path(dent, real_lower); else sdcardfs_get_lower_path(dent, real_lower); @@ -447,7 +454,7 @@ static inline void sdcardfs_get_real_lower(const struct dentry *dent, static inline void sdcardfs_put_real_lower(const struct dentry *dent, struct path *real_lower) { - if(has_graft_path(dent)) + if (has_graft_path(dent)) sdcardfs_put_orig_path(dent, real_lower); else sdcardfs_put_lower_path(dent, real_lower); @@ -460,7 +467,7 @@ extern struct list_head sdcardfs_super_list; extern appid_t get_appid(const char *app_name); extern appid_t get_ext_gid(const char *app_name); extern appid_t is_excluded(const char *app_name, userid_t userid); -extern int check_caller_access_to_name(struct inode *parent_node, const struct qstr* name); +extern int check_caller_access_to_name(struct inode *parent_node, const struct qstr *name); extern int open_flags_to_access_mode(int open_flags); extern int packagelist_init(void); extern void packagelist_exit(void); @@ -481,7 +488,7 @@ extern void get_derived_permission_new(struct dentry *parent, struct dentry *den extern void fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit); extern void update_derived_permission_lock(struct dentry *dentry); -void fixup_lower_ownership(struct dentry* dentry, const char *name); +void fixup_lower_ownership(struct dentry *dentry, const char *name); extern int need_graft_path(struct dentry *dentry); extern int is_base_obbpath(struct dentry *dentry); extern int is_obbpath_invalid(struct dentry *dentry); @@ -491,6 +498,7 @@ extern int setup_obb_dentry(struct dentry *dentry, struct path *lower_path); static inline struct dentry *lock_parent(struct dentry *dentry) { struct dentry *dir = dget_parent(dentry); + mutex_lock_nested(&d_inode(dir)->i_mutex, I_MUTEX_PARENT); return dir; } diff --git a/fs/sdcardfs/super.c b/fs/sdcardfs/super.c index edda32b68dc0..a4629f20812e 100644 --- a/fs/sdcardfs/super.c +++ b/fs/sdcardfs/super.c @@ -36,7 +36,7 @@ static void sdcardfs_put_super(struct super_block *sb) if (!spd) return; - if(spd->obbpath_s) { + if (spd->obbpath_s) { kfree(spd->obbpath_s); path_put(&spd->obbpath); } @@ -125,29 +125,33 @@ static int sdcardfs_remount_fs2(struct vfsmount *mnt, struct super_block *sb, * SILENT, but anything else left over is an error. */ if ((*flags & ~(MS_RDONLY | MS_MANDLOCK | MS_SILENT | MS_REMOUNT)) != 0) { - printk(KERN_ERR - "sdcardfs: remount flags 0x%x unsupported\n", *flags); + pr_err("sdcardfs: remount flags 0x%x unsupported\n", *flags); err = -EINVAL; } - printk(KERN_INFO "Remount options were %s for vfsmnt %p.\n", options, mnt); + pr_info("Remount options were %s for vfsmnt %p.\n", options, mnt); err = parse_options_remount(sb, options, *flags & ~MS_SILENT, mnt->data); return err; } -static void* sdcardfs_clone_mnt_data(void *data) { - struct sdcardfs_vfsmount_options* opt = kmalloc(sizeof(struct sdcardfs_vfsmount_options), GFP_KERNEL); - struct sdcardfs_vfsmount_options* old = data; - if(!opt) return NULL; +static void *sdcardfs_clone_mnt_data(void *data) +{ + struct sdcardfs_vfsmount_options *opt = kmalloc(sizeof(struct sdcardfs_vfsmount_options), GFP_KERNEL); + struct sdcardfs_vfsmount_options *old = data; + + if (!opt) + return NULL; opt->gid = old->gid; opt->mask = old->mask; return opt; } -static void sdcardfs_copy_mnt_data(void *data, void *newdata) { - struct sdcardfs_vfsmount_options* old = data; - struct sdcardfs_vfsmount_options* new = newdata; +static void sdcardfs_copy_mnt_data(void *data, void *newdata) +{ + struct sdcardfs_vfsmount_options *old = data; + struct sdcardfs_vfsmount_options *new = newdata; + old->gid = new->gid; old->mask = new->mask; } @@ -235,7 +239,8 @@ static void sdcardfs_umount_begin(struct super_block *sb) lower_sb->s_op->umount_begin(lower_sb); } -static int sdcardfs_show_options(struct vfsmount *mnt, struct seq_file *m, struct dentry *root) +static int sdcardfs_show_options(struct vfsmount *mnt, struct seq_file *m, + struct dentry *root) { struct sdcardfs_sb_info *sbi = SDCARDFS_SB(root->d_sb); struct sdcardfs_mount_options *opts = &sbi->options; From 8379c1b127a069d0016e2eac00037ac4ac0f9a80 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 16 Mar 2017 19:33:35 -0700 Subject: [PATCH 277/521] ANDROID: sdcardfs: Fix style issues with comments Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: I8791ef7eac527645ecb9407908e7e5ece35b8f80 --- fs/sdcardfs/dentry.c | 16 +--------------- fs/sdcardfs/derived_perm.c | 9 +++------ fs/sdcardfs/main.c | 2 +- 3 files changed, 5 insertions(+), 22 deletions(-) diff --git a/fs/sdcardfs/dentry.c b/fs/sdcardfs/dentry.c index aaa15dff6f51..8e31d1a80f0c 100644 --- a/fs/sdcardfs/dentry.c +++ b/fs/sdcardfs/dentry.c @@ -127,12 +127,10 @@ static int sdcardfs_hash_ci(const struct dentry *dentry, unsigned long hash; name = qstr->name; - //len = vfat_striptail_len(qstr); len = qstr->len; hash = init_name_hash(); while (len--) - //hash = partial_name_hash(nls_tolower(t, *name++), hash); hash = partial_name_hash(tolower(*name++), hash); qstr->hash = end_name_hash(hash); @@ -146,20 +144,8 @@ static int sdcardfs_cmp_ci(const struct dentry *parent, const struct dentry *dentry, unsigned int len, const char *str, const struct qstr *name) { - /* This function is copy of vfat_cmpi */ - // FIXME Should we support national language? - //struct nls_table *t = MSDOS_SB(parent->d_sb)->nls_io; - //unsigned int alen, blen; + /* FIXME Should we support national language? */ - /* A filename cannot end in '.' or we treat it like it has none */ - /* - alen = vfat_striptail_len(name); - blen = __vfat_striptail_len(len, str); - if (alen == blen) { - if (nls_strnicmp(t, name->name, str, alen) == 0) - return 0; - } - */ if (name->len == len) { if (str_n_case_eq(name->name, str, len)) return 0; diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index 51fa565742bd..748eb3741b3b 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -325,9 +325,7 @@ inline void update_derived_permission_lock(struct dentry *dentry) * 1. need to check whether the dentry is updated or not * 2. remove the root dentry update */ - if (IS_ROOT(dentry)) { - //setup_default_pre_root_state(d_inode(dentry)); - } else { + if (!IS_ROOT(dentry)) { parent = dget_parent(dentry); if (parent) { get_derived_permission(parent, dentry); @@ -377,7 +375,6 @@ int is_obbpath_invalid(struct dentry *dent) ret = 1; } else { path_get(&di->lower_path); - //lower_parent = lock_parent(lower_path->dentry); path_buf = kmalloc(PATH_MAX, GFP_ATOMIC); if (!path_buf) { @@ -392,7 +389,6 @@ int is_obbpath_invalid(struct dentry *dent) kfree(path_buf); } - //unlock_dir(lower_parent); pathcpy(&lower_path, &di->lower_path); need_put = 1; } @@ -438,7 +434,8 @@ int setup_obb_dentry(struct dentry *dentry, struct path *lower_path) /* A local obb dentry must have its own orig_path to support rmdir * and mkdir of itself. Usually, we expect that the sbi->obbpath - * is avaiable on this stage. */ + * is avaiable on this stage. + */ sdcardfs_set_orig_path(dentry, lower_path); err = kern_path(sbi->obbpath_s, diff --git a/fs/sdcardfs/main.c b/fs/sdcardfs/main.c index c249bd73b121..95cf03379019 100644 --- a/fs/sdcardfs/main.c +++ b/fs/sdcardfs/main.c @@ -29,7 +29,7 @@ enum { Opt_gid, Opt_debug, Opt_mask, - Opt_multiuser, // May need? + Opt_multiuser, Opt_userid, Opt_reserved_mb, Opt_err, From fe13dc52643f6d70eebac263d24637597bf5c2fd Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Tue, 21 Mar 2017 16:29:13 -0700 Subject: [PATCH 278/521] ANDROID: sdcardfs: remove unneeded null check As pointed out by checkpatch, these functions already handle null inputs, so the checks are not needed. Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: I189342f032dfcefee36b27648bb512488ad61d20 --- fs/sdcardfs/lookup.c | 3 +-- fs/sdcardfs/packagelist.c | 3 +-- fs/sdcardfs/super.c | 3 +-- 3 files changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index fab9fd9d0105..51a904e59f46 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -36,8 +36,7 @@ int sdcardfs_init_dentry_cache(void) void sdcardfs_destroy_dentry_cache(void) { - if (sdcardfs_dentry_cachep) - kmem_cache_destroy(sdcardfs_dentry_cachep); + kmem_cache_destroy(sdcardfs_dentry_cachep); } void free_dentry_private_data(struct dentry *dentry) diff --git a/fs/sdcardfs/packagelist.c b/fs/sdcardfs/packagelist.c index 58218fed4083..3c016cb0d8bb 100644 --- a/fs/sdcardfs/packagelist.c +++ b/fs/sdcardfs/packagelist.c @@ -889,6 +889,5 @@ void packagelist_exit(void) { configfs_sdcardfs_exit(); packagelist_destroy(); - if (hashtable_entry_cachep) - kmem_cache_destroy(hashtable_entry_cachep); + kmem_cache_destroy(hashtable_entry_cachep); } diff --git a/fs/sdcardfs/super.c b/fs/sdcardfs/super.c index a4629f20812e..7d3c331379f5 100644 --- a/fs/sdcardfs/super.c +++ b/fs/sdcardfs/super.c @@ -222,8 +222,7 @@ int sdcardfs_init_inode_cache(void) /* sdcardfs inode cache destructor */ void sdcardfs_destroy_inode_cache(void) { - if (sdcardfs_inode_cachep) - kmem_cache_destroy(sdcardfs_inode_cachep); + kmem_cache_destroy(sdcardfs_inode_cachep); } /* From f52be24bd98b8cda7aa03022ac511e5f42fa2bdd Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 16 Mar 2017 17:46:13 -0700 Subject: [PATCH 279/521] ANDROID: sdcardfs: Use pr_[...] instead of printk Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: Ibc635ec865750530d32b87067779f681fe58a003 --- fs/sdcardfs/derived_perm.c | 6 ++--- fs/sdcardfs/file.c | 9 ++++---- fs/sdcardfs/inode.c | 8 +++---- fs/sdcardfs/lookup.c | 2 +- fs/sdcardfs/main.c | 45 +++++++++++++++++--------------------- fs/sdcardfs/packagelist.c | 16 +++++++------- fs/sdcardfs/sdcardfs.h | 2 +- fs/sdcardfs/super.c | 5 ++--- 8 files changed, 43 insertions(+), 50 deletions(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index 748eb3741b3b..eabdbfd30054 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -318,7 +318,7 @@ inline void update_derived_permission_lock(struct dentry *dentry) struct dentry *parent; if (!dentry || !d_inode(dentry)) { - printk(KERN_ERR "sdcardfs: %s: invalid dentry\n", __func__); + pr_err("sdcardfs: %s: invalid dentry\n", __func__); return; } /* FIXME: @@ -379,7 +379,7 @@ int is_obbpath_invalid(struct dentry *dent) path_buf = kmalloc(PATH_MAX, GFP_ATOMIC); if (!path_buf) { ret = 1; - printk(KERN_ERR "sdcardfs: fail to allocate path_buf in %s.\n", __func__); + pr_err("sdcardfs: fail to allocate path_buf in %s.\n", __func__); } else { obbpath_s = d_path(&di->lower_path, path_buf, PATH_MAX); if (d_unhashed(di->lower_path.dentry) || @@ -451,7 +451,7 @@ int setup_obb_dentry(struct dentry *dentry, struct path *lower_path) * because the sdcard daemon also regards this case as * a lookup fail. */ - printk(KERN_INFO "sdcardfs: the sbi->obbpath is not available\n"); + pr_info("sdcardfs: the sbi->obbpath is not available\n"); } return err; } diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index 4c9726606c0f..eee4eb5896e5 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -65,7 +65,7 @@ static ssize_t sdcardfs_write(struct file *file, const char __user *buf, /* check disk space */ if (!check_min_free_space(dentry, count, 0)) { - printk(KERN_INFO "No minimum free space.\n"); + pr_err("No minimum free space.\n"); return -ENOSPC; } @@ -160,8 +160,7 @@ static int sdcardfs_mmap(struct file *file, struct vm_area_struct *vma) lower_file = sdcardfs_lower_file(file); if (willwrite && !lower_file->f_mapping->a_ops->writepage) { err = -EINVAL; - printk(KERN_ERR "sdcardfs: lower file system does not " - "support writeable mmap\n"); + pr_err("sdcardfs: lower file system does not support writeable mmap\n"); goto out; } @@ -173,14 +172,14 @@ static int sdcardfs_mmap(struct file *file, struct vm_area_struct *vma) if (!SDCARDFS_F(file)->lower_vm_ops) { err = lower_file->f_op->mmap(lower_file, vma); if (err) { - printk(KERN_ERR "sdcardfs: lower mmap failed %d\n", err); + pr_err("sdcardfs: lower mmap failed %d\n", err); goto out; } saved_vm_ops = vma->vm_ops; /* save: came from lower ->mmap */ err = do_munmap(current->mm, vma->vm_start, vma->vm_end - vma->vm_start); if (err) { - printk(KERN_ERR "sdcardfs: do_munmap failed %d\n", err); + pr_err("sdcardfs: do_munmap failed %d\n", err); goto out; } } diff --git a/fs/sdcardfs/inode.c b/fs/sdcardfs/inode.c index ce187d865f29..f15cb11ca8fd 100644 --- a/fs/sdcardfs/inode.c +++ b/fs/sdcardfs/inode.c @@ -248,7 +248,7 @@ static int touch(char *abs_path, mode_t mode) if (PTR_ERR(filp) == -EEXIST) { return 0; } else { - printk(KERN_ERR "sdcardfs: failed to open(%s): %ld\n", + pr_err("sdcardfs: failed to open(%s): %ld\n", abs_path, PTR_ERR(filp)); return PTR_ERR(filp); } @@ -284,7 +284,7 @@ static int sdcardfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode /* check disk space */ if (!check_min_free_space(dentry, 0, 1)) { - printk(KERN_INFO "sdcardfs: No minimum free space.\n"); + pr_err("sdcardfs: No minimum free space.\n"); err = -ENOSPC; goto out_revert; } @@ -360,7 +360,7 @@ static int sdcardfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode set_fs_pwd(current->fs, &lower_path); touch_err = touch(".nomedia", 0664); if (touch_err) { - printk(KERN_ERR "sdcardfs: failed to create .nomedia in %s: %d\n", + pr_err("sdcardfs: failed to create .nomedia in %s: %d\n", lower_path.dentry->d_name.name, touch_err); goto out; } @@ -642,7 +642,7 @@ static int sdcardfs_permission(struct vfsmount *mnt, struct inode *inode, int ma release_top(SDCARDFS_I(inode)); tmp.i_sb = inode->i_sb; if (IS_POSIXACL(inode)) - printk(KERN_WARNING "%s: This may be undefined behavior... \n", __func__); + pr_warn("%s: This may be undefined behavior...\n", __func__); err = generic_permission(&tmp, mask); /* XXX * Original sdcardfs code calls inode_permission(lower_inode,.. ) diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index 51a904e59f46..cfa2b12b6411 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -323,7 +323,7 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, * because the sdcard daemon also regards this case as * a lookup fail. */ - printk(KERN_INFO "sdcardfs: base obbpath is not available\n"); + pr_info("sdcardfs: base obbpath is not available\n"); sdcardfs_put_reset_orig_path(dentry); goto out; } diff --git a/fs/sdcardfs/main.c b/fs/sdcardfs/main.c index 95cf03379019..7344635522cd 100644 --- a/fs/sdcardfs/main.c +++ b/fs/sdcardfs/main.c @@ -117,19 +117,17 @@ static int parse_options(struct super_block *sb, char *options, int silent, break; /* unknown option */ default: - if (!silent) { - printk( KERN_ERR "Unrecognized mount option \"%s\" " - "or missing value", p); - } + if (!silent) + pr_err("Unrecognized mount option \"%s\" or missing value", p); return -EINVAL; } } if (*debug) { - printk( KERN_INFO "sdcardfs : options - debug:%d\n", *debug); - printk( KERN_INFO "sdcardfs : options - uid:%d\n", + pr_info("sdcardfs : options - debug:%d\n", *debug); + pr_info("sdcardfs : options - uid:%d\n", opts->fs_low_uid); - printk( KERN_INFO "sdcardfs : options - gid:%d\n", + pr_info("sdcardfs : options - gid:%d\n", opts->fs_low_gid); } @@ -175,22 +173,20 @@ int parse_options_remount(struct super_block *sb, char *options, int silent, case Opt_fsuid: case Opt_fsgid: case Opt_reserved_mb: - printk( KERN_WARNING "Option \"%s\" can't be changed during remount\n", p); + pr_warn("Option \"%s\" can't be changed during remount\n", p); break; /* unknown option */ default: - if (!silent) { - printk( KERN_ERR "Unrecognized mount option \"%s\" " - "or missing value", p); - } + if (!silent) + pr_err("Unrecognized mount option \"%s\" or missing value", p); return -EINVAL; } } if (debug) { - printk( KERN_INFO "sdcardfs : options - debug:%d\n", debug); - printk( KERN_INFO "sdcardfs : options - gid:%d\n", vfsopts->gid); - printk( KERN_INFO "sdcardfs : options - mask:%d\n", vfsopts->mask); + pr_info("sdcardfs : options - debug:%d\n", debug); + pr_info("sdcardfs : options - gid:%d\n", vfsopts->gid); + pr_info("sdcardfs : options - mask:%d\n", vfsopts->mask); } return 0; @@ -244,31 +240,30 @@ static int sdcardfs_read_super(struct vfsmount *mnt, struct super_block *sb, struct sdcardfs_vfsmount_options *mnt_opt = mnt->data; struct inode *inode; - printk(KERN_INFO "sdcardfs version 2.0\n"); + pr_info("sdcardfs version 2.0\n"); if (!dev_name) { - printk(KERN_ERR - "sdcardfs: read_super: missing dev_name argument\n"); + pr_err("sdcardfs: read_super: missing dev_name argument\n"); err = -EINVAL; goto out; } - printk(KERN_INFO "sdcardfs: dev_name -> %s\n", dev_name); - printk(KERN_INFO "sdcardfs: options -> %s\n", (char *)raw_data); - printk(KERN_INFO "sdcardfs: mnt -> %p\n", mnt); + pr_info("sdcardfs: dev_name -> %s\n", dev_name); + pr_info("sdcardfs: options -> %s\n", (char *)raw_data); + pr_info("sdcardfs: mnt -> %p\n", mnt); /* parse lower path */ err = kern_path(dev_name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &lower_path); if (err) { - printk(KERN_ERR "sdcardfs: error accessing lower directory '%s'\n", dev_name); + pr_err("sdcardfs: error accessing lower directory '%s'\n", dev_name); goto out; } /* allocate superblock private data */ sb->s_fs_info = kzalloc(sizeof(struct sdcardfs_sb_info), GFP_KERNEL); if (!SDCARDFS_SB(sb)) { - printk(KERN_CRIT "sdcardfs: read_super: out of memory\n"); + pr_crit("sdcardfs: read_super: out of memory\n"); err = -ENOMEM; goto out_free; } @@ -277,7 +272,7 @@ static int sdcardfs_read_super(struct vfsmount *mnt, struct super_block *sb, /* parse options */ err = parse_options(sb, raw_data, silent, &debug, mnt_opt, &sb_info->options); if (err) { - printk(KERN_ERR "sdcardfs: invalid options\n"); + pr_err("sdcardfs: invalid options\n"); goto out_freesbi; } @@ -347,7 +342,7 @@ static int sdcardfs_read_super(struct vfsmount *mnt, struct super_block *sb, mutex_unlock(&sdcardfs_super_list_lock); if (!silent) - printk(KERN_INFO "sdcardfs: mounted on top of %s type %s\n", + pr_info("sdcardfs: mounted on top of %s type %s\n", dev_name, lower_sb->s_type->name); goto out; /* all is well */ diff --git a/fs/sdcardfs/packagelist.c b/fs/sdcardfs/packagelist.c index 3c016cb0d8bb..89196e31073e 100644 --- a/fs/sdcardfs/packagelist.c +++ b/fs/sdcardfs/packagelist.c @@ -471,7 +471,7 @@ static void packagelist_destroy(void) hlist_for_each_entry_safe(hash_cur, h_t, &free_list, dlist) free_hashtable_entry(hash_cur); mutex_unlock(&sdcardfs_super_list_lock); - printk(KERN_INFO "sdcardfs: destroyed packagelist pkgld\n"); + pr_info("sdcardfs: destroyed packagelist pkgld\n"); } #define SDCARDFS_CONFIGFS_ATTR(_pfx, _name) \ @@ -587,7 +587,8 @@ static ssize_t package_details_clear_userid_store(struct config_item *item, static void package_details_release(struct config_item *item) { struct package_details *package_details = to_package_details(item); - printk(KERN_INFO "sdcardfs: removing %s\n", package_details->name.name); + + pr_info("sdcardfs: removing %s\n", package_details->name.name); remove_packagelist_entry(&package_details->name); kfree(package_details->name.name); kfree(package_details); @@ -639,7 +640,7 @@ static void extension_details_release(struct config_item *item) { struct extension_details *extension_details = to_extension_details(item); - printk(KERN_INFO "sdcardfs: No longer mapping %s files to gid %d\n", + pr_info("sdcardfs: No longer mapping %s files to gid %d\n", extension_details->name.name, extension_details->num); remove_ext_gid_entry(&extension_details->name, extension_details->num); kfree(extension_details->name.name); @@ -717,7 +718,7 @@ static void extensions_drop_group(struct config_group *group, struct config_item { struct extensions_value *value = to_extensions_value(item); - printk(KERN_INFO "sdcardfs: No longer mapping any files to gid %d\n", value->num); + pr_info("sdcardfs: No longer mapping any files to gid %d\n", value->num); kfree(value); } @@ -852,14 +853,13 @@ static int configfs_sdcardfs_init(void) int ret, i; struct configfs_subsystem *subsys = &sdcardfs_packages; - for (i = 0; sd_default_groups[i]; i++) { + for (i = 0; sd_default_groups[i]; i++) config_group_init(sd_default_groups[i]); - } config_group_init(&subsys->su_group); mutex_init(&subsys->su_mutex); ret = configfs_register_subsystem(subsys); if (ret) { - printk(KERN_ERR "Error %d while registering subsystem %s\n", + pr_err("Error %d while registering subsystem %s\n", ret, subsys->su_group.cg_item.ci_namebuf); } @@ -877,7 +877,7 @@ int packagelist_init(void) kmem_cache_create("packagelist_hashtable_entry", sizeof(struct hashtable_entry), 0, 0, NULL); if (!hashtable_entry_cachep) { - printk(KERN_ERR "sdcardfs: failed creating pkgl_hashtable entry slab cache\n"); + pr_err("sdcardfs: failed creating pkgl_hashtable entry slab cache\n"); return -ENOMEM; } diff --git a/fs/sdcardfs/sdcardfs.h b/fs/sdcardfs/sdcardfs.h index d1b86fff1b70..a9c5a8345d09 100644 --- a/fs/sdcardfs/sdcardfs.h +++ b/fs/sdcardfs/sdcardfs.h @@ -53,7 +53,7 @@ #define SDCARDFS_ROOT_INO 1 /* useful for tracking code reachability */ -#define UDBG printk(KERN_DEFAULT "DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__) +#define UDBG pr_default("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__) #define SDCARDFS_DIRENT_SIZE 256 diff --git a/fs/sdcardfs/super.c b/fs/sdcardfs/super.c index 7d3c331379f5..67b4ae24337a 100644 --- a/fs/sdcardfs/super.c +++ b/fs/sdcardfs/super.c @@ -64,7 +64,7 @@ static int sdcardfs_statfs(struct dentry *dentry, struct kstatfs *buf) if (sbi->options.reserved_mb) { /* Invalid statfs informations. */ if (buf->f_bsize == 0) { - printk(KERN_ERR "Returned block size is zero.\n"); + pr_err("Returned block size is zero.\n"); return -EINVAL; } @@ -100,8 +100,7 @@ static int sdcardfs_remount_fs(struct super_block *sb, int *flags, char *options * SILENT, but anything else left over is an error. */ if ((*flags & ~(MS_RDONLY | MS_MANDLOCK | MS_SILENT)) != 0) { - printk(KERN_ERR - "sdcardfs: remount flags 0x%x unsupported\n", *flags); + pr_err("sdcardfs: remount flags 0x%x unsupported\n", *flags); err = -EINVAL; } From 91006b3837da22039d3c6b4a0cf71beba973c80c Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 16 Mar 2017 19:32:59 -0700 Subject: [PATCH 280/521] ANDROID: sdcardfs: Use to kstrout Switch from deprecated simple_strtoul to kstrout Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: If18bd133b4d2877f71e58b58fc31371ff6613ed5 --- fs/sdcardfs/derived_perm.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index eabdbfd30054..f82fedd29007 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -60,6 +60,8 @@ void get_derived_permission_new(struct dentry *parent, struct dentry *dentry, struct sdcardfs_inode_info *info = SDCARDFS_I(d_inode(dentry)); struct sdcardfs_inode_info *parent_info = SDCARDFS_I(d_inode(parent)); appid_t appid; + unsigned long user_num; + int err; struct qstr q_Android = QSTR_LITERAL("Android"); struct qstr q_data = QSTR_LITERAL("data"); struct qstr q_obb = QSTR_LITERAL("obb"); @@ -88,7 +90,11 @@ void get_derived_permission_new(struct dentry *parent, struct dentry *dentry, case PERM_PRE_ROOT: /* Legacy internal layout places users at top level */ info->perm = PERM_ROOT; - info->userid = simple_strtoul(name->name, NULL, 10); + err = kstrtoul(name->name, 10, &user_num); + if (err) + info->userid = 0; + else + info->userid = user_num; set_top(info, &info->vfs_inode); break; case PERM_ROOT: From 1ad403398f408f1f1820bac88b79c3386e45f61c Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Tue, 21 Mar 2017 17:27:40 -0700 Subject: [PATCH 281/521] ANDROID: sdcardfs: Use seq_puts over seq_printf Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: I3795ec61ce61e324738815b1ce3b0e09b25d723f --- fs/sdcardfs/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/sdcardfs/super.c b/fs/sdcardfs/super.c index 67b4ae24337a..a3393e959c63 100644 --- a/fs/sdcardfs/super.c +++ b/fs/sdcardfs/super.c @@ -251,7 +251,7 @@ static int sdcardfs_show_options(struct vfsmount *mnt, struct seq_file *m, if (vfsopts->gid != 0) seq_printf(m, ",gid=%u", vfsopts->gid); if (opts->multiuser) - seq_printf(m, ",multiuser"); + seq_puts(m, ",multiuser"); if (vfsopts->mask) seq_printf(m, ",mask=%u", vfsopts->mask); if (opts->fs_user_id) From 53f987226cc079f68dade2b53a6ca09332ec2e70 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Tue, 21 Mar 2017 19:11:38 -0700 Subject: [PATCH 282/521] ANDROID: sdcardfs: Fix style issues in macros Signed-off-by: Daniel Rosenberg Bug: 35331000 Change-Id: I89c4035029dc2236081a7685c55cac595d9e7ebf --- fs/sdcardfs/sdcardfs.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/sdcardfs/sdcardfs.h b/fs/sdcardfs/sdcardfs.h index a9c5a8345d09..2b67b9a8ef9f 100644 --- a/fs/sdcardfs/sdcardfs.h +++ b/fs/sdcardfs/sdcardfs.h @@ -96,21 +96,21 @@ * placed at the beginning of a function, right after variable declaration. */ #define OVERRIDE_CRED(sdcardfs_sbi, saved_cred, info) \ + do { \ saved_cred = override_fsids(sdcardfs_sbi, info); \ - if (!saved_cred) { return -ENOMEM; } + if (!saved_cred) \ + return -ENOMEM; \ + } while (0) #define OVERRIDE_CRED_PTR(sdcardfs_sbi, saved_cred, info) \ + do { \ saved_cred = override_fsids(sdcardfs_sbi, info); \ - if (!saved_cred) { return ERR_PTR(-ENOMEM); } + if (!saved_cred) \ + return ERR_PTR(-ENOMEM); \ + } while (0) #define REVERT_CRED(saved_cred) revert_fsids(saved_cred) -#define DEBUG_CRED() \ - printk("KAKJAGI: %s:%d fsuid %d fsgid %d\n", \ - __FUNCTION__, __LINE__, \ - (int)current->cred->fsuid, \ - (int)current->cred->fsgid); - /* Android 5.0 support */ /* Permission mode for a specific node. Controls how file permissions From f85bf2f9882e89605d19abfd8d13370e6337e5d9 Mon Sep 17 00:00:00 2001 From: Prasanna Karthik Date: Tue, 8 Dec 2015 17:30:25 +0100 Subject: [PATCH 283/521] UPSTREAM: ARM: 8476/1: VDSO: use PTR_ERR_OR_ZERO for vma check (cherry pick from commit 38fc2f6c98262913388de338d5b0cda67e3f78cd) Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Signed-off-by: Prasanna Karthik Signed-off-by: Nathan Lynch Signed-off-by: Russell King Bug: 20045882 Bug: 19198045 Change-Id: Id4838948c19f031a66af1654a4c47668288a0037 --- arch/arm/kernel/vdso.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c index 54a5aeab988d..994e971a8538 100644 --- a/arch/arm/kernel/vdso.c +++ b/arch/arm/kernel/vdso.c @@ -224,7 +224,7 @@ static int install_vvar(struct mm_struct *mm, unsigned long addr) VM_READ | VM_MAYREAD, &vdso_data_mapping); - return IS_ERR(vma) ? PTR_ERR(vma) : 0; + return PTR_ERR_OR_ZERO(vma); } /* assumes mmap_sem is write-locked */ From d127906192a7ce336930624983bd81679aeb320a Mon Sep 17 00:00:00 2001 From: Chen Gang Date: Thu, 14 Jan 2016 15:18:33 -0800 Subject: [PATCH 284/521] UPSTREAM: mm: add PHYS_PFN, use it in __phys_to_pfn() (cherry pick from commit 8f235d1a3eb7198affe7cadf676a10afb8a46a1a) __phys_to_pfn and __pfn_to_phys are symmetric, PHYS_PFN and PFN_PHYS are semmetric: - y = (phys_addr_t)x << PAGE_SHIFT - y >> PAGE_SHIFT = (phys_add_t)x - (unsigned long)(y >> PAGE_SHIFT) = x [akpm@linux-foundation.org: use macro arg name `x'] [arnd@arndb.de: include linux/pfn.h for PHYS_PFN definition] Signed-off-by: Chen Gang Cc: Oleg Nesterov Signed-off-by: Arnd Bergmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Bug: 20045882 Bug: 19198045 Change-Id: If968d2246b381b9e5d6446e9d6d9fa45bb718e91 --- include/asm-generic/memory_model.h | 4 +++- include/linux/pfn.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h index 4b4b056a6eb0..5148150cc80b 100644 --- a/include/asm-generic/memory_model.h +++ b/include/asm-generic/memory_model.h @@ -1,6 +1,8 @@ #ifndef __ASM_MEMORY_MODEL_H #define __ASM_MEMORY_MODEL_H +#include + #ifndef __ASSEMBLY__ #if defined(CONFIG_FLATMEM) @@ -72,7 +74,7 @@ /* * Convert a physical address to a Page Frame Number and back */ -#define __phys_to_pfn(paddr) ((unsigned long)((paddr) >> PAGE_SHIFT)) +#define __phys_to_pfn(paddr) PHYS_PFN(paddr) #define __pfn_to_phys(pfn) PFN_PHYS(pfn) #define page_to_pfn __page_to_pfn diff --git a/include/linux/pfn.h b/include/linux/pfn.h index 7646637221f3..97f3e88aead4 100644 --- a/include/linux/pfn.h +++ b/include/linux/pfn.h @@ -9,5 +9,6 @@ #define PFN_UP(x) (((x) + PAGE_SIZE-1) >> PAGE_SHIFT) #define PFN_DOWN(x) ((x) >> PAGE_SHIFT) #define PFN_PHYS(x) ((phys_addr_t)(x) << PAGE_SHIFT) +#define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT)) #endif From 9244f24304244b42806a04899de6205e961f9161 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sun, 13 Mar 2016 09:13:55 +0900 Subject: [PATCH 285/521] UPSTREAM: kbuild: drop FORCE from PHONY targets (cherry pick from commit 2e8d696b79e9c68d3005a9b09a8c72625d141ea6) These targets are marked as PHONY. No need to add FORCE to their dependency. Signed-off-by: Masahiro Yamada Signed-off-by: Michal Marek Bug: 20045882 Bug: 19198045 Change-Id: I718d46c339c99418d543cc64cabc851307e5abb7 --- Makefile | 8 ++++---- arch/arm/vdso/Makefile | 2 +- arch/ia64/Makefile | 4 ++-- arch/x86/entry/vdso/Makefile | 4 ++-- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/Makefile b/Makefile index fb7c2b40753d..b49bd95335e5 100644 --- a/Makefile +++ b/Makefile @@ -146,7 +146,7 @@ PHONY += $(MAKECMDGOALS) sub-make $(filter-out _all sub-make $(CURDIR)/Makefile, $(MAKECMDGOALS)) _all: sub-make @: -sub-make: FORCE +sub-make: $(Q)$(MAKE) -C $(KBUILD_OUTPUT) KBUILD_SRC=$(CURDIR) \ -f $(CURDIR)/Makefile $(filter-out _all sub-make,$(MAKECMDGOALS)) @@ -1000,7 +1000,7 @@ prepare1: prepare2 $(version_h) include/generated/utsrelease.h \ archprepare: archheaders archscripts prepare1 scripts_basic -prepare0: archprepare FORCE +prepare0: archprepare $(Q)$(MAKE) $(build)=. # All the preparing.. @@ -1045,7 +1045,7 @@ INSTALL_FW_PATH=$(INSTALL_MOD_PATH)/lib/firmware export INSTALL_FW_PATH PHONY += firmware_install -firmware_install: FORCE +firmware_install: @mkdir -p $(objtree)/firmware $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.fwinst obj=firmware __fw_install @@ -1065,7 +1065,7 @@ PHONY += archscripts archscripts: PHONY += __headers -__headers: $(version_h) scripts_basic asm-generic archheaders archscripts FORCE +__headers: $(version_h) scripts_basic asm-generic archheaders archscripts $(Q)$(MAKE) $(build)=scripts build_unifdef PHONY += headers_install_all diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile index 1160434eece0..59a8fa7b8a3b 100644 --- a/arch/arm/vdso/Makefile +++ b/arch/arm/vdso/Makefile @@ -74,5 +74,5 @@ $(MODLIB)/vdso: FORCE @mkdir -p $(MODLIB)/vdso PHONY += vdso_install -vdso_install: $(obj)/vdso.so.dbg $(MODLIB)/vdso FORCE +vdso_install: $(obj)/vdso.so.dbg $(MODLIB)/vdso $(call cmd,vdso_install) diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile index 970d0bd99621..648f1cef33fa 100644 --- a/arch/ia64/Makefile +++ b/arch/ia64/Makefile @@ -95,8 +95,8 @@ define archhelp echo '* unwcheck - Check vmlinux for invalid unwind info' endef -archprepare: make_nr_irqs_h FORCE +archprepare: make_nr_irqs_h PHONY += make_nr_irqs_h FORCE -make_nr_irqs_h: FORCE +make_nr_irqs_h: $(Q)$(MAKE) $(build)=arch/ia64/kernel include/generated/nr-irqs.h diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 265c0ed68118..7af017a8958f 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -187,10 +187,10 @@ vdso_img_insttargets := $(vdso_img_sodbg:%.dbg=install_%) $(MODLIB)/vdso: FORCE @mkdir -p $(MODLIB)/vdso -$(vdso_img_insttargets): install_%: $(obj)/%.dbg $(MODLIB)/vdso FORCE +$(vdso_img_insttargets): install_%: $(obj)/%.dbg $(MODLIB)/vdso $(call cmd,vdso_install) PHONY += vdso_install $(vdso_img_insttargets) -vdso_install: $(vdso_img_insttargets) FORCE +vdso_install: $(vdso_img_insttargets) clean-files := vdso32.so vdso32.so.dbg vdso64* vdso-image-*.c vdsox32.so* From a43810263077889d8fada20c78310692d990ada1 Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Thu, 12 May 2016 17:39:15 +0100 Subject: [PATCH 286/521] UPSTREAM: arm64: fix vdso-offsets.h dependency (cherry pick from commit a66649dab35033bd67988670fa60c268b0444cda) arm64/kernel/{vdso,signal}.c include vdso-offsets.h, as well as any file that includes asm/vdso.h. Therefore, vdso-offsets.h must be generated before these files are compiled. The current rules in arm64/kernel/Makefile do not actually enforce this, because even though $(obj)/vdso is listed as a prerequisite for vdso-offsets.h, this does not result in the intended effect of building the vdso subdirectory (before all the other objects). As a consequence, depending on the order in which the rules are followed, vdso-offsets.h is updated or not before arm64/kernel/{vdso,signal}.o are built. The current rules also impose an unnecessary dependency on vdso-offsets.h for all arm64/kernel/*.o, resulting in unnecessary rebuilds. This is made obvious when using make -j: touch arch/arm64/kernel/vdso/gettimeofday.S && make -j$NCPUS arch/arm64/kernel will sometimes result in none of arm64/kernel/*.o being rebuilt, sometimes all of them, or even just some of them. It is quite difficult to ensure that a header is generated before it is used with recursive Makefiles by using normal rules. Instead, arch-specific generated headers are normally built in the archprepare recipe in the arch Makefile (see for instance arch/ia64/Makefile). Unfortunately, asm-offsets.h is included in gettimeofday.S, and must therefore be generated before vdso-offsets.h, which is not the case if archprepare is used. For this reason, a rule run after archprepare has to be used. This commit adds rules in arm64/Makefile to build vdso-offsets.h during the prepare step, ensuring that vdso-offsets.h is generated before building anything. It also removes the now-unnecessary dependencies on vdso-offsets.h in arm64/kernel/Makefile. Finally, it removes the duplication of asm-offsets.h between arm64/kernel/vdso/ and include/generated/ and makes include/generated/vdso-offsets.h a target in arm64/kernel/vdso/Makefile. Cc: Will Deacon Cc: Michal Marek Signed-off-by: Kevin Brodsky Signed-off-by: Catalin Marinas Bug: 20045882 Bug: 19198045 Change-Id: Iaaf1c6d3c98287617fc7f63a736752f6597bc8d0 --- arch/arm64/Makefile | 10 ++++++++++ arch/arm64/kernel/Makefile | 4 ---- arch/arm64/kernel/vdso/Makefile | 7 +++---- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 26ce8c9a473c..f76ec8d5ac6b 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -130,6 +130,16 @@ archclean: $(Q)$(MAKE) $(clean)=$(boot) $(Q)$(MAKE) $(clean)=$(boot)/dts +# We need to generate vdso-offsets.h before compiling certain files in kernel/. +# In order to do that, we should use the archprepare target, but we can't since +# asm-offsets.h is included in some files used to generate vdso-offsets.h, and +# asm-offsets.h is built in prepare0, for which archprepare is a dependency. +# Therefore we need to generate the header after prepare0 has been made, hence +# this hack. +prepare: vdso_prepare +vdso_prepare: prepare0 + $(Q)$(MAKE) $(build)=arch/arm64/kernel/vdso include/generated/vdso-offsets.h + define archhelp echo '* Image.gz - Compressed kernel image (arch/$(ARCH)/boot/Image.gz)' echo ' Image - Uncompressed kernel image (arch/$(ARCH)/boot/Image)' diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 20bcc2db06bf..4e8a30b7e949 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -50,7 +50,3 @@ obj-y += $(arm64-obj-y) vdso/ probes/ obj-m += $(arm64-obj-m) head-y := head.o extra-y += $(head-y) vmlinux.lds - -# vDSO - this must be built first to generate the symbol offsets -$(call objectify,$(arm64-obj-y)): $(obj)/vdso/vdso-offsets.h -$(obj)/vdso/vdso-offsets.h: $(obj)/vdso diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index b467fd0a384b..62c84f7cb01b 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -23,7 +23,7 @@ GCOV_PROFILE := n ccflags-y += -Wl,-shared obj-y += vdso.o -extra-y += vdso.lds vdso-offsets.h +extra-y += vdso.lds CPPFLAGS_vdso.lds += -P -C -U$(ARCH) # Force dependency (incbin is bad) @@ -42,11 +42,10 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE gen-vdsosym := $(srctree)/$(src)/gen_vdso_offsets.sh quiet_cmd_vdsosym = VDSOSYM $@ define cmd_vdsosym - $(NM) $< | $(gen-vdsosym) | LC_ALL=C sort > $@ && \ - cp $@ include/generated/ + $(NM) $< | $(gen-vdsosym) | LC_ALL=C sort > $@ endef -$(obj)/vdso-offsets.h: $(obj)/vdso.so.dbg FORCE +include/generated/vdso-offsets.h: $(obj)/vdso.so.dbg FORCE $(call if_changed,vdsosym) # Assembly rules for the .S files From 3453b6f3469d49916f07164c39d7547b2315fa7c Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Tue, 12 Jul 2016 11:23:59 +0100 Subject: [PATCH 287/521] UPSTREAM: arm64: Refactor vDSO time functions (cherry pick from commit b33f491f5a9aaf171b7de0f905362eb0314af478) Time functions are directly implemented in assembly in arm64, and it is desirable to keep it this way for performance reasons (everything fits in registers, so that the stack is not used at all). However, the current implementation is quite difficult to read and understand (even considering it's assembly). Additionally, due to the structure of __kernel_clock_gettime, which heavily uses conditional branches to share code between the different clocks, it is difficult to support a new clock without making the branches even harder to follow. This commit completely refactors the structure of clock_gettime (and gettimeofday along the way) while keeping exactly the same algorithms. We no longer try to share code; instead, macros provide common operations. This new approach comes with a number of advantages: - In clock_gettime, clock implementations are no longer interspersed, making them much more readable. Additionally, macros only use registers passed as arguments or reserved with .req, this way it is easy to make sure that registers are properly allocated. To avoid a large number of branches in a given execution path, a jump table is used; a normal execution uses 3 unconditional branches. - __do_get_tspec has been replaced with 2 macros (get_ts_clock_mono, get_clock_shifted_nsec) and explicit loading of data from the vDSO page. Consequently, clock_gettime and gettimeofday are now leaf functions, and saving x30 (lr) is no longer necessary. - Variables protected by tb_seq_count are now loaded all at once, allowing to merge the seqcnt_read macro into seqcnt_check. - For CLOCK_REALTIME_COARSE, removed an unused load of the wall to monotonic timespec. - For CLOCK_MONOTONIC_COARSE, removed a few shift instructions. Obviously, the downside of sharing less code is an increase in code size. However since the vDSO has its own code page, this does not really matter, as long as the size of the DSO remains below 4 kB. For now this should be all right: Before After vdso.so size (B) 2776 3000 Signed-off-by: Kevin Brodsky Reviewed-by: Dave Martin Acked-by: Will Deacon Signed-off-by: Catalin Marinas Bug: 20045882 Bug: 19198045 Change-Id: I0a0036c54419ee561efe3aad97489e7748ddc59e --- arch/arm64/kernel/vdso/gettimeofday.S | 298 +++++++++++++++----------- 1 file changed, 172 insertions(+), 126 deletions(-) diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S index efa79e8d4196..c06919ee29e1 100644 --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ b/arch/arm64/kernel/vdso/gettimeofday.S @@ -26,24 +26,100 @@ #define NSEC_PER_SEC_HI16 0x3b9a vdso_data .req x6 -use_syscall .req w7 -seqcnt .req w8 +seqcnt .req w7 +w_tmp .req w8 +x_tmp .req x8 + +/* + * Conventions for macro arguments: + * - An argument is write-only if its name starts with "res". + * - All other arguments are read-only, unless otherwise specified. + */ .macro seqcnt_acquire 9999: ldr seqcnt, [vdso_data, #VDSO_TB_SEQ_COUNT] tbnz seqcnt, #0, 9999b dmb ishld - ldr use_syscall, [vdso_data, #VDSO_USE_SYSCALL] .endm - .macro seqcnt_read, cnt + .macro seqcnt_check fail dmb ishld - ldr \cnt, [vdso_data, #VDSO_TB_SEQ_COUNT] + ldr w_tmp, [vdso_data, #VDSO_TB_SEQ_COUNT] + cmp w_tmp, seqcnt + b.ne \fail .endm - .macro seqcnt_check, cnt, fail - cmp \cnt, seqcnt - b.ne \fail + .macro syscall_check fail + ldr w_tmp, [vdso_data, #VDSO_USE_SYSCALL] + cbnz w_tmp, \fail + .endm + + .macro get_nsec_per_sec res + mov \res, #NSEC_PER_SEC_LO16 + movk \res, #NSEC_PER_SEC_HI16, lsl #16 + .endm + + /* + * Returns the clock delta, in nanoseconds left-shifted by the clock + * shift. + */ + .macro get_clock_shifted_nsec res, cycle_last, mult + /* Read the virtual counter. */ + isb + mrs x_tmp, cntvct_el0 + /* Calculate cycle delta and convert to ns. */ + sub \res, x_tmp, \cycle_last + /* We can only guarantee 56 bits of precision. */ + movn x_tmp, #0xff00, lsl #48 + and \res, x_tmp, \res + mul \res, \res, \mult + .endm + + /* + * Returns in res_{sec,nsec} the REALTIME timespec, based on the + * "wall time" (xtime) and the clock_mono delta. + */ + .macro get_ts_realtime res_sec, res_nsec, \ + clock_nsec, xtime_sec, xtime_nsec, nsec_to_sec + add \res_nsec, \clock_nsec, \xtime_nsec + udiv x_tmp, \res_nsec, \nsec_to_sec + add \res_sec, \xtime_sec, x_tmp + msub \res_nsec, x_tmp, \nsec_to_sec, \res_nsec + .endm + + /* sec and nsec are modified in place. */ + .macro add_ts sec, nsec, ts_sec, ts_nsec, nsec_to_sec + /* Add timespec. */ + add \sec, \sec, \ts_sec + add \nsec, \nsec, \ts_nsec + + /* Normalise the new timespec. */ + cmp \nsec, \nsec_to_sec + b.lt 9999f + sub \nsec, \nsec, \nsec_to_sec + add \sec, \sec, #1 +9999: + cmp \nsec, #0 + b.ge 9998f + add \nsec, \nsec, \nsec_to_sec + sub \sec, \sec, #1 +9998: + .endm + + .macro clock_gettime_return, shift=0 + .if \shift == 1 + lsr x11, x11, x12 + .endif + stp x10, x11, [x1, #TSPEC_TV_SEC] + mov x0, xzr + ret + .endm + + .macro jump_slot jumptable, index, label + .if (. - \jumptable) != 4 * (\index) + .error "Jump slot index mismatch" + .endif + b \label .endm .text @@ -51,18 +127,24 @@ seqcnt .req w8 /* int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); */ ENTRY(__kernel_gettimeofday) .cfi_startproc - mov x2, x30 - .cfi_register x30, x2 - - /* Acquire the sequence counter and get the timespec. */ adr vdso_data, _vdso_data -1: seqcnt_acquire - cbnz use_syscall, 4f - /* If tv is NULL, skip to the timezone code. */ cbz x0, 2f - bl __do_get_tspec - seqcnt_check w9, 1b + + /* Compute the time of day. */ +1: seqcnt_acquire + syscall_check fail=4f + ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] + ldp w11, w12, [vdso_data, #VDSO_CS_MULT] + ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] + seqcnt_check fail=1b + + get_nsec_per_sec res=x9 + lsl x9, x9, x12 + + get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 + get_ts_realtime res_sec=x10, res_nsec=x11, \ + clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9 /* Convert ns to us. */ mov x13, #1000 @@ -76,95 +158,102 @@ ENTRY(__kernel_gettimeofday) stp w4, w5, [x1, #TZ_MINWEST] 3: mov x0, xzr - ret x2 + ret 4: /* Syscall fallback. */ mov x8, #__NR_gettimeofday svc #0 - ret x2 + ret .cfi_endproc ENDPROC(__kernel_gettimeofday) +#define JUMPSLOT_MAX CLOCK_MONOTONIC_COARSE + /* int __kernel_clock_gettime(clockid_t clock_id, struct timespec *tp); */ ENTRY(__kernel_clock_gettime) .cfi_startproc - cmp w0, #CLOCK_REALTIME - ccmp w0, #CLOCK_MONOTONIC, #0x4, ne - b.ne 2f - - mov x2, x30 - .cfi_register x30, x2 - - /* Get kernel timespec. */ + cmp w0, #JUMPSLOT_MAX + b.hi syscall adr vdso_data, _vdso_data -1: seqcnt_acquire - cbnz use_syscall, 7f + adr x_tmp, jumptable + add x_tmp, x_tmp, w0, uxtw #2 + br x_tmp - bl __do_get_tspec - seqcnt_check w9, 1b + ALIGN +jumptable: + jump_slot jumptable, CLOCK_REALTIME, realtime + jump_slot jumptable, CLOCK_MONOTONIC, monotonic + b syscall + b syscall + b syscall + jump_slot jumptable, CLOCK_REALTIME_COARSE, realtime_coarse + jump_slot jumptable, CLOCK_MONOTONIC_COARSE, monotonic_coarse - mov x30, x2 + .if (. - jumptable) != 4 * (JUMPSLOT_MAX + 1) + .error "Wrong jumptable size" + .endif - cmp w0, #CLOCK_MONOTONIC - b.ne 6f + ALIGN +realtime: + seqcnt_acquire + syscall_check fail=syscall + ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] + ldp w11, w12, [vdso_data, #VDSO_CS_MULT] + ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] + seqcnt_check fail=realtime - /* Get wtm timespec. */ - ldp x13, x14, [vdso_data, #VDSO_WTM_CLK_SEC] + /* All computations are done with left-shifted nsecs. */ + get_nsec_per_sec res=x9 + lsl x9, x9, x12 - /* Check the sequence counter. */ - seqcnt_read w9 - seqcnt_check w9, 1b - b 4f -2: - cmp w0, #CLOCK_REALTIME_COARSE - ccmp w0, #CLOCK_MONOTONIC_COARSE, #0x4, ne - b.ne 8f + get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 + get_ts_realtime res_sec=x10, res_nsec=x11, \ + clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9 + clock_gettime_return, shift=1 - /* xtime_coarse_nsec is already right-shifted */ - mov x12, #0 + ALIGN +monotonic: + seqcnt_acquire + syscall_check fail=syscall + ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] + ldp w11, w12, [vdso_data, #VDSO_CS_MULT] + ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] + ldp x3, x4, [vdso_data, #VDSO_WTM_CLK_SEC] + seqcnt_check fail=monotonic - /* Get coarse timespec. */ - adr vdso_data, _vdso_data -3: seqcnt_acquire + /* All computations are done with left-shifted nsecs. */ + lsl x4, x4, x12 + get_nsec_per_sec res=x9 + lsl x9, x9, x12 + + get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 + get_ts_realtime res_sec=x10, res_nsec=x11, \ + clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9 + + add_ts sec=x10, nsec=x11, ts_sec=x3, ts_nsec=x4, nsec_to_sec=x9 + clock_gettime_return, shift=1 + + ALIGN +realtime_coarse: + seqcnt_acquire ldp x10, x11, [vdso_data, #VDSO_XTIME_CRS_SEC] + seqcnt_check fail=realtime_coarse + clock_gettime_return - /* Get wtm timespec. */ + ALIGN +monotonic_coarse: + seqcnt_acquire + ldp x10, x11, [vdso_data, #VDSO_XTIME_CRS_SEC] ldp x13, x14, [vdso_data, #VDSO_WTM_CLK_SEC] + seqcnt_check fail=monotonic_coarse - /* Check the sequence counter. */ - seqcnt_read w9 - seqcnt_check w9, 3b + /* Computations are done in (non-shifted) nsecs. */ + get_nsec_per_sec res=x9 + add_ts sec=x10, nsec=x11, ts_sec=x13, ts_nsec=x14, nsec_to_sec=x9 + clock_gettime_return - cmp w0, #CLOCK_MONOTONIC_COARSE - b.ne 6f -4: - /* Add on wtm timespec. */ - add x10, x10, x13 - lsl x14, x14, x12 - add x11, x11, x14 - - /* Normalise the new timespec. */ - mov x15, #NSEC_PER_SEC_LO16 - movk x15, #NSEC_PER_SEC_HI16, lsl #16 - lsl x15, x15, x12 - cmp x11, x15 - b.lt 5f - sub x11, x11, x15 - add x10, x10, #1 -5: - cmp x11, #0 - b.ge 6f - add x11, x11, x15 - sub x10, x10, #1 - -6: /* Store to the user timespec. */ - lsr x11, x11, x12 - stp x10, x11, [x1, #TSPEC_TV_SEC] - mov x0, xzr - ret -7: - mov x30, x2 -8: /* Syscall fallback. */ + ALIGN +syscall: /* Syscall fallback. */ mov x8, #__NR_clock_gettime svc #0 ret @@ -203,46 +292,3 @@ ENTRY(__kernel_clock_getres) .quad CLOCK_COARSE_RES .cfi_endproc ENDPROC(__kernel_clock_getres) - -/* - * Read the current time from the architected counter. - * Expects vdso_data to be initialised. - * Clobbers the temporary registers (x9 - x15). - * Returns: - * - w9 = vDSO sequence counter - * - (x10, x11) = (ts->tv_sec, shifted ts->tv_nsec) - * - w12 = cs_shift - */ -ENTRY(__do_get_tspec) - .cfi_startproc - - /* Read from the vDSO data page. */ - ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] - ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] - ldp w11, w12, [vdso_data, #VDSO_CS_MULT] - seqcnt_read w9 - - /* Read the virtual counter. */ - isb - mrs x15, cntvct_el0 - - /* Calculate cycle delta and convert to ns. */ - sub x10, x15, x10 - /* We can only guarantee 56 bits of precision. */ - movn x15, #0xff00, lsl #48 - and x10, x15, x10 - mul x10, x10, x11 - - /* Use the kernel time to calculate the new timespec. */ - mov x11, #NSEC_PER_SEC_LO16 - movk x11, #NSEC_PER_SEC_HI16, lsl #16 - lsl x11, x11, x12 - add x15, x10, x14 - udiv x14, x15, x11 - add x10, x13, x14 - mul x13, x14, x11 - sub x11, x15, x13 - - ret - .cfi_endproc -ENDPROC(__do_get_tspec) From 688a44fed318c23f71b9b78615a5029425b91dc8 Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Tue, 12 Jul 2016 11:24:00 +0100 Subject: [PATCH 288/521] UPSTREAM: arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO (cherry pick from commit 49eea433b326a0ac5c7c941a011b2c65990bd19b) So far the arm64 clock_gettime() vDSO implementation only supported the following clocks, falling back to the syscall for the others: - CLOCK_REALTIME{,_COARSE} - CLOCK_MONOTONIC{,_COARSE} This patch adds support for the CLOCK_MONOTONIC_RAW clock, taking advantage of the recent refactoring of the vDSO time functions. Like the non-_COARSE clocks, this only works when the "arch_sys_counter" clocksource is in use (allowing us to read the current time from the virtual counter register), otherwise we also have to fall back to the syscall. Most of the data is shared with CLOCK_MONOTONIC, and the algorithm is similar. The reference implementation in kernel/time/timekeeping.c shows that: - CLOCK_MONOTONIC = tk->wall_to_monotonic + tk->xtime_sec + timekeeping_get_ns(&tk->tkr_mono) - CLOCK_MONOTONIC_RAW = tk->raw_time + timekeeping_get_ns(&tk->tkr_raw) - tkr_mono and tkr_raw are identical (in particular, same clocksource), except these members: * mult (only mono's multiplier is NTP-adjusted) * xtime_nsec (always 0 for raw) Therefore, tk->raw_time and tkr_raw->mult are now also stored in the vDSO data page. Cc: Ali Saidi Signed-off-by: Kevin Brodsky Reviewed-by: Dave Martin Acked-by: Will Deacon Signed-off-by: Catalin Marinas Bug: 20045882 Bug: 19198045 Change-Id: I854adcc1192757a1ac40662e85d466ca709d0682 --- arch/arm64/include/asm/vdso_datapage.h | 8 +++- arch/arm64/kernel/asm-offsets.c | 6 ++- arch/arm64/kernel/vdso.c | 8 +++- arch/arm64/kernel/vdso/gettimeofday.S | 57 +++++++++++++++++++++----- 4 files changed, 64 insertions(+), 15 deletions(-) diff --git a/arch/arm64/include/asm/vdso_datapage.h b/arch/arm64/include/asm/vdso_datapage.h index de66199673d7..2b9a63771eda 100644 --- a/arch/arm64/include/asm/vdso_datapage.h +++ b/arch/arm64/include/asm/vdso_datapage.h @@ -22,6 +22,8 @@ struct vdso_data { __u64 cs_cycle_last; /* Timebase at clocksource init */ + __u64 raw_time_sec; /* Raw time */ + __u64 raw_time_nsec; __u64 xtime_clock_sec; /* Kernel time */ __u64 xtime_clock_nsec; __u64 xtime_coarse_sec; /* Coarse time */ @@ -29,8 +31,10 @@ struct vdso_data { __u64 wtm_clock_sec; /* Wall to monotonic time */ __u64 wtm_clock_nsec; __u32 tb_seq_count; /* Timebase sequence counter */ - __u32 cs_mult; /* Clocksource multiplier */ - __u32 cs_shift; /* Clocksource shift */ + /* cs_* members must be adjacent and in this order (ldp accesses) */ + __u32 cs_mono_mult; /* NTP-adjusted clocksource multiplier */ + __u32 cs_shift; /* Clocksource shift (mono = raw) */ + __u32 cs_raw_mult; /* Raw clocksource multiplier */ __u32 tz_minuteswest; /* Whacky timezone stuff */ __u32 tz_dsttime; __u32 use_syscall; diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index c9ea87198789..350c0e99fc6b 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -92,6 +92,7 @@ int main(void) BLANK(); DEFINE(CLOCK_REALTIME, CLOCK_REALTIME); DEFINE(CLOCK_MONOTONIC, CLOCK_MONOTONIC); + DEFINE(CLOCK_MONOTONIC_RAW, CLOCK_MONOTONIC_RAW); DEFINE(CLOCK_REALTIME_RES, MONOTONIC_RES_NSEC); DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE); DEFINE(CLOCK_MONOTONIC_COARSE,CLOCK_MONOTONIC_COARSE); @@ -99,6 +100,8 @@ int main(void) DEFINE(NSEC_PER_SEC, NSEC_PER_SEC); BLANK(); DEFINE(VDSO_CS_CYCLE_LAST, offsetof(struct vdso_data, cs_cycle_last)); + DEFINE(VDSO_RAW_TIME_SEC, offsetof(struct vdso_data, raw_time_sec)); + DEFINE(VDSO_RAW_TIME_NSEC, offsetof(struct vdso_data, raw_time_nsec)); DEFINE(VDSO_XTIME_CLK_SEC, offsetof(struct vdso_data, xtime_clock_sec)); DEFINE(VDSO_XTIME_CLK_NSEC, offsetof(struct vdso_data, xtime_clock_nsec)); DEFINE(VDSO_XTIME_CRS_SEC, offsetof(struct vdso_data, xtime_coarse_sec)); @@ -106,7 +109,8 @@ int main(void) DEFINE(VDSO_WTM_CLK_SEC, offsetof(struct vdso_data, wtm_clock_sec)); DEFINE(VDSO_WTM_CLK_NSEC, offsetof(struct vdso_data, wtm_clock_nsec)); DEFINE(VDSO_TB_SEQ_COUNT, offsetof(struct vdso_data, tb_seq_count)); - DEFINE(VDSO_CS_MULT, offsetof(struct vdso_data, cs_mult)); + DEFINE(VDSO_CS_MONO_MULT, offsetof(struct vdso_data, cs_mono_mult)); + DEFINE(VDSO_CS_RAW_MULT, offsetof(struct vdso_data, cs_raw_mult)); DEFINE(VDSO_CS_SHIFT, offsetof(struct vdso_data, cs_shift)); DEFINE(VDSO_TZ_MINWEST, offsetof(struct vdso_data, tz_minuteswest)); DEFINE(VDSO_TZ_DSTTIME, offsetof(struct vdso_data, tz_dsttime)); diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 97bc68f4c689..54f7b327fd18 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -212,10 +212,16 @@ void update_vsyscall(struct timekeeper *tk) vdso_data->wtm_clock_nsec = tk->wall_to_monotonic.tv_nsec; if (!use_syscall) { + /* tkr_mono.cycle_last == tkr_raw.cycle_last */ vdso_data->cs_cycle_last = tk->tkr_mono.cycle_last; + vdso_data->raw_time_sec = tk->raw_time.tv_sec; + vdso_data->raw_time_nsec = tk->raw_time.tv_nsec; vdso_data->xtime_clock_sec = tk->xtime_sec; vdso_data->xtime_clock_nsec = tk->tkr_mono.xtime_nsec; - vdso_data->cs_mult = tk->tkr_mono.mult; + /* tkr_raw.xtime_nsec == 0 */ + vdso_data->cs_mono_mult = tk->tkr_mono.mult; + vdso_data->cs_raw_mult = tk->tkr_raw.mult; + /* tkr_mono.shift == tkr_raw.shift */ vdso_data->cs_shift = tk->tkr_mono.shift; } diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S index c06919ee29e1..e00b4671bd7c 100644 --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ b/arch/arm64/kernel/vdso/gettimeofday.S @@ -87,6 +87,15 @@ x_tmp .req x8 msub \res_nsec, x_tmp, \nsec_to_sec, \res_nsec .endm + /* + * Returns in res_{sec,nsec} the timespec based on the clock_raw delta, + * used for CLOCK_MONOTONIC_RAW. + */ + .macro get_ts_clock_raw res_sec, res_nsec, clock_nsec, nsec_to_sec + udiv \res_sec, \clock_nsec, \nsec_to_sec + msub \res_nsec, \res_sec, \nsec_to_sec, \clock_nsec + .endm + /* sec and nsec are modified in place. */ .macro add_ts sec, nsec, ts_sec, ts_nsec, nsec_to_sec /* Add timespec. */ @@ -135,7 +144,8 @@ ENTRY(__kernel_gettimeofday) 1: seqcnt_acquire syscall_check fail=4f ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] - ldp w11, w12, [vdso_data, #VDSO_CS_MULT] + /* w11 = cs_mono_mult, w12 = cs_shift */ + ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] seqcnt_check fail=1b @@ -172,20 +182,20 @@ ENDPROC(__kernel_gettimeofday) /* int __kernel_clock_gettime(clockid_t clock_id, struct timespec *tp); */ ENTRY(__kernel_clock_gettime) .cfi_startproc - cmp w0, #JUMPSLOT_MAX - b.hi syscall + cmp w0, #JUMPSLOT_MAX + b.hi syscall adr vdso_data, _vdso_data - adr x_tmp, jumptable - add x_tmp, x_tmp, w0, uxtw #2 - br x_tmp + adr x_tmp, jumptable + add x_tmp, x_tmp, w0, uxtw #2 + br x_tmp ALIGN jumptable: jump_slot jumptable, CLOCK_REALTIME, realtime jump_slot jumptable, CLOCK_MONOTONIC, monotonic - b syscall - b syscall - b syscall + b syscall + b syscall + jump_slot jumptable, CLOCK_MONOTONIC_RAW, monotonic_raw jump_slot jumptable, CLOCK_REALTIME_COARSE, realtime_coarse jump_slot jumptable, CLOCK_MONOTONIC_COARSE, monotonic_coarse @@ -198,7 +208,8 @@ realtime: seqcnt_acquire syscall_check fail=syscall ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] - ldp w11, w12, [vdso_data, #VDSO_CS_MULT] + /* w11 = cs_mono_mult, w12 = cs_shift */ + ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] seqcnt_check fail=realtime @@ -216,7 +227,8 @@ monotonic: seqcnt_acquire syscall_check fail=syscall ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] - ldp w11, w12, [vdso_data, #VDSO_CS_MULT] + /* w11 = cs_mono_mult, w12 = cs_shift */ + ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] ldp x3, x4, [vdso_data, #VDSO_WTM_CLK_SEC] seqcnt_check fail=monotonic @@ -233,6 +245,28 @@ monotonic: add_ts sec=x10, nsec=x11, ts_sec=x3, ts_nsec=x4, nsec_to_sec=x9 clock_gettime_return, shift=1 + ALIGN +monotonic_raw: + seqcnt_acquire + syscall_check fail=syscall + ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] + /* w11 = cs_raw_mult, w12 = cs_shift */ + ldp w12, w11, [vdso_data, #VDSO_CS_SHIFT] + ldp x13, x14, [vdso_data, #VDSO_RAW_TIME_SEC] + seqcnt_check fail=monotonic_raw + + /* All computations are done with left-shifted nsecs. */ + lsl x14, x14, x12 + get_nsec_per_sec res=x9 + lsl x9, x9, x12 + + get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 + get_ts_clock_raw res_sec=x10, res_nsec=x11, \ + clock_nsec=x15, nsec_to_sec=x9 + + add_ts sec=x10, nsec=x11, ts_sec=x13, ts_nsec=x14, nsec_to_sec=x9 + clock_gettime_return, shift=1 + ALIGN realtime_coarse: seqcnt_acquire @@ -265,6 +299,7 @@ ENTRY(__kernel_clock_getres) .cfi_startproc cmp w0, #CLOCK_REALTIME ccmp w0, #CLOCK_MONOTONIC, #0x4, ne + ccmp w0, #CLOCK_MONOTONIC_RAW, #0x4, ne b.ne 1f ldr x2, 5f From 74a0b466674c7ebaf8aeb03cbb0299be30bb78e9 Mon Sep 17 00:00:00 2001 From: Jisheng Zhang Date: Mon, 15 Aug 2016 09:20:21 +0100 Subject: [PATCH 289/521] UPSTREAM: ARM: 8597/1: VDSO: put RO and RO after init objects into proper sections (cherry pick from commit 92bb8d5d55f7fe1a7a1201c42120c1611840807c) vdso_data_mapping is never modified, so mark it as const. vdso_total_pages, vdso_data_page, vdso_text_mapping and cntvct_ok are initialized by vdso_init(), thereafter are read only. The fact that they are read only after init makes them candidates for __ro_after_init declarations. Signed-off-by: Jisheng Zhang Reviewed-by: Kees Cook Acked-by: Nathan Lynch Signed-off-by: Russell King Bug: 20045882 Bug: 19198045 Change-Id: I0b2f9807d3cd66c5ef7457ac6fc180a41cbe05c6 --- arch/arm/kernel/vdso.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c index 994e971a8538..bbbffe946122 100644 --- a/arch/arm/kernel/vdso.c +++ b/arch/arm/kernel/vdso.c @@ -17,6 +17,7 @@ * along with this program. If not, see . */ +#include #include #include #include @@ -39,7 +40,7 @@ static struct page **vdso_text_pagelist; /* Total number of pages needed for the data and text portions of the VDSO. */ -unsigned int vdso_total_pages __read_mostly; +unsigned int vdso_total_pages __ro_after_init; /* * The VDSO data page. @@ -47,13 +48,13 @@ unsigned int vdso_total_pages __read_mostly; static union vdso_data_store vdso_data_store __page_aligned_data; static struct vdso_data *vdso_data = &vdso_data_store.data; -static struct page *vdso_data_page; -static struct vm_special_mapping vdso_data_mapping = { +static struct page *vdso_data_page __ro_after_init; +static const struct vm_special_mapping vdso_data_mapping = { .name = "[vvar]", .pages = &vdso_data_page, }; -static struct vm_special_mapping vdso_text_mapping = { +static struct vm_special_mapping vdso_text_mapping __ro_after_init = { .name = "[vdso]", }; @@ -67,7 +68,7 @@ struct elfinfo { /* Cached result of boot-time check for whether the arch timer exists, * and if so, whether the virtual counter is useable. */ -static bool cntvct_ok __read_mostly; +static bool cntvct_ok __ro_after_init; static bool __init cntvct_functional(void) { From 7a2364f89c98cc226b6bff28dc4ece8137f8d0a1 Mon Sep 17 00:00:00 2001 From: Jisheng Zhang Date: Mon, 15 Aug 2016 14:45:44 +0800 Subject: [PATCH 290/521] UPSTREAM: arm64: vdso: add __init section marker to alloc_vectors_page (cherry pick from commit 1aed28f94ce6c1f6c24bcbbd5fcd749b55f65e9e) It is not needed after booting, this patch moves the alloc_vectors_page function to the __init section. Acked-by: Mark Rutland Signed-off-by: Jisheng Zhang Signed-off-by: Will Deacon Bug: 20045882 Bug: 19198045 Change-Id: I1d7b89f9e20890b18e77c8752f1eef33d721ea81 --- arch/arm64/kernel/vdso.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 54f7b327fd18..0949b4812887 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -55,7 +55,7 @@ struct vdso_data *vdso_data = &vdso_data_store.data; */ static struct page *vectors_page[1]; -static int alloc_vectors_page(void) +static int __init alloc_vectors_page(void) { extern char __kuser_helper_start[], __kuser_helper_end[]; extern char __aarch32_sigret_code_start[], __aarch32_sigret_code_end[]; From b18f4800124818b2412f995a42ae9224ddfdafea Mon Sep 17 00:00:00 2001 From: Jisheng Zhang Date: Mon, 15 Aug 2016 14:45:45 +0800 Subject: [PATCH 291/521] UPSTREAM: arm64: vdso: constify vm_special_mapping used for aarch32 vectors page (cherry pick from commit b6d081bddf397026575a437b603b118dff2606ff) The vm_special_mapping spec which is used for aarch32 vectors page is never modified, so mark it as const. Acked-by: Mark Rutland Signed-off-by: Jisheng Zhang Signed-off-by: Will Deacon Bug: 20045882 Bug: 19198045 Change-Id: If1afd11bfeefee45a2b96634b18143937a773a14 --- arch/arm64/kernel/vdso.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 0949b4812887..3b8acfae7797 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -88,7 +88,7 @@ int aarch32_setup_vectors_page(struct linux_binprm *bprm, int uses_interp) { struct mm_struct *mm = current->mm; unsigned long addr = AARCH32_VECTORS_BASE; - static struct vm_special_mapping spec = { + static const struct vm_special_mapping spec = { .name = "[vectors]", .pages = vectors_page, From 1adc015041ee37104594c3fc945a8a645bc5cb28 Mon Sep 17 00:00:00 2001 From: Dmitry Shmidt Date: Tue, 28 Mar 2017 13:30:18 -0700 Subject: [PATCH 292/521] ANDROID: ARM64: Allow to choose appended kernel image By default appended kernel image is Image.gz-dtb. New config option BUILD_ARM64_APPENDED_KERNEL_IMAGE_NAME allows to choose between Image.gz-dtb and Image-dtb. Change-Id: I1c71b85136f1beeb61782e4646820718c1ccd7e4 Signed-off-by: Dmitry Shmidt --- arch/arm64/Kconfig | 20 ++++++++++++++++++++ arch/arm64/Makefile | 2 +- 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 4bda2c06fe05..bf87f37e84b2 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -934,6 +934,26 @@ config BUILD_ARM64_APPENDED_DTB_IMAGE DTBs to be built by default (instead of a standalone Image.gz.) The image will built in arch/arm64/boot/Image.gz-dtb +choice + prompt "Appended DTB Kernel Image name" + depends on BUILD_ARM64_APPENDED_DTB_IMAGE + help + Enabling this option will cause a specific kernel image Image or + Image.gz to be used for final image creation. + The image will built in arch/arm64/boot/IMAGE-NAME-dtb + + config IMG_GZ_DTB + bool "Image.gz-dtb" + config IMG_DTB + bool "Image-dtb" +endchoice + +config BUILD_ARM64_APPENDED_KERNEL_IMAGE_NAME + string + depends on BUILD_ARM64_APPENDED_DTB_IMAGE + default "Image.gz-dtb" if IMG_GZ_DTB + default "Image-dtb" if IMG_DTB + config BUILD_ARM64_APPENDED_DTB_IMAGE_NAMES string "Default dtb names" depends on BUILD_ARM64_APPENDED_DTB_IMAGE diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index f76ec8d5ac6b..f1d8a05727cf 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -87,7 +87,7 @@ core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a # Default target when executing plain make ifeq ($(CONFIG_BUILD_ARM64_APPENDED_DTB_IMAGE),y) -KBUILD_IMAGE := Image.gz-dtb +KBUILD_IMAGE := $(subst $\",,$(CONFIG_BUILD_ARM64_APPENDED_KERNEL_IMAGE_NAME)) else KBUILD_IMAGE := Image.gz endif From 8be5a2f50dc724ff9b60f4d90778b85106f99f51 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Tue, 11 Oct 2016 13:54:38 -0700 Subject: [PATCH 293/521] config: android: move device mapper options to recommended CONFIG_MD is in recommended, but other dependent options like DM_CRYPT and DM_VERITY options are in base. The result is the options in base don't get enabled when applying both base and recommended fragments. Move all the options to recommended. Link: http://lkml.kernel.org/r/20160908185934.18098-1-robh@kernel.org Signed-off-by: Rob Herring Acked-by: John Stultz Cc: Amit Pundir Cc: Dmitry Shmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds (cherry picked from commit f023a3956f273859ed36f624f75a66c272124b16) Signed-off-by: Greg Kroah-Hartman --- android/configs/android-base.cfg | 4 ---- android/configs/android-recommended.cfg | 4 ++++ 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index d1f9628e4377..59d4908b3343 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -13,7 +13,6 @@ CONFIG_ANDROID_LOW_MEMORY_KILLER=y CONFIG_ARMV8_DEPRECATED=y CONFIG_ASHMEM=y CONFIG_AUDIT=y -CONFIG_BLK_DEV_DM=y CONFIG_BLK_DEV_INITRD=y CONFIG_CGROUPS=y CONFIG_CGROUP_CPUACCT=y @@ -21,9 +20,6 @@ CONFIG_CGROUP_DEBUG=y CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_SCHED=y CONFIG_CP15_BARRIER_EMULATION=y -CONFIG_DM_CRYPT=y -CONFIG_DM_VERITY=y -CONFIG_DM_VERITY_FEC=y CONFIG_EMBEDDED=y CONFIG_FB=y CONFIG_HARDENED_USERCOPY=y diff --git a/android/configs/android-recommended.cfg b/android/configs/android-recommended.cfg index 28610303db60..e346b378dd64 100644 --- a/android/configs/android-recommended.cfg +++ b/android/configs/android-recommended.cfg @@ -10,6 +10,7 @@ CONFIG_ANDROID_TIMED_GPIO=y CONFIG_ARM_KERNMEM_PERMS=y CONFIG_ARM64_SW_TTBR0_PAN=y CONFIG_BACKLIGHT_LCD_SUPPORT=y +CONFIG_BLK_DEV_DM=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=8192 @@ -17,7 +18,10 @@ CONFIG_CC_STACKPROTECTOR_STRONG=y CONFIG_COMPACTION=y CONFIG_CPU_SW_DOMAIN_PAN=y CONFIG_DEBUG_RODATA=y +CONFIG_DM_CRYPT=y CONFIG_DM_UEVENT=y +CONFIG_DM_VERITY=y +CONFIG_DM_VERITY_FEC=y CONFIG_DRAGONRISE_FF=y CONFIG_ENABLE_DEFAULT_TRACERS=y CONFIG_EXT4_FS=y From 9a714c9661a0509c00847b4aaf5d96144e66dbc8 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Tue, 11 Oct 2016 13:54:41 -0700 Subject: [PATCH 294/521] UPSTREAM: config: android: set SELinux as default security mode Android won't boot without SELinux enabled, so make it the default. Link: http://lkml.kernel.org/r/20160908185934.18098-2-robh@kernel.org Signed-off-by: Rob Herring Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds (cherry picked from commit d90ae51a3e7556c9f50431db43cd8190934ccf94) Signed-off-by: Greg Kroah-Hartman --- android/configs/android-base.cfg | 1 + 1 file changed, 1 insertion(+) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index 59d4908b3343..3672bf339006 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -20,6 +20,7 @@ CONFIG_CGROUP_DEBUG=y CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_SCHED=y CONFIG_CP15_BARRIER_EMULATION=y +CONFIG_DEFAULT_SECURITY_SELINUX=y CONFIG_EMBEDDED=y CONFIG_FB=y CONFIG_HARDENED_USERCOPY=y From f91f25c224eead4166bbe96835ab04f13458585d Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Tue, 11 Oct 2016 13:54:36 -0700 Subject: [PATCH 295/521] UPSTREAM: config/android: Remove CONFIG_IPV6_PRIVACY Option is long gone, see commit 5d9efa7ee99e ("ipv6: Remove privacy config option.") Link: http://lkml.kernel.org/r/20160811170340.9859-1-bp@alien8.de Signed-off-by: Borislav Petkov Cc: Rob Herring Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds (cherry picked from commit a2c6a235dbf4318fc7f7981932478e6c47f093ab) Signed-off-by: Greg Kroah-Hartman --- android/configs/android-base.cfg | 1 - 1 file changed, 1 deletion(-) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index 3672bf339006..1ff32af855c3 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -41,7 +41,6 @@ CONFIG_IPV6=y CONFIG_IPV6_MIP6=y CONFIG_IPV6_MULTIPLE_TABLES=y CONFIG_IPV6_OPTIMISTIC_DAD=y -CONFIG_IPV6_PRIVACY=y CONFIG_IPV6_ROUTER_PREF=y CONFIG_IPV6_ROUTE_INFO=y CONFIG_IP_ADVANCED_ROUTER=y From 2168b67c0425f45e9fa6004ab19f9b6f5d561d52 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sat, 1 Apr 2017 19:08:12 +0200 Subject: [PATCH 296/521] ANDROID: sort android-recommended.cfg It got out-of-order, so resort it to make it easier to sync with other trees. Signed-off-by: Greg Kroah-Hartman --- android/configs/android-recommended.cfg | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/android/configs/android-recommended.cfg b/android/configs/android-recommended.cfg index e346b378dd64..eecf8d80453a 100644 --- a/android/configs/android-recommended.cfg +++ b/android/configs/android-recommended.cfg @@ -7,8 +7,8 @@ # CONFIG_PM_WAKELOCKS_GC is not set # CONFIG_VT is not set CONFIG_ANDROID_TIMED_GPIO=y -CONFIG_ARM_KERNMEM_PERMS=y CONFIG_ARM64_SW_TTBR0_PAN=y +CONFIG_ARM_KERNMEM_PERMS=y CONFIG_BACKLIGHT_LCD_SUPPORT=y CONFIG_BLK_DEV_DM=y CONFIG_BLK_DEV_LOOP=y @@ -97,6 +97,7 @@ CONFIG_LOGIRUMBLEPAD2_FF=y CONFIG_LOGITECH_FF=y CONFIG_MD=y CONFIG_MEDIA_SUPPORT=y +CONFIG_MEMORY_STATE_TIME=y CONFIG_MSDOS_FS=y CONFIG_PANIC_TIMEOUT=5 CONFIG_PANTHERLORD_FF=y @@ -126,7 +127,6 @@ CONFIG_TIMER_STATS=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_UHID=y -CONFIG_MEMORY_STATE_TIME=y CONFIG_USB_ANNOUNCE_NEW_DEVICES=y CONFIG_USB_EHCI_HCD=y CONFIG_USB_HIDDEV=y From 39353176c5da6b722b913cfdcd3ca1a08c3d09d6 Mon Sep 17 00:00:00 2001 From: Martijn Coenen Date: Fri, 31 Mar 2017 09:33:33 -0700 Subject: [PATCH 297/521] ANDROID: binder: add hwbinder,vndbinder to BINDER_DEVICES. These will be required going forward. Change-Id: Idf0593461cef88051564ae0df495c156e31048c4 Signed-off-by: Martijn Coenen --- android/configs/android-base.cfg | 1 + drivers/android/Kconfig | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index 1ff32af855c3..ece06816e046 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -9,6 +9,7 @@ # CONFIG_USELIB is not set CONFIG_ANDROID=y CONFIG_ANDROID_BINDER_IPC=y +CONFIG_ANDROID_BINDER_DEVICES=binder,hwbinder,vndbinder CONFIG_ANDROID_LOW_MEMORY_KILLER=y CONFIG_ARMV8_DEPRECATED=y CONFIG_ASHMEM=y diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig index a82fc022d34b..4d4cdc1a6e25 100644 --- a/drivers/android/Kconfig +++ b/drivers/android/Kconfig @@ -22,7 +22,7 @@ config ANDROID_BINDER_IPC config ANDROID_BINDER_DEVICES string "Android Binder devices" depends on ANDROID_BINDER_IPC - default "binder" + default "binder,hwbinder,vndbinder" ---help--- Default value for the binder.devices parameter. From 9b4449a65cb89cd27805b8a2b0dce0cc3016a76d Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 4 Apr 2017 19:50:20 +0200 Subject: [PATCH 298/521] ANDROID: android-base.cfg: properly sort the file It somehow got out of alphabetical order, fix it to make merges and testing easier. Signed-off-by: Greg Kroah-Hartman --- android/configs/android-base.cfg | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index ece06816e046..a1969d0d7926 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -8,8 +8,8 @@ # CONFIG_SYSVIPC is not set # CONFIG_USELIB is not set CONFIG_ANDROID=y -CONFIG_ANDROID_BINDER_IPC=y CONFIG_ANDROID_BINDER_DEVICES=binder,hwbinder,vndbinder +CONFIG_ANDROID_BINDER_IPC=y CONFIG_ANDROID_LOW_MEMORY_KILLER=y CONFIG_ARMV8_DEPRECATED=y CONFIG_ASHMEM=y @@ -140,9 +140,9 @@ CONFIG_PREEMPT=y CONFIG_PROFILING=y CONFIG_QFMT_V2=y CONFIG_QUOTA=y +CONFIG_QUOTACTL=y CONFIG_QUOTA_NETLINK_INTERFACE=y CONFIG_QUOTA_TREE=y -CONFIG_QUOTACTL=y CONFIG_RANDOMIZE_BASE=y CONFIG_RTC_CLASS=y CONFIG_RT_GROUP_SCHED=y @@ -158,14 +158,14 @@ CONFIG_SYNC=y CONFIG_TUN=y CONFIG_UID_SYS_STATS=y CONFIG_UNIX=y -CONFIG_USB_GADGET=y CONFIG_USB_CONFIGFS=y -CONFIG_USB_CONFIGFS_F_FS=y -CONFIG_USB_CONFIGFS_F_MTP=y -CONFIG_USB_CONFIGFS_F_PTP=y CONFIG_USB_CONFIGFS_F_ACC=y CONFIG_USB_CONFIGFS_F_AUDIO_SRC=y -CONFIG_USB_CONFIGFS_UEVENT=y +CONFIG_USB_CONFIGFS_F_FS=y CONFIG_USB_CONFIGFS_F_MIDI=y +CONFIG_USB_CONFIGFS_F_MTP=y +CONFIG_USB_CONFIGFS_F_PTP=y +CONFIG_USB_CONFIGFS_UEVENT=y +CONFIG_USB_GADGET=y CONFIG_USB_OTG_WAKELOCK=y CONFIG_XFRM_USER=y From 6286b142aeb28bd7c7a7cd4679f694686e5af6ee Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 4 Apr 2017 19:45:38 +0200 Subject: [PATCH 299/521] ANDROID: android-base.cfg: add CONFIG_IKCONFIG option This adds CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC options, which are a requirement for the O release. Bug: 35803310 Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 7d9280f579ff0731facb1e10f32e4a88a07f33f8) --- android/configs/android-base.cfg | 2 ++ 1 file changed, 2 insertions(+) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index a1969d0d7926..7d4ce82f5b8e 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -26,6 +26,8 @@ CONFIG_EMBEDDED=y CONFIG_FB=y CONFIG_HARDENED_USERCOPY=y CONFIG_HIGH_RES_TIMERS=y +CONFIG_IKCONFIG=y +CONFIG_IKCONFIG_PROC=y CONFIG_INET6_AH=y CONFIG_INET6_ESP=y CONFIG_INET6_IPCOMP=y From 353a964727cfbca134e91134996e2bd9039d792c Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 4 Apr 2017 19:46:59 +0200 Subject: [PATCH 300/521] ANDROID: android-base.cfg: add CONFIG_MODULES option This adds CONFIG_MODULES, CONFIG_MODULE_UNLOAD, and CONFIG_MODVERSIONS which are required by the O release. Bug: 35803310 Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 56f22e654a311f3c2492b8b3609916265fe34e20) --- android/configs/android-base.cfg | 3 +++ 1 file changed, 3 insertions(+) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index 7d4ce82f5b8e..59c4f3a0df19 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -65,6 +65,9 @@ CONFIG_IP_NF_TARGET_MASQUERADE=y CONFIG_IP_NF_TARGET_NETMAP=y CONFIG_IP_NF_TARGET_REDIRECT=y CONFIG_IP_NF_TARGET_REJECT=y +CONFIG_MODULES=y +CONFIG_MODULE_UNLOAD=y +CONFIG_MODVERSIONS=y CONFIG_NET=y CONFIG_NETDEVICES=y CONFIG_NETFILTER=y From aadba8263313c853416a3c33f23b1d8c077c3f39 Mon Sep 17 00:00:00 2001 From: Lorenzo Colitti Date: Mon, 16 Jan 2017 17:35:52 +0900 Subject: [PATCH 301/521] android: base-cfg: enable CONFIG_INET_DIAG_DESTROY As of Android N, this is required to close sockets when a network disconnects. Change-Id: I9fe81c5fc5224c17bfd8d9e236ea9e436b5971cb (cherry picked from commit 4a15cee4bdaf764756e98cd8f03784f330459ab1) Signed-off-by: Greg Kroah-Hartman --- android/configs/android-base.cfg | 1 + 1 file changed, 1 insertion(+) diff --git a/android/configs/android-base.cfg b/android/configs/android-base.cfg index 59c4f3a0df19..30e01074f64a 100644 --- a/android/configs/android-base.cfg +++ b/android/configs/android-base.cfg @@ -29,6 +29,7 @@ CONFIG_HIGH_RES_TIMERS=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_INET6_AH=y +CONFIG_INET6_DIAG_DESTROY=y CONFIG_INET6_ESP=y CONFIG_INET6_IPCOMP=y CONFIG_INET=y From f8dc2db873ec44ab98cf5ecdec3b16d092eea80e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Maciej=20=C5=BBenczykowski?= Date: Tue, 27 Sep 2016 23:57:58 -0700 Subject: [PATCH 302/521] UPSTREAM: ipv6 addrconf: implement RFC7559 router solicitation backoff MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This implements: https://tools.ietf.org/html/rfc7559 Backoff is performed according to RFC3315 section 14: https://tools.ietf.org/html/rfc3315#section-14 We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations to a negative value meaning an unlimited number of retransmits, and we make this the new default (inline with the RFC). We also add a new setting: /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval defaulting to 1 hour (per RFC recommendation). Signed-off-by: Maciej Żenczykowski Acked-by: Erik Kline Signed-off-by: David S. Miller (cherry picked from commit bd11f0741fa5a2c296629898ad07759dd12b35bb in DaveM's net-next/master, should make Linus' tree in 4.9-rc1) Change-Id: Ia32cdc5c61481893ef8040734e014bf2229fc39e --- include/linux/ipv6.h | 1 + include/net/addrconf.h | 3 ++- include/net/if_inet6.h | 1 + net/ipv6/addrconf.c | 51 ++++++++++++++++++++++++++++++++++++------ 4 files changed, 48 insertions(+), 8 deletions(-) diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 1182f0e21697..a0fc3cf932af 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -18,6 +18,7 @@ struct ipv6_devconf { __s32 dad_transmits; __s32 rtr_solicits; __s32 rtr_solicit_interval; + __s32 rtr_solicit_max_interval; __s32 rtr_solicit_delay; __s32 force_mld_version; __s32 mldv1_unsolicited_report_interval; diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 3275ddf9f00d..d540657819ef 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -1,8 +1,9 @@ #ifndef _ADDRCONF_H #define _ADDRCONF_H -#define MAX_RTR_SOLICITATIONS 3 +#define MAX_RTR_SOLICITATIONS -1 /* unlimited */ #define RTR_SOLICITATION_INTERVAL (4*HZ) +#define RTR_SOLICITATION_MAX_INTERVAL (3600*HZ) /* 1 hour */ #define MIN_VALID_LIFETIME (2*3600) /* 2 hours */ diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h index 1c8b6820b694..515352c6280a 100644 --- a/include/net/if_inet6.h +++ b/include/net/if_inet6.h @@ -201,6 +201,7 @@ struct inet6_dev { struct ipv6_devstat stats; struct timer_list rs_timer; + __s32 rs_interval; /* in jiffies */ __u8 rs_probes; __u8 addr_gen_mode; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 860ffbe778cf..a942a18b7943 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -112,6 +112,27 @@ static inline u32 cstamp_delta(unsigned long cstamp) return (cstamp - INITIAL_JIFFIES) * 100UL / HZ; } +static inline s32 rfc3315_s14_backoff_init(s32 irt) +{ + /* multiply 'initial retransmission time' by 0.9 .. 1.1 */ + u64 tmp = (900000 + prandom_u32() % 200001) * (u64)irt; + do_div(tmp, 1000000); + return (s32)tmp; +} + +static inline s32 rfc3315_s14_backoff_update(s32 rt, s32 mrt) +{ + /* multiply 'retransmission timeout' by 1.9 .. 2.1 */ + u64 tmp = (1900000 + prandom_u32() % 200001) * (u64)rt; + do_div(tmp, 1000000); + if ((s32)tmp > mrt) { + /* multiply 'maximum retransmission time' by 0.9 .. 1.1 */ + tmp = (900000 + prandom_u32() % 200001) * (u64)mrt; + do_div(tmp, 1000000); + } + return (s32)tmp; +} + #ifdef CONFIG_SYSCTL static int addrconf_sysctl_register(struct inet6_dev *idev); static void addrconf_sysctl_unregister(struct inet6_dev *idev); @@ -187,6 +208,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = { .dad_transmits = 1, .rtr_solicits = MAX_RTR_SOLICITATIONS, .rtr_solicit_interval = RTR_SOLICITATION_INTERVAL, + .rtr_solicit_max_interval = RTR_SOLICITATION_MAX_INTERVAL, .rtr_solicit_delay = MAX_RTR_SOLICITATION_DELAY, .use_tempaddr = 0, .temp_valid_lft = TEMP_VALID_LIFETIME, @@ -233,6 +255,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .dad_transmits = 1, .rtr_solicits = MAX_RTR_SOLICITATIONS, .rtr_solicit_interval = RTR_SOLICITATION_INTERVAL, + .rtr_solicit_max_interval = RTR_SOLICITATION_MAX_INTERVAL, .rtr_solicit_delay = MAX_RTR_SOLICITATION_DELAY, .use_tempaddr = 0, .temp_valid_lft = TEMP_VALID_LIFETIME, @@ -3487,7 +3510,7 @@ static void addrconf_rs_timer(unsigned long data) if (idev->if_flags & IF_RA_RCVD) goto out; - if (idev->rs_probes++ < idev->cnf.rtr_solicits) { + if (idev->rs_probes++ < idev->cnf.rtr_solicits || idev->cnf.rtr_solicits < 0) { write_unlock(&idev->lock); if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE)) ndisc_send_rs(dev, &lladdr, @@ -3496,11 +3519,13 @@ static void addrconf_rs_timer(unsigned long data) goto put; write_lock(&idev->lock); + idev->rs_interval = rfc3315_s14_backoff_update( + idev->rs_interval, idev->cnf.rtr_solicit_max_interval); /* The wait after the last probe can be shorter */ addrconf_mod_rs_timer(idev, (idev->rs_probes == idev->cnf.rtr_solicits) ? idev->cnf.rtr_solicit_delay : - idev->cnf.rtr_solicit_interval); + idev->rs_interval); } else { /* * Note: we do not support deprecated "all on-link" @@ -3728,7 +3753,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp) send_mld = ifp->scope == IFA_LINK && ipv6_lonely_lladdr(ifp); send_rs = send_mld && ipv6_accept_ra(ifp->idev) && - ifp->idev->cnf.rtr_solicits > 0 && + ifp->idev->cnf.rtr_solicits != 0 && (dev->flags&IFF_LOOPBACK) == 0; read_unlock_bh(&ifp->idev->lock); @@ -3750,10 +3775,11 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp) write_lock_bh(&ifp->idev->lock); spin_lock(&ifp->lock); + ifp->idev->rs_interval = rfc3315_s14_backoff_init( + ifp->idev->cnf.rtr_solicit_interval); ifp->idev->rs_probes = 1; ifp->idev->if_flags |= IF_RS_SENT; - addrconf_mod_rs_timer(ifp->idev, - ifp->idev->cnf.rtr_solicit_interval); + addrconf_mod_rs_timer(ifp->idev, ifp->idev->rs_interval); spin_unlock(&ifp->lock); write_unlock_bh(&ifp->idev->lock); } @@ -4670,6 +4696,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, array[DEVCONF_RTR_SOLICITS] = cnf->rtr_solicits; array[DEVCONF_RTR_SOLICIT_INTERVAL] = jiffies_to_msecs(cnf->rtr_solicit_interval); + array[DEVCONF_RTR_SOLICIT_MAX_INTERVAL] = + jiffies_to_msecs(cnf->rtr_solicit_max_interval); array[DEVCONF_RTR_SOLICIT_DELAY] = jiffies_to_msecs(cnf->rtr_solicit_delay); array[DEVCONF_FORCE_MLD_VERSION] = cnf->force_mld_version; @@ -4879,7 +4907,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token) return -EINVAL; if (!ipv6_accept_ra(idev)) return -EINVAL; - if (idev->cnf.rtr_solicits <= 0) + if (idev->cnf.rtr_solicits == 0) return -EINVAL; write_lock_bh(&idev->lock); @@ -4904,8 +4932,10 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token) if (update_rs) { idev->if_flags |= IF_RS_SENT; + idev->rs_interval = rfc3315_s14_backoff_init( + idev->cnf.rtr_solicit_interval); idev->rs_probes = 1; - addrconf_mod_rs_timer(idev, idev->cnf.rtr_solicit_interval); + addrconf_mod_rs_timer(idev, idev->rs_interval); } /* Well, that's kinda nasty ... */ @@ -5542,6 +5572,13 @@ static struct addrconf_sysctl_table .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, + { + .procname = "router_solicitation_max_interval", + .data = &ipv6_devconf.rtr_solicit_max_interval, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_jiffies, + }, { .procname = "router_solicitation_delay", .data = &ipv6_devconf.rtr_solicit_delay, From 4e2cd7c27606e5666d679b692242132d9abdebb8 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 9 Mar 2017 21:14:45 -0800 Subject: [PATCH 303/521] ANDROID: sdcardfs: remove unnecessary call to do_munmap Adapted from wrapfs commit 5be6de9ecf02 ("Wrapfs: use vm_munmap in ->mmap") commit 2c9f6014a8bb ("Wrapfs: remove unnecessary call to vm_unmap in ->mmap") Code is unnecessary and causes deadlocks in newer kernels. Signed-off-by: Erez Zadok Signed-off-by: Daniel Rosenberg Bug: 35766959 Change-Id: Ia252d60c60799d7e28fc5f1f0f5b5ec2430a2379 --- fs/sdcardfs/file.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index eee4eb5896e5..7e2c50b30782 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -176,12 +176,6 @@ static int sdcardfs_mmap(struct file *file, struct vm_area_struct *vma) goto out; } saved_vm_ops = vma->vm_ops; /* save: came from lower ->mmap */ - err = do_munmap(current->mm, vma->vm_start, - vma->vm_end - vma->vm_start); - if (err) { - pr_err("sdcardfs: do_munmap failed %d\n", err); - goto out; - } } /* From 2c852b7279a7a406631d671a9ede48420fb1e85f Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 9 Mar 2017 21:24:58 -0800 Subject: [PATCH 304/521] ANDROID: sdcardfs: copy lower inode attributes in ->ioctl Adapted from wrapfs commit fbc9c6f83ea6 ("Wrapfs: copy lower inode attributes in ->ioctl") commit e97d8e26cc9e ("Wrapfs: use file_inode helper") Some ioctls (e.g., EXT2_IOC_SETFLAGS) can change inode attributes, so copy them from lower inode. Signed-off-by: Erez Zadok Signed-off-by: Daniel Rosenberg Bug: 35766959 Change-Id: I0f12684b9dbd4088b4a622c7ea9c03087f40e572 --- fs/sdcardfs/file.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index 7e2c50b30782..41a550fec3f2 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -113,6 +113,10 @@ static long sdcardfs_unlocked_ioctl(struct file *file, unsigned int cmd, if (lower_file->f_op->unlocked_ioctl) err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg); + /* some ioctls can change inode attributes (EXT2_IOC_SETFLAGS) */ + if (!err) + sdcardfs_copy_and_fix_attrs(file_inode(file), + file_inode(lower_file)); out: return err; } From cde04fd571f4b90a67c3b02be7ebe169fa8fd7a8 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 9 Mar 2017 21:42:01 -0800 Subject: [PATCH 305/521] ANDROID: sdcardfs: fix ->llseek to update upper and lower offset Adapted from wrapfs commit 1d1d23a47baa ("Wrapfs: fix ->llseek to update upper and lower offsets") Fixes bug: xfstests generic/257. f_pos consistently is required by and only by dir_ops->wrapfs_readdir, main_ops is not affected. Signed-off-by: Erez Zadok Signed-off-by: Mengyang Li Signed-off-by: Daniel Rosenberg Bug: 35766959 Change-Id: I360a1368ac37ea8966910a58972b81504031d437 --- fs/sdcardfs/file.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index 41a550fec3f2..f49df7ac370e 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -316,6 +316,29 @@ static int sdcardfs_fasync(int fd, struct file *file, int flag) return err; } +/* + * Sdcardfs cannot use generic_file_llseek as ->llseek, because it would + * only set the offset of the upper file. So we have to implement our + * own method to set both the upper and lower file offsets + * consistently. + */ +static loff_t sdcardfs_file_llseek(struct file *file, loff_t offset, int whence) +{ + int err; + struct file *lower_file; + + err = generic_file_llseek(file, offset, whence); + if (err < 0) + goto out; + + lower_file = sdcardfs_lower_file(file); + err = generic_file_llseek(lower_file, offset, whence); + +out: + return err; +} + + const struct file_operations sdcardfs_main_fops = { .llseek = generic_file_llseek, .read = sdcardfs_read, @@ -334,7 +357,7 @@ const struct file_operations sdcardfs_main_fops = { /* trimmed directory options */ const struct file_operations sdcardfs_dir_fops = { - .llseek = generic_file_llseek, + .llseek = sdcardfs_file_llseek, .read = generic_read_dir, .iterate = sdcardfs_readdir, .unlocked_ioctl = sdcardfs_unlocked_ioctl, From b8d0ae906807a5ea6fcc1f4eedc224a1e0a68453 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 9 Mar 2017 21:48:09 -0800 Subject: [PATCH 306/521] ANDROID: sdcardfs: add read_iter/write_iter opeations Adapted from wrapfs commit f398bf6a7377 ("Wrapfs: add read_iter/write_iter opeations") Signed-off-by: Erez Zadok Signed-off-by: Mengyang Li Signed-off-by: Daniel Rosenberg Bug: 35766959 Change-Id: I2b3de59c9682fc705bf21df0de6df81e76fd2e40 --- fs/sdcardfs/file.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index f49df7ac370e..c0146e03fa2e 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -338,6 +338,52 @@ static loff_t sdcardfs_file_llseek(struct file *file, loff_t offset, int whence) return err; } +/* + * Sdcardfs read_iter, redirect modified iocb to lower read_iter + */ +ssize_t sdcardfs_read_iter(struct kiocb *iocb, struct iov_iter *iter) +{ + int err; + struct file *file = iocb->ki_filp, *lower_file; + + lower_file = sdcardfs_lower_file(file); + if (!lower_file->f_op->read_iter) { + err = -EINVAL; + goto out; + } + + get_file(lower_file); /* prevent lower_file from being released */ + iocb->ki_filp = lower_file; + err = lower_file->f_op->read_iter(iocb, iter); + /* ? wait IO finish to update atime as ecryptfs ? */ + iocb->ki_filp = file; + fput(lower_file); +out: + return err; +} + +/* + * Sdcardfs write_iter, redirect modified iocb to lower write_iter + */ +ssize_t sdcardfs_write_iter(struct kiocb *iocb, struct iov_iter *iter) +{ + int err; + struct file *file = iocb->ki_filp, *lower_file; + + lower_file = sdcardfs_lower_file(file); + if (!lower_file->f_op->write_iter) { + err = -EINVAL; + goto out; + } + + get_file(lower_file); /* prevent lower_file from being released */ + iocb->ki_filp = lower_file; + err = lower_file->f_op->write_iter(iocb, iter); + iocb->ki_filp = file; + fput(lower_file); +out: + return err; +} const struct file_operations sdcardfs_main_fops = { .llseek = generic_file_llseek, @@ -353,6 +399,8 @@ const struct file_operations sdcardfs_main_fops = { .release = sdcardfs_file_release, .fsync = sdcardfs_fsync, .fasync = sdcardfs_fasync, + .read_iter = sdcardfs_read_iter, + .write_iter = sdcardfs_write_iter, }; /* trimmed directory options */ From 01c1398a44f8b398ceeffcde012d9c4450afa697 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 9 Mar 2017 22:11:08 -0800 Subject: [PATCH 307/521] ANDROID: sdcardfs: use d_splice_alias adapted from wrapfs commit 9671770ff8b9 ("Wrapfs: use d_splice_alias") Refactor interpose code to allow lookup to use d_splice_alias. Signed-off-by: Erez Zadok Signed-off-by: Daniel Rosenberg Bug: 35766959 Change-Id: Icf51db8658202c48456724275b03dc77f73f585b --- fs/sdcardfs/lookup.c | 55 +++++++++++++++++++++++++++++++------------- 1 file changed, 39 insertions(+), 16 deletions(-) diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index cfa2b12b6411..f10f7752f514 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -164,27 +164,25 @@ struct inode *sdcardfs_iget(struct super_block *sb, struct inode *lower_inode, u } /* - * Connect a sdcardfs inode dentry/inode with several lower ones. This is - * the classic stackable file system "vnode interposition" action. - * - * @dentry: sdcardfs's dentry which interposes on lower one - * @sb: sdcardfs's super_block - * @lower_path: the lower path (caller does path_get/put) + * Helper interpose routine, called directly by ->lookup to handle + * spliced dentries. */ -int sdcardfs_interpose(struct dentry *dentry, struct super_block *sb, - struct path *lower_path, userid_t id) +static struct dentry *__sdcardfs_interpose(struct dentry *dentry, + struct super_block *sb, + struct path *lower_path, + userid_t id) { - int err = 0; struct inode *inode; struct inode *lower_inode; struct super_block *lower_sb; + struct dentry *ret_dentry; lower_inode = d_inode(lower_path->dentry); lower_sb = sdcardfs_lower_super(sb); /* check that the lower file system didn't cross a mount point */ if (lower_inode->i_sb != lower_sb) { - err = -EXDEV; + ret_dentry = ERR_PTR(-EXDEV); goto out; } @@ -196,14 +194,32 @@ int sdcardfs_interpose(struct dentry *dentry, struct super_block *sb, /* inherit lower inode number for sdcardfs's inode */ inode = sdcardfs_iget(sb, lower_inode, id); if (IS_ERR(inode)) { - err = PTR_ERR(inode); + ret_dentry = ERR_CAST(inode); goto out; } - d_add(dentry, inode); + ret_dentry = d_splice_alias(inode, dentry); + dentry = ret_dentry ?: dentry; update_derived_permission_lock(dentry); out: - return err; + return ret_dentry; +} + +/* + * Connect an sdcardfs inode dentry/inode with several lower ones. This is + * the classic stackable file system "vnode interposition" action. + * + * @dentry: sdcardfs's dentry which interposes on lower one + * @sb: sdcardfs's super_block + * @lower_path: the lower path (caller does path_get/put) + */ +int sdcardfs_interpose(struct dentry *dentry, struct super_block *sb, + struct path *lower_path, userid_t id) +{ + struct dentry *ret_dentry; + + ret_dentry = __sdcardfs_interpose(dentry, sb, lower_path, id); + return PTR_ERR(ret_dentry); } struct sdcardfs_name_data { @@ -244,6 +260,7 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, const struct qstr *name; struct path lower_path; struct qstr dname; + struct dentry *ret_dentry = NULL; struct sdcardfs_sb_info *sbi; sbi = SDCARDFS_SB(dentry->d_sb); @@ -330,9 +347,13 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, } sdcardfs_set_lower_path(dentry, &lower_path); - err = sdcardfs_interpose(dentry, dentry->d_sb, &lower_path, id); - if (err) /* path_put underlying path on error */ + ret_dentry = + __sdcardfs_interpose(dentry, dentry->d_sb, &lower_path, id); + if (IS_ERR(ret_dentry)) { + err = PTR_ERR(ret_dentry); + /* path_put underlying path on error */ sdcardfs_put_reset_lower_path(dentry); + } goto out; } @@ -372,7 +393,9 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, err = 0; out: - return ERR_PTR(err); + if (err) + return ERR_PTR(err); + return ret_dentry; } /* From 9221207894a7a76936bf67c35d81057ea9cd97fa Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 9 Mar 2017 20:56:05 -0800 Subject: [PATCH 308/521] ANDROID: sdcardfs: update module info Signed-off-by: Daniel Rosenberg Change-Id: I958c7c226d4e9265fea8996803e5b004fb33d8ad --- fs/sdcardfs/main.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/fs/sdcardfs/main.c b/fs/sdcardfs/main.c index 7344635522cd..953d2156d2e9 100644 --- a/fs/sdcardfs/main.c +++ b/fs/sdcardfs/main.c @@ -471,10 +471,15 @@ static void __exit exit_sdcardfs_fs(void) pr_info("Completed sdcardfs module unload\n"); } -MODULE_AUTHOR("Erez Zadok, Filesystems and Storage Lab, Stony Brook University" - " (http://www.fsl.cs.sunysb.edu/)"); -MODULE_DESCRIPTION("Wrapfs " SDCARDFS_VERSION - " (http://wrapfs.filesystems.org/)"); +/* Original wrapfs authors */ +MODULE_AUTHOR("Erez Zadok, Filesystems and Storage Lab, Stony Brook University (http://www.fsl.cs.sunysb.edu/)"); + +/* Original sdcardfs authors */ +MODULE_AUTHOR("Woojoong Lee, Daeho Jeong, Kitae Lee, Yeongjin Gil System Memory Lab., Samsung Electronics"); + +/* Current maintainer */ +MODULE_AUTHOR("Daniel Rosenberg, Google"); +MODULE_DESCRIPTION("Sdcardfs " SDCARDFS_VERSION); MODULE_LICENSE("GPL"); module_init(init_sdcardfs_fs); From 9e6a00e5afd2dbdedbd68b3cf19148ed424c9587 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 10 Apr 2017 23:56:50 +0530 Subject: [PATCH 309/521] arm64: move {PAGE,CONT}_SHIFT into Kconfig In some cases (e.g. the awk for CONFIG_RANDOMIZE_TEXT_OFFSET) we would like to make use of PAGE_SHIFT outside of code that can include the usual header files. Add a new CONFIG_ARM64_PAGE_SHIFT for this, likewise with ARM64_CONT_SHIFT for consistency. Signed-off-by: Mark Rutland Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Sudeep Holla Cc: Will Deacon Signed-off-by: Will Deacon (cherry picked from commit 030c4d24447cbf2bd612baea5695952e5f62c042) Signed-off-by: Amit Pundir Signed-off-by: Alex Shi --- arch/arm64/Kconfig | 12 ++++++++++++ arch/arm64/include/asm/page.h | 12 ++---------- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 2693917d200f..f1a57b769c79 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -107,6 +107,18 @@ config ARCH_PHYS_ADDR_T_64BIT config MMU def_bool y +config ARM64_PAGE_SHIFT + int + default 16 if ARM64_64K_PAGES + default 14 if ARM64_16K_PAGES + default 12 + +config ARM64_CONT_SHIFT + int + default 5 if ARM64_64K_PAGES + default 7 if ARM64_16K_PAGES + default 4 + config NO_IOPORT_MAP def_bool y if !PCI diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h index 9b2f5a9d019d..344915bfb8eb 100644 --- a/arch/arm64/include/asm/page.h +++ b/arch/arm64/include/asm/page.h @@ -21,16 +21,8 @@ /* PAGE_SHIFT determines the page size */ /* CONT_SHIFT determines the number of pages which can be tracked together */ -#ifdef CONFIG_ARM64_64K_PAGES -#define PAGE_SHIFT 16 -#define CONT_SHIFT 5 -#elif defined(CONFIG_ARM64_16K_PAGES) -#define PAGE_SHIFT 14 -#define CONT_SHIFT 7 -#else -#define PAGE_SHIFT 12 -#define CONT_SHIFT 4 -#endif +#define PAGE_SHIFT CONFIG_ARM64_PAGE_SHIFT +#define CONT_SHIFT CONFIG_ARM64_CONT_SHIFT #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) From ed528923541afc1228c5a66e98845148aca51e24 Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom Date: Mon, 27 Mar 2017 11:09:08 +0200 Subject: [PATCH 310/521] drm/vmwgfx: Type-check lookups of fence objects commit f7652afa8eadb416b23eb57dec6f158529942041 upstream. A malicious caller could otherwise hand over handles to other objects causing all sorts of interesting problems. Testing done: Ran a Fedora 25 desktop using both Xorg and gnome-shell/Wayland. Signed-off-by: Thomas Hellstrom Reviewed-by: Sinclair Yeh Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 75 +++++++++++++++++---------- 1 file changed, 49 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index 8e689b439890..b2f329917eda 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -539,7 +539,7 @@ int vmw_fence_create(struct vmw_fence_manager *fman, struct vmw_fence_obj **p_fence) { struct vmw_fence_obj *fence; - int ret; + int ret; fence = kzalloc(sizeof(*fence), GFP_KERNEL); if (unlikely(fence == NULL)) @@ -702,6 +702,41 @@ void vmw_fence_fifo_up(struct vmw_fence_manager *fman) } +/** + * vmw_fence_obj_lookup - Look up a user-space fence object + * + * @tfile: A struct ttm_object_file identifying the caller. + * @handle: A handle identifying the fence object. + * @return: A struct vmw_user_fence base ttm object on success or + * an error pointer on failure. + * + * The fence object is looked up and type-checked. The caller needs + * to have opened the fence object first, but since that happens on + * creation and fence objects aren't shareable, that's not an + * issue currently. + */ +static struct ttm_base_object * +vmw_fence_obj_lookup(struct ttm_object_file *tfile, u32 handle) +{ + struct ttm_base_object *base = ttm_base_object_lookup(tfile, handle); + + if (!base) { + pr_err("Invalid fence object handle 0x%08lx.\n", + (unsigned long)handle); + return ERR_PTR(-EINVAL); + } + + if (base->refcount_release != vmw_user_fence_base_release) { + pr_err("Invalid fence object handle 0x%08lx.\n", + (unsigned long)handle); + ttm_base_object_unref(&base); + return ERR_PTR(-EINVAL); + } + + return base; +} + + int vmw_fence_obj_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { @@ -727,13 +762,9 @@ int vmw_fence_obj_wait_ioctl(struct drm_device *dev, void *data, arg->kernel_cookie = jiffies + wait_timeout; } - base = ttm_base_object_lookup(tfile, arg->handle); - if (unlikely(base == NULL)) { - printk(KERN_ERR "Wait invalid fence object handle " - "0x%08lx.\n", - (unsigned long)arg->handle); - return -EINVAL; - } + base = vmw_fence_obj_lookup(tfile, arg->handle); + if (IS_ERR(base)) + return PTR_ERR(base); fence = &(container_of(base, struct vmw_user_fence, base)->fence); @@ -772,13 +803,9 @@ int vmw_fence_obj_signaled_ioctl(struct drm_device *dev, void *data, struct ttm_object_file *tfile = vmw_fpriv(file_priv)->tfile; struct vmw_private *dev_priv = vmw_priv(dev); - base = ttm_base_object_lookup(tfile, arg->handle); - if (unlikely(base == NULL)) { - printk(KERN_ERR "Fence signaled invalid fence object handle " - "0x%08lx.\n", - (unsigned long)arg->handle); - return -EINVAL; - } + base = vmw_fence_obj_lookup(tfile, arg->handle); + if (IS_ERR(base)) + return PTR_ERR(base); fence = &(container_of(base, struct vmw_user_fence, base)->fence); fman = fman_from_fence(fence); @@ -1093,6 +1120,7 @@ int vmw_fence_event_ioctl(struct drm_device *dev, void *data, (struct drm_vmw_fence_event_arg *) data; struct vmw_fence_obj *fence = NULL; struct vmw_fpriv *vmw_fp = vmw_fpriv(file_priv); + struct ttm_object_file *tfile = vmw_fp->tfile; struct drm_vmw_fence_rep __user *user_fence_rep = (struct drm_vmw_fence_rep __user *)(unsigned long) arg->fence_rep; @@ -1106,15 +1134,11 @@ int vmw_fence_event_ioctl(struct drm_device *dev, void *data, */ if (arg->handle) { struct ttm_base_object *base = - ttm_base_object_lookup_for_ref(dev_priv->tdev, - arg->handle); + vmw_fence_obj_lookup(tfile, arg->handle); + + if (IS_ERR(base)) + return PTR_ERR(base); - if (unlikely(base == NULL)) { - DRM_ERROR("Fence event invalid fence object handle " - "0x%08lx.\n", - (unsigned long)arg->handle); - return -EINVAL; - } fence = &(container_of(base, struct vmw_user_fence, base)->fence); (void) vmw_fence_obj_reference(fence); @@ -1122,7 +1146,7 @@ int vmw_fence_event_ioctl(struct drm_device *dev, void *data, if (user_fence_rep != NULL) { bool existed; - ret = ttm_ref_object_add(vmw_fp->tfile, base, + ret = ttm_ref_object_add(tfile, base, TTM_REF_USAGE, &existed); if (unlikely(ret != 0)) { DRM_ERROR("Failed to reference a fence " @@ -1166,8 +1190,7 @@ int vmw_fence_event_ioctl(struct drm_device *dev, void *data, return 0; out_no_create: if (user_fence_rep != NULL) - ttm_ref_object_base_unref(vmw_fpriv(file_priv)->tfile, - handle, TTM_REF_USAGE); + ttm_ref_object_base_unref(tfile, handle, TTM_REF_USAGE); out_no_ref_obj: vmw_fence_obj_unreference(&fence); return ret; From b26629453c7b2a6c82000b36fbd1cfc4d9101808 Mon Sep 17 00:00:00 2001 From: Murray McAllister Date: Mon, 27 Mar 2017 11:12:53 +0200 Subject: [PATCH 311/521] drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl() commit 36274ab8c596f1240c606bb514da329add2a1bcd upstream. Before memory allocations vmw_surface_define_ioctl() checks the upper-bounds of a user-supplied size, but does not check if the supplied size is 0. Add check to avoid NULL pointer dereferences. Signed-off-by: Murray McAllister Reviewed-by: Sinclair Yeh Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c index 7d620e82e000..b363f0be6512 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c @@ -718,8 +718,8 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void *data, for (i = 0; i < DRM_VMW_MAX_SURFACE_FACES; ++i) num_sizes += req->mip_levels[i]; - if (num_sizes > DRM_VMW_MAX_SURFACE_FACES * - DRM_VMW_MAX_MIP_LEVELS) + if (num_sizes > DRM_VMW_MAX_SURFACE_FACES * DRM_VMW_MAX_MIP_LEVELS || + num_sizes == 0) return -EINVAL; size = vmw_user_surface_size + 128 + From 0e075f266749ea6507758123f553fece6664e4e2 Mon Sep 17 00:00:00 2001 From: Murray McAllister Date: Mon, 27 Mar 2017 11:15:12 +0200 Subject: [PATCH 312/521] drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl() commit 63774069d9527a1aeaa4aa20e929ef5e8e9ecc38 upstream. In vmw_get_cap_3d_ioctl(), a user can supply 0 for a size that is used in vzalloc(). This eventually calls dump_stack() (in warn_alloc()), which can leak useful addresses to dmesg. Add check to avoid a size of 0. Signed-off-by: Murray McAllister Reviewed-by: Sinclair Yeh Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c index b8c6a03c8c54..1802d0e7fab8 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c @@ -186,7 +186,7 @@ int vmw_get_cap_3d_ioctl(struct drm_device *dev, void *data, bool gb_objects = !!(dev_priv->capabilities & SVGA_CAP_GBOBJECTS); struct vmw_fpriv *vmw_fp = vmw_fpriv(file_priv); - if (unlikely(arg->pad64 != 0)) { + if (unlikely(arg->pad64 != 0 || arg->max_size == 0)) { DRM_ERROR("Illegal GET_3D_CAP argument.\n"); return -EINVAL; } From ad4ae2feef4f65b860f139e0d8455e2a16efb93c Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom Date: Mon, 27 Mar 2017 11:21:25 +0200 Subject: [PATCH 313/521] drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces commit fe25deb7737ce6c0879ccf79c99fa1221d428bf2 upstream. Previously, when a surface was opened using a legacy (non prime) handle, it was verified to have been created by a client in the same master realm. Relax this so that opening is also allowed recursively if the client already has the surface open. This works around a regression in svga mesa where opening of a shared surface is used recursively to obtain surface information. Signed-off-by: Thomas Hellstrom Reviewed-by: Sinclair Yeh Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/ttm/ttm_object.c | 10 +++++++--- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 6 ++---- drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 4 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 22 +++++++++------------- include/drm/ttm/ttm_object.h | 5 ++++- 5 files changed, 24 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_object.c b/drivers/gpu/drm/ttm/ttm_object.c index 4f5fa8d65fe9..144367c0c28f 100644 --- a/drivers/gpu/drm/ttm/ttm_object.c +++ b/drivers/gpu/drm/ttm/ttm_object.c @@ -179,7 +179,7 @@ int ttm_base_object_init(struct ttm_object_file *tfile, if (unlikely(ret != 0)) goto out_err0; - ret = ttm_ref_object_add(tfile, base, TTM_REF_USAGE, NULL); + ret = ttm_ref_object_add(tfile, base, TTM_REF_USAGE, NULL, false); if (unlikely(ret != 0)) goto out_err1; @@ -318,7 +318,8 @@ EXPORT_SYMBOL(ttm_ref_object_exists); int ttm_ref_object_add(struct ttm_object_file *tfile, struct ttm_base_object *base, - enum ttm_ref_type ref_type, bool *existed) + enum ttm_ref_type ref_type, bool *existed, + bool require_existed) { struct drm_open_hash *ht = &tfile->ref_hash[ref_type]; struct ttm_ref_object *ref; @@ -345,6 +346,9 @@ int ttm_ref_object_add(struct ttm_object_file *tfile, } rcu_read_unlock(); + if (require_existed) + return -EPERM; + ret = ttm_mem_global_alloc(mem_glob, sizeof(*ref), false, false); if (unlikely(ret != 0)) @@ -635,7 +639,7 @@ int ttm_prime_fd_to_handle(struct ttm_object_file *tfile, prime = (struct ttm_prime_object *) dma_buf->priv; base = &prime->base; *handle = base->hash.key; - ret = ttm_ref_object_add(tfile, base, TTM_REF_USAGE, NULL); + ret = ttm_ref_object_add(tfile, base, TTM_REF_USAGE, NULL, false); dma_buf_put(dma_buf); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index b2f329917eda..6c649f7b5929 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -1144,10 +1144,8 @@ int vmw_fence_event_ioctl(struct drm_device *dev, void *data, (void) vmw_fence_obj_reference(fence); if (user_fence_rep != NULL) { - bool existed; - - ret = ttm_ref_object_add(tfile, base, - TTM_REF_USAGE, &existed); + ret = ttm_ref_object_add(vmw_fp->tfile, base, + TTM_REF_USAGE, NULL, false); if (unlikely(ret != 0)) { DRM_ERROR("Failed to reference a fence " "object.\n"); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c index e57667ca7557..dbca128a9aa6 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c @@ -591,7 +591,7 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo, return ret; ret = ttm_ref_object_add(tfile, &user_bo->prime.base, - TTM_REF_SYNCCPU_WRITE, &existed); + TTM_REF_SYNCCPU_WRITE, &existed, false); if (ret != 0 || existed) ttm_bo_synccpu_write_release(&user_bo->dma.base); @@ -775,7 +775,7 @@ int vmw_user_dmabuf_reference(struct ttm_object_file *tfile, *handle = user_bo->prime.base.hash.key; return ttm_ref_object_add(tfile, &user_bo->prime.base, - TTM_REF_USAGE, NULL); + TTM_REF_USAGE, NULL, false); } /* diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c index b363f0be6512..79f78a68d92d 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c @@ -904,17 +904,16 @@ vmw_surface_handle_reference(struct vmw_private *dev_priv, uint32_t handle; struct ttm_base_object *base; int ret; + bool require_exist = false; if (handle_type == DRM_VMW_HANDLE_PRIME) { ret = ttm_prime_fd_to_handle(tfile, u_handle, &handle); if (unlikely(ret != 0)) return ret; } else { - if (unlikely(drm_is_render_client(file_priv))) { - DRM_ERROR("Render client refused legacy " - "surface reference.\n"); - return -EACCES; - } + if (unlikely(drm_is_render_client(file_priv))) + require_exist = true; + if (ACCESS_ONCE(vmw_fpriv(file_priv)->locked_master)) { DRM_ERROR("Locked master refused legacy " "surface reference.\n"); @@ -942,17 +941,14 @@ vmw_surface_handle_reference(struct vmw_private *dev_priv, /* * Make sure the surface creator has the same - * authenticating master. + * authenticating master, or is already registered with us. */ if (drm_is_primary_client(file_priv) && - user_srf->master != file_priv->master) { - DRM_ERROR("Trying to reference surface outside of" - " master domain.\n"); - ret = -EACCES; - goto out_bad_resource; - } + user_srf->master != file_priv->master) + require_exist = true; - ret = ttm_ref_object_add(tfile, base, TTM_REF_USAGE, NULL); + ret = ttm_ref_object_add(tfile, base, TTM_REF_USAGE, NULL, + require_exist); if (unlikely(ret != 0)) { DRM_ERROR("Could not add a reference to a surface.\n"); goto out_bad_resource; diff --git a/include/drm/ttm/ttm_object.h b/include/drm/ttm/ttm_object.h index ed953f98f0e1..1487011fe057 100644 --- a/include/drm/ttm/ttm_object.h +++ b/include/drm/ttm/ttm_object.h @@ -229,6 +229,8 @@ extern void ttm_base_object_unref(struct ttm_base_object **p_base); * @ref_type: The type of reference. * @existed: Upon completion, indicates that an identical reference object * already existed, and the refcount was upped on that object instead. + * @require_existed: Fail with -EPERM if an identical ref object didn't + * already exist. * * Checks that the base object is shareable and adds a ref object to it. * @@ -243,7 +245,8 @@ extern void ttm_base_object_unref(struct ttm_base_object **p_base); */ extern int ttm_ref_object_add(struct ttm_object_file *tfile, struct ttm_base_object *base, - enum ttm_ref_type ref_type, bool *existed); + enum ttm_ref_type ref_type, bool *existed, + bool require_existed); extern bool ttm_ref_object_exists(struct ttm_object_file *tfile, struct ttm_base_object *base); From 235e914069bd501be22597e6c0176f16b477ae37 Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom Date: Mon, 27 Mar 2017 13:06:05 +0200 Subject: [PATCH 314/521] drm/vmwgfx: Remove getparam error message commit 53e16798b0864464c5444a204e1bb93ae246c429 upstream. The mesa winsys sometimes uses unimplemented parameter requests to check for features. Remove the error message to avoid bloating the kernel log. Signed-off-by: Thomas Hellstrom Reviewed-by: Brian Paul Reviewed-by: Sinclair Yeh Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c index 1802d0e7fab8..5ec24fd801cd 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c @@ -114,8 +114,6 @@ int vmw_getparam_ioctl(struct drm_device *dev, void *data, param->value = dev_priv->has_dx; break; default: - DRM_ERROR("Illegal vmwgfx get param request: %d\n", - param->param); return -EINVAL; } From c21636bd64c511160846bdf87ef4c7ff48680c99 Mon Sep 17 00:00:00 2001 From: Li Qiang Date: Mon, 27 Mar 2017 20:10:53 -0700 Subject: [PATCH 315/521] drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl() commit e7e11f99564222d82f0ce84bd521e57d78a6b678 upstream. In vmw_surface_define_ioctl(), the 'num_sizes' is the sum of the 'req->mip_levels' array. This array can be assigned any value from the user space. As both the 'num_sizes' and the array is uint32_t, it is easy to make 'num_sizes' overflow. The later 'mip_levels' is used as the loop count. This can lead an oob write. Add the check of 'req->mip_levels' to avoid this. Signed-off-by: Li Qiang Reviewed-by: Thomas Hellstrom Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c index 79f78a68d92d..c9c04ccccdd9 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c @@ -715,8 +715,11 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void *data, 128; num_sizes = 0; - for (i = 0; i < DRM_VMW_MAX_SURFACE_FACES; ++i) + for (i = 0; i < DRM_VMW_MAX_SURFACE_FACES; ++i) { + if (req->mip_levels[i] > DRM_VMW_MAX_MIP_LEVELS) + return -EINVAL; num_sizes += req->mip_levels[i]; + } if (num_sizes > DRM_VMW_MAX_SURFACE_FACES * DRM_VMW_MAX_MIP_LEVELS || num_sizes == 0) From 69d8d58bf50d9cd1bb6f000bbdf54026e74717a3 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 3 Apr 2017 11:30:34 +1000 Subject: [PATCH 316/521] sysfs: be careful of error returns from ops->show() commit c8a139d001a1aab1ea8734db14b22dac9dd143b6 upstream. ops->show() can return a negative error code. Commit 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.") (in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors would look like large numbers. As a result, if an error is returned, sysfs_kf_read() will return the value of 'count', typically 4096. Commit 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs") (in v4.8) extended this error to use the unsigned large 'len' as a size for memmove(). Consequently, if ->show returns an error, then the first read() on the sysfs file will return 4096 and could return uninitialized memory to user-space. If the application performs a subsequent read, this will trigger a memmove() with extremely large count, and is likely to crash the machine is bizarre ways. This bug can currently only be triggered by reading from an md sysfs attribute declared with __ATTR_PREALLOC() during the brief period between when mddev_put() deletes an mddev from the ->all_mddevs list, and when mddev_delayed_delete() - which is scheduled on a workqueue - completes. Before this, an error won't be returned by the ->show() After this, the ->show() won't be called. I can reproduce it reliably only by putting delay like usleep_range(500000,700000); early in mddev_delayed_delete(). Then after creating an md device md0 run echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state The bug can be triggered without the usleep. Fixes: 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.") Fixes: 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs") Signed-off-by: NeilBrown Acked-by: Tejun Heo Reported-and-tested-by: Miroslav Benes Signed-off-by: Greg Kroah-Hartman --- fs/sysfs/file.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index b803213d1307..39c75a86c67f 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -108,7 +108,7 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf, { const struct sysfs_ops *ops = sysfs_file_ops(of->kn); struct kobject *kobj = of->kn->parent->priv; - size_t len; + ssize_t len; /* * If buf != of->prealloc_buf, we don't know how @@ -117,13 +117,15 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf, if (WARN_ON_ONCE(buf != of->prealloc_buf)) return 0; len = ops->show(kobj, of->kn->priv, buf); + if (len < 0) + return len; if (pos) { if (len <= pos) return 0; len -= pos; memmove(buf, buf + pos, len); } - return min(count, len); + return min_t(ssize_t, count, len); } /* kernfs write callback for regular sysfs files */ From 193b590c71cd4c1fd54f4b4cab1ba73b6212c073 Mon Sep 17 00:00:00 2001 From: Shuxiao Zhang Date: Thu, 6 Apr 2017 22:30:29 +0800 Subject: [PATCH 317/521] staging: android: ashmem: lseek failed due to no FMODE_LSEEK. commit 97fbfef6bd597888485b653175fb846c6998b60c upstream. vfs_llseek will check whether the file mode has FMODE_LSEEK, no return failure. But ashmem can be lseek, so add FMODE_LSEEK to ashmem file. Comment From Greg Hackmann: ashmem_llseek() passes the llseek() call through to the backing shmem file. 91360b02ab48 ("ashmem: use vfs_llseek()") changed this from directly calling the file's llseek() op into a VFS layer call. This also adds a check for the FMODE_LSEEK bit, so without that bit ashmem_llseek() now always fails with -ESPIPE. Fixes: 91360b02ab48 ("ashmem: use vfs_llseek()") Signed-off-by: Shuxiao Zhang Tested-by: Greg Hackmann Signed-off-by: Greg Kroah-Hartman --- drivers/staging/android/ashmem.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c index 3f2a3d611e4b..9c6357c03905 100644 --- a/drivers/staging/android/ashmem.c +++ b/drivers/staging/android/ashmem.c @@ -392,6 +392,7 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma) ret = PTR_ERR(vmfile); goto out; } + vmfile->f_mode |= FMODE_LSEEK; asma->file = vmfile; } get_file(asma->file); From 8e88806117e4868bc459a3042e55f8bf06c0b9e0 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 16 Mar 2017 18:20:49 +0000 Subject: [PATCH 318/521] arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm commit 90f6e150e44a0dc3883110eeb3ab35d1be42b6bb upstream. We don't hold the mmap_sem while searching for the VMAs when we try to unmap each memslot for a VM. Fix this properly to avoid unexpected results. Fixes: commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm") Reviewed-by: Christoffer Dall Signed-off-by: Suzuki K Poulose Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/kvm/mmu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 11b6595c2672..5366a736151e 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -796,6 +796,7 @@ void stage2_unmap_vm(struct kvm *kvm) int idx; idx = srcu_read_lock(&kvm->srcu); + down_read(¤t->mm->mmap_sem); spin_lock(&kvm->mmu_lock); slots = kvm_memslots(kvm); @@ -803,6 +804,7 @@ void stage2_unmap_vm(struct kvm *kvm) stage2_unmap_memslot(kvm, memslot); spin_unlock(&kvm->mmu_lock); + up_read(¤t->mm->mmap_sem); srcu_read_unlock(&kvm->srcu, idx); } From d4ad442b9982fba9eab0f9003c8cd185a1afeff6 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 16 Mar 2017 18:20:50 +0000 Subject: [PATCH 319/521] arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region commit 72f310481a08db821b614e7b5d00febcc9064b36 upstream. We don't hold the mmap_sem while searching for VMAs (via find_vma), in kvm_arch_prepare_memory_region, which can end up in expected failures. Fixes: commit 8eef91239e57 ("arm/arm64: KVM: map MMIO regions at creation time") Cc: Ard Biesheuvel Cc: Eric Auger Reviewed-by: Christoffer Dall [ Handle dirty page logging failure case ] Signed-off-by: Suzuki K Poulose Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/kvm/mmu.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 5366a736151e..f91ee2f27b41 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -1761,6 +1761,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, (KVM_PHYS_SIZE >> PAGE_SHIFT)) return -EFAULT; + down_read(¤t->mm->mmap_sem); /* * A memory region could potentially cover multiple VMAs, and any holes * between them, so iterate over all of them to find out if we can map @@ -1804,8 +1805,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, pa += vm_start - vma->vm_start; /* IO region dirty page logging not allowed */ - if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) - return -EINVAL; + if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) { + ret = -EINVAL; + goto out; + } ret = kvm_phys_addr_ioremap(kvm, gpa, pa, vm_end - vm_start, @@ -1817,7 +1820,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, } while (hva < reg_end); if (change == KVM_MR_FLAGS_ONLY) - return ret; + goto out; spin_lock(&kvm->mmu_lock); if (ret) @@ -1825,6 +1828,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, else stage2_flush_memslot(kvm, memslot); spin_unlock(&kvm->mmu_lock); +out: + up_read(¤t->mm->mmap_sem); return ret; } From 8ff7eb4bc8b8cf0416e0746dcdb1545fc6869e98 Mon Sep 17 00:00:00 2001 From: Quentin Schulz Date: Tue, 21 Mar 2017 16:52:14 +0100 Subject: [PATCH 320/521] iio: bmg160: reset chip when probing commit 4bdc9029685ac03be50b320b29691766d2326c2b upstream. The gyroscope chip might need to be reset to be used. Without the chip being reset, the driver stopped at the first regmap_read (to get the CHIP_ID) and failed to probe. The datasheet of the gyroscope says that a minimum wait of 30ms after the reset has to be done. This patch has been checked on a BMX055 and the datasheet of the BMG160 and the BMI055 give the same reset register and bits. Signed-off-by: Quentin Schulz Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/gyro/bmg160_core.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/iio/gyro/bmg160_core.c b/drivers/iio/gyro/bmg160_core.c index acb3b303d800..90841abd3ce4 100644 --- a/drivers/iio/gyro/bmg160_core.c +++ b/drivers/iio/gyro/bmg160_core.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "bmg160.h" #define BMG160_IRQ_NAME "bmg160_event" @@ -53,6 +54,9 @@ #define BMG160_NO_FILTER 0 #define BMG160_DEF_BW 100 +#define BMG160_GYRO_REG_RESET 0x14 +#define BMG160_GYRO_RESET_VAL 0xb6 + #define BMG160_REG_INT_MAP_0 0x17 #define BMG160_INT_MAP_0_BIT_ANY BIT(1) @@ -186,6 +190,14 @@ static int bmg160_chip_init(struct bmg160_data *data) int ret; unsigned int val; + /* + * Reset chip to get it in a known good state. A delay of 30ms after + * reset is required according to the datasheet. + */ + regmap_write(data->regmap, BMG160_GYRO_REG_RESET, + BMG160_GYRO_RESET_VAL); + usleep_range(30000, 30700); + ret = regmap_read(data->regmap, BMG160_REG_CHIP_ID, &val); if (ret < 0) { dev_err(data->dev, "Error reading reg_chip_id\n"); From 5a69c2b268ed938d44011274e6bc87562542ef94 Mon Sep 17 00:00:00 2001 From: Jan-Marek Glogowski Date: Mon, 20 Feb 2017 12:25:58 +0100 Subject: [PATCH 321/521] Reset TreeId to zero on SMB2 TREE_CONNECT commit 806a28efe9b78ffae5e2757e1ee924b8e50c08ab upstream. Currently the cifs module breaks the CIFS specs on reconnect as described in http://msdn.microsoft.com/en-us/library/cc246529.aspx: "TreeId (4 bytes): Uniquely identifies the tree connect for the command. This MUST be 0 for the SMB2 TREE_CONNECT Request." Signed-off-by: Jan-Marek Glogowski Reviewed-by: Aurelien Aptel Tested-by: Aurelien Aptel Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/smb2pdu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 2fa754c5fd62..6cb5c4b30e78 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -952,6 +952,10 @@ SMB2_tcon(const unsigned int xid, struct cifs_ses *ses, const char *tree, return -EINVAL; } + /* SMB2 TREE_CONNECT request must be called with TreeId == 0 */ + if (tcon) + tcon->tid = 0; + rc = small_smb2_init(SMB2_TREE_CONNECT, tcon, (void **) &req); if (rc) { kfree(unc_path); From 926e1ed2b8ce683f137ea8e0683ac4f6d27c8afb Mon Sep 17 00:00:00 2001 From: "bsegall@google.com" Date: Fri, 7 Apr 2017 16:04:51 -0700 Subject: [PATCH 322/521] ptrace: fix PTRACE_LISTEN race corrupting task->state commit 5402e97af667e35e54177af8f6575518bf251d51 upstream. In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against __TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end of a PTRACE_LISTEN, this can wake the task /after/ the check against __TASK_TRACED, but before the reset of state to TASK_TRACED. This causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup against TRACED while the task is still on the rq wake_list, corrupting it. Oleg said: "The kernel can crash or this can lead to other hard-to-debug problems. In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced() assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the contract. Obviusly it is very wrong to manipulate task->state if this task is already running, or WAKING, or it sleeps again" [akpm@linux-foundation.org: coding-style fixes] Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL") Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com Signed-off-by: Ben Segall Acked-by: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/ptrace.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/kernel/ptrace.c b/kernel/ptrace.c index a46c40bfb5f6..c7e8ed99c953 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -151,11 +151,17 @@ static void ptrace_unfreeze_traced(struct task_struct *task) WARN_ON(!task->ptrace || task->parent != current); + /* + * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely. + * Recheck state under the lock to close this race. + */ spin_lock_irq(&task->sighand->siglock); - if (__fatal_signal_pending(task)) - wake_up_state(task, __TASK_TRACED); - else - task->state = TASK_TRACED; + if (task->state == __TASK_TRACED) { + if (__fatal_signal_pending(task)) + wake_up_state(task, __TASK_TRACED); + else + task->state = TASK_TRACED; + } spin_unlock_irq(&task->sighand->siglock); } From 5cc244782dabaee110ed9c3900d40cd4b481a517 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Fri, 17 Jun 2016 17:33:59 +0000 Subject: [PATCH 323/521] ring-buffer: Fix return value check in test_ringbuffer() commit 62277de758b155dc04b78f195a1cb5208c37b2df upstream. In case of error, the function kthread_run() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest") Signed-off-by: Wei Yongjun Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/ring_buffer.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index acbb0e73d3a2..7d7f99b0db47 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -4875,9 +4875,9 @@ static __init int test_ringbuffer(void) rb_data[cpu].cnt = cpu; rb_threads[cpu] = kthread_create(rb_test, &rb_data[cpu], "rbtester/%d", cpu); - if (WARN_ON(!rb_threads[cpu])) { + if (WARN_ON(IS_ERR(rb_threads[cpu]))) { pr_cont("FAILED\n"); - ret = -1; + ret = PTR_ERR(rb_threads[cpu]); goto out_free; } @@ -4887,9 +4887,9 @@ static __init int test_ringbuffer(void) /* Now create the rb hammer! */ rb_hammer = kthread_run(rb_hammer_test, NULL, "rbhammer"); - if (WARN_ON(!rb_hammer)) { + if (WARN_ON(IS_ERR(rb_hammer))) { pr_cont("FAILED\n"); - ret = -1; + ret = PTR_ERR(rb_hammer); goto out_free; } From ce962cf480331380d7eb3c8e3c625a975e0aa38f Mon Sep 17 00:00:00 2001 From: James Hogan Date: Fri, 31 Mar 2017 10:37:44 +0100 Subject: [PATCH 324/521] metag/usercopy: Drop unused macros commit ef62a2d81f73d9cddef14bc3d9097a57010d551c upstream. Metag's lib/usercopy.c has a bunch of copy_from_user macros for larger copies between 5 and 16 bytes which are completely unused. Before fixing zeroing lets drop these macros so there is less to fix. Signed-off-by: James Hogan Cc: Al Viro Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/metag/lib/usercopy.c | 113 -------------------------------------- 1 file changed, 113 deletions(-) diff --git a/arch/metag/lib/usercopy.c b/arch/metag/lib/usercopy.c index b3ebfe9c8e88..b4eb1f17069f 100644 --- a/arch/metag/lib/usercopy.c +++ b/arch/metag/lib/usercopy.c @@ -651,119 +651,6 @@ EXPORT_SYMBOL(__copy_user); #define __asm_copy_from_user_4(to, from, ret) \ __asm_copy_from_user_4x_cont(to, from, ret, "", "", "") -#define __asm_copy_from_user_5(to, from, ret) \ - __asm_copy_from_user_4x_cont(to, from, ret, \ - " GETB D1Ar1,[%1++]\n" \ - "4: SETB [%0++],D1Ar1\n", \ - "5: ADD %2,%2,#1\n" \ - " SETB [%0++],D1Ar1\n", \ - " .long 4b,5b\n") - -#define __asm_copy_from_user_6x_cont(to, from, ret, COPY, FIXUP, TENTRY) \ - __asm_copy_from_user_4x_cont(to, from, ret, \ - " GETW D1Ar1,[%1++]\n" \ - "4: SETW [%0++],D1Ar1\n" COPY, \ - "5: ADD %2,%2,#2\n" \ - " SETW [%0++],D1Ar1\n" FIXUP, \ - " .long 4b,5b\n" TENTRY) - -#define __asm_copy_from_user_6(to, from, ret) \ - __asm_copy_from_user_6x_cont(to, from, ret, "", "", "") - -#define __asm_copy_from_user_7(to, from, ret) \ - __asm_copy_from_user_6x_cont(to, from, ret, \ - " GETB D1Ar1,[%1++]\n" \ - "6: SETB [%0++],D1Ar1\n", \ - "7: ADD %2,%2,#1\n" \ - " SETB [%0++],D1Ar1\n", \ - " .long 6b,7b\n") - -#define __asm_copy_from_user_8x_cont(to, from, ret, COPY, FIXUP, TENTRY) \ - __asm_copy_from_user_4x_cont(to, from, ret, \ - " GETD D1Ar1,[%1++]\n" \ - "4: SETD [%0++],D1Ar1\n" COPY, \ - "5: ADD %2,%2,#4\n" \ - " SETD [%0++],D1Ar1\n" FIXUP, \ - " .long 4b,5b\n" TENTRY) - -#define __asm_copy_from_user_8(to, from, ret) \ - __asm_copy_from_user_8x_cont(to, from, ret, "", "", "") - -#define __asm_copy_from_user_9(to, from, ret) \ - __asm_copy_from_user_8x_cont(to, from, ret, \ - " GETB D1Ar1,[%1++]\n" \ - "6: SETB [%0++],D1Ar1\n", \ - "7: ADD %2,%2,#1\n" \ - " SETB [%0++],D1Ar1\n", \ - " .long 6b,7b\n") - -#define __asm_copy_from_user_10x_cont(to, from, ret, COPY, FIXUP, TENTRY) \ - __asm_copy_from_user_8x_cont(to, from, ret, \ - " GETW D1Ar1,[%1++]\n" \ - "6: SETW [%0++],D1Ar1\n" COPY, \ - "7: ADD %2,%2,#2\n" \ - " SETW [%0++],D1Ar1\n" FIXUP, \ - " .long 6b,7b\n" TENTRY) - -#define __asm_copy_from_user_10(to, from, ret) \ - __asm_copy_from_user_10x_cont(to, from, ret, "", "", "") - -#define __asm_copy_from_user_11(to, from, ret) \ - __asm_copy_from_user_10x_cont(to, from, ret, \ - " GETB D1Ar1,[%1++]\n" \ - "8: SETB [%0++],D1Ar1\n", \ - "9: ADD %2,%2,#1\n" \ - " SETB [%0++],D1Ar1\n", \ - " .long 8b,9b\n") - -#define __asm_copy_from_user_12x_cont(to, from, ret, COPY, FIXUP, TENTRY) \ - __asm_copy_from_user_8x_cont(to, from, ret, \ - " GETD D1Ar1,[%1++]\n" \ - "6: SETD [%0++],D1Ar1\n" COPY, \ - "7: ADD %2,%2,#4\n" \ - " SETD [%0++],D1Ar1\n" FIXUP, \ - " .long 6b,7b\n" TENTRY) - -#define __asm_copy_from_user_12(to, from, ret) \ - __asm_copy_from_user_12x_cont(to, from, ret, "", "", "") - -#define __asm_copy_from_user_13(to, from, ret) \ - __asm_copy_from_user_12x_cont(to, from, ret, \ - " GETB D1Ar1,[%1++]\n" \ - "8: SETB [%0++],D1Ar1\n", \ - "9: ADD %2,%2,#1\n" \ - " SETB [%0++],D1Ar1\n", \ - " .long 8b,9b\n") - -#define __asm_copy_from_user_14x_cont(to, from, ret, COPY, FIXUP, TENTRY) \ - __asm_copy_from_user_12x_cont(to, from, ret, \ - " GETW D1Ar1,[%1++]\n" \ - "8: SETW [%0++],D1Ar1\n" COPY, \ - "9: ADD %2,%2,#2\n" \ - " SETW [%0++],D1Ar1\n" FIXUP, \ - " .long 8b,9b\n" TENTRY) - -#define __asm_copy_from_user_14(to, from, ret) \ - __asm_copy_from_user_14x_cont(to, from, ret, "", "", "") - -#define __asm_copy_from_user_15(to, from, ret) \ - __asm_copy_from_user_14x_cont(to, from, ret, \ - " GETB D1Ar1,[%1++]\n" \ - "10: SETB [%0++],D1Ar1\n", \ - "11: ADD %2,%2,#1\n" \ - " SETB [%0++],D1Ar1\n", \ - " .long 10b,11b\n") - -#define __asm_copy_from_user_16x_cont(to, from, ret, COPY, FIXUP, TENTRY) \ - __asm_copy_from_user_12x_cont(to, from, ret, \ - " GETD D1Ar1,[%1++]\n" \ - "8: SETD [%0++],D1Ar1\n" COPY, \ - "9: ADD %2,%2,#4\n" \ - " SETD [%0++],D1Ar1\n" FIXUP, \ - " .long 8b,9b\n" TENTRY) - -#define __asm_copy_from_user_16(to, from, ret) \ - __asm_copy_from_user_16x_cont(to, from, ret, "", "", "") #define __asm_copy_from_user_8x64(to, from, ret) \ asm volatile ( \ From ae781dee56e4805311f0615ca04ea226bfbcafcd Mon Sep 17 00:00:00 2001 From: James Hogan Date: Fri, 31 Mar 2017 11:23:18 +0100 Subject: [PATCH 325/521] metag/usercopy: Fix alignment error checking commit 2257211942bbbf6c798ab70b487d7e62f7835a1a upstream. Fix the error checking of the alignment adjustment code in raw_copy_from_user(), which mistakenly considers it safe to skip the error check when aligning the source buffer on a 2 or 4 byte boundary. If the destination buffer was unaligned it may have started to copy using byte or word accesses, which could well be at the start of a new (valid) source page. This would result in it appearing to have copied 1 or 2 bytes at the end of the first (invalid) page rather than none at all. Fixes: 373cd784d0fc ("metag: Memory handling") Signed-off-by: James Hogan Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/metag/lib/usercopy.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/arch/metag/lib/usercopy.c b/arch/metag/lib/usercopy.c index b4eb1f17069f..a6ced9691ddb 100644 --- a/arch/metag/lib/usercopy.c +++ b/arch/metag/lib/usercopy.c @@ -717,6 +717,8 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, if ((unsigned long) src & 1) { __asm_copy_from_user_1(dst, src, retn); n--; + if (retn) + goto copy_exception_bytes; } if ((unsigned long) dst & 1) { /* Worst case - byte copy */ @@ -730,6 +732,8 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, if (((unsigned long) src & 2) && n >= 2) { __asm_copy_from_user_2(dst, src, retn); n -= 2; + if (retn) + goto copy_exception_bytes; } if ((unsigned long) dst & 2) { /* Second worst case - word copy */ @@ -741,12 +745,6 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, } } - /* We only need one check after the unalignment-adjustments, - because if both adjustments were done, either both or - neither reference had an exception. */ - if (retn != 0) - goto copy_exception_bytes; - #ifdef USE_RAPF /* 64 bit copy loop */ if (!(((unsigned long) src | (unsigned long) dst) & 7)) { From dde6f22c1e122907717f45405cbc2c6227e259e5 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Fri, 31 Mar 2017 13:35:01 +0100 Subject: [PATCH 326/521] metag/usercopy: Add early abort to copy_to_user commit fb8ea062a8f2e85256e13f55696c5c5f0dfdcc8b upstream. When copying to userland on Meta, if any faults are encountered immediately abort the copy instead of continuing on and repeatedly faulting, and worse potentially copying further bytes successfully to subsequent valid pages. Fixes: 373cd784d0fc ("metag: Memory handling") Reported-by: Al Viro Signed-off-by: James Hogan Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/metag/lib/usercopy.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/arch/metag/lib/usercopy.c b/arch/metag/lib/usercopy.c index a6ced9691ddb..714d8562aa20 100644 --- a/arch/metag/lib/usercopy.c +++ b/arch/metag/lib/usercopy.c @@ -538,23 +538,31 @@ unsigned long __copy_user(void __user *pdst, const void *psrc, if ((unsigned long) src & 1) { __asm_copy_to_user_1(dst, src, retn); n--; + if (retn) + return retn + n; } if ((unsigned long) dst & 1) { /* Worst case - byte copy */ while (n > 0) { __asm_copy_to_user_1(dst, src, retn); n--; + if (retn) + return retn + n; } } if (((unsigned long) src & 2) && n >= 2) { __asm_copy_to_user_2(dst, src, retn); n -= 2; + if (retn) + return retn + n; } if ((unsigned long) dst & 2) { /* Second worst case - word copy */ while (n >= 2) { __asm_copy_to_user_2(dst, src, retn); n -= 2; + if (retn) + return retn + n; } } @@ -569,6 +577,8 @@ unsigned long __copy_user(void __user *pdst, const void *psrc, while (n >= 8) { __asm_copy_to_user_8x64(dst, src, retn); n -= 8; + if (retn) + return retn + n; } } if (n >= RAPF_MIN_BUF_SIZE) { @@ -581,6 +591,8 @@ unsigned long __copy_user(void __user *pdst, const void *psrc, while (n >= 8) { __asm_copy_to_user_8x64(dst, src, retn); n -= 8; + if (retn) + return retn + n; } } #endif @@ -588,11 +600,15 @@ unsigned long __copy_user(void __user *pdst, const void *psrc, while (n >= 16) { __asm_copy_to_user_16(dst, src, retn); n -= 16; + if (retn) + return retn + n; } while (n >= 4) { __asm_copy_to_user_4(dst, src, retn); n -= 4; + if (retn) + return retn + n; } switch (n) { @@ -609,6 +625,10 @@ unsigned long __copy_user(void __user *pdst, const void *psrc, break; } + /* + * If we get here, retn correctly reflects the number of failing + * bytes. + */ return retn; } EXPORT_SYMBOL(__copy_user); From 29b5eb517c6961ea9e9b16c49b5cf7fd93860be2 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Fri, 31 Mar 2017 11:14:02 +0100 Subject: [PATCH 327/521] metag/usercopy: Zero rest of buffer from copy_from_user commit 563ddc1076109f2b3f88e6d355eab7b6fd4662cb upstream. Currently we try to zero the destination for a failed read from userland in fixup code in the usercopy.c macros. The rest of the destination buffer is then zeroed from __copy_user_zeroing(), which is used for both copy_from_user() and __copy_from_user(). Unfortunately we fail to zero in the fixup code as D1Ar1 is set to 0 before the fixup code entry labels, and __copy_from_user() shouldn't even be zeroing the rest of the buffer. Move the zeroing out into copy_from_user() and rename __copy_user_zeroing() to raw_copy_from_user() since it no longer does any zeroing. This also conveniently matches the name needed for RAW_COPY_USER support in a later patch. Fixes: 373cd784d0fc ("metag: Memory handling") Reported-by: Al Viro Signed-off-by: James Hogan Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/metag/include/asm/uaccess.h | 15 +++++---- arch/metag/lib/usercopy.c | 57 ++++++++++---------------------- 2 files changed, 26 insertions(+), 46 deletions(-) diff --git a/arch/metag/include/asm/uaccess.h b/arch/metag/include/asm/uaccess.h index 273e61225c27..07238b39638c 100644 --- a/arch/metag/include/asm/uaccess.h +++ b/arch/metag/include/asm/uaccess.h @@ -197,20 +197,21 @@ extern long __must_check strnlen_user(const char __user *src, long count); #define strlen_user(str) strnlen_user(str, 32767) -extern unsigned long __must_check __copy_user_zeroing(void *to, - const void __user *from, - unsigned long n); +extern unsigned long raw_copy_from_user(void *to, const void __user *from, + unsigned long n); static inline unsigned long copy_from_user(void *to, const void __user *from, unsigned long n) { + unsigned long res = n; if (likely(access_ok(VERIFY_READ, from, n))) - return __copy_user_zeroing(to, from, n); - memset(to, 0, n); - return n; + res = raw_copy_from_user(to, from, n); + if (unlikely(res)) + memset(to + (n - res), 0, res); + return res; } -#define __copy_from_user(to, from, n) __copy_user_zeroing(to, from, n) +#define __copy_from_user(to, from, n) raw_copy_from_user(to, from, n) #define __copy_from_user_inatomic __copy_from_user extern unsigned long __must_check __copy_user(void __user *to, diff --git a/arch/metag/lib/usercopy.c b/arch/metag/lib/usercopy.c index 714d8562aa20..e1d553872fd7 100644 --- a/arch/metag/lib/usercopy.c +++ b/arch/metag/lib/usercopy.c @@ -29,7 +29,6 @@ COPY \ "1:\n" \ " .section .fixup,\"ax\"\n" \ - " MOV D1Ar1,#0\n" \ FIXUP \ " MOVT D1Ar1,#HI(1b)\n" \ " JUMP D1Ar1,#LO(1b)\n" \ @@ -637,16 +636,14 @@ EXPORT_SYMBOL(__copy_user); __asm_copy_user_cont(to, from, ret, \ " GETB D1Ar1,[%1++]\n" \ "2: SETB [%0++],D1Ar1\n", \ - "3: ADD %2,%2,#1\n" \ - " SETB [%0++],D1Ar1\n", \ + "3: ADD %2,%2,#1\n", \ " .long 2b,3b\n") #define __asm_copy_from_user_2x_cont(to, from, ret, COPY, FIXUP, TENTRY) \ __asm_copy_user_cont(to, from, ret, \ " GETW D1Ar1,[%1++]\n" \ "2: SETW [%0++],D1Ar1\n" COPY, \ - "3: ADD %2,%2,#2\n" \ - " SETW [%0++],D1Ar1\n" FIXUP, \ + "3: ADD %2,%2,#2\n" FIXUP, \ " .long 2b,3b\n" TENTRY) #define __asm_copy_from_user_2(to, from, ret) \ @@ -656,32 +653,26 @@ EXPORT_SYMBOL(__copy_user); __asm_copy_from_user_2x_cont(to, from, ret, \ " GETB D1Ar1,[%1++]\n" \ "4: SETB [%0++],D1Ar1\n", \ - "5: ADD %2,%2,#1\n" \ - " SETB [%0++],D1Ar1\n", \ + "5: ADD %2,%2,#1\n", \ " .long 4b,5b\n") #define __asm_copy_from_user_4x_cont(to, from, ret, COPY, FIXUP, TENTRY) \ __asm_copy_user_cont(to, from, ret, \ " GETD D1Ar1,[%1++]\n" \ "2: SETD [%0++],D1Ar1\n" COPY, \ - "3: ADD %2,%2,#4\n" \ - " SETD [%0++],D1Ar1\n" FIXUP, \ + "3: ADD %2,%2,#4\n" FIXUP, \ " .long 2b,3b\n" TENTRY) #define __asm_copy_from_user_4(to, from, ret) \ __asm_copy_from_user_4x_cont(to, from, ret, "", "", "") - #define __asm_copy_from_user_8x64(to, from, ret) \ asm volatile ( \ " GETL D0Ar2,D1Ar1,[%1++]\n" \ "2: SETL [%0++],D0Ar2,D1Ar1\n" \ "1:\n" \ " .section .fixup,\"ax\"\n" \ - " MOV D1Ar1,#0\n" \ - " MOV D0Ar2,#0\n" \ "3: ADD %2,%2,#8\n" \ - " SETL [%0++],D0Ar2,D1Ar1\n" \ " MOVT D0Ar2,#HI(1b)\n" \ " JUMP D0Ar2,#LO(1b)\n" \ " .previous\n" \ @@ -721,11 +712,12 @@ EXPORT_SYMBOL(__copy_user); "SUB %1, %1, #4\n") -/* Copy from user to kernel, zeroing the bytes that were inaccessible in - userland. The return-value is the number of bytes that were - inaccessible. */ -unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, - unsigned long n) +/* + * Copy from user to kernel. The return-value is the number of bytes that were + * inaccessible. + */ +unsigned long raw_copy_from_user(void *pdst, const void __user *psrc, + unsigned long n) { register char *dst asm ("A0.2") = pdst; register const char __user *src asm ("A1.2") = psrc; @@ -738,7 +730,7 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, __asm_copy_from_user_1(dst, src, retn); n--; if (retn) - goto copy_exception_bytes; + return retn + n; } if ((unsigned long) dst & 1) { /* Worst case - byte copy */ @@ -746,14 +738,14 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, __asm_copy_from_user_1(dst, src, retn); n--; if (retn) - goto copy_exception_bytes; + return retn + n; } } if (((unsigned long) src & 2) && n >= 2) { __asm_copy_from_user_2(dst, src, retn); n -= 2; if (retn) - goto copy_exception_bytes; + return retn + n; } if ((unsigned long) dst & 2) { /* Second worst case - word copy */ @@ -761,7 +753,7 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, __asm_copy_from_user_2(dst, src, retn); n -= 2; if (retn) - goto copy_exception_bytes; + return retn + n; } } @@ -777,7 +769,7 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, __asm_copy_from_user_8x64(dst, src, retn); n -= 8; if (retn) - goto copy_exception_bytes; + return retn + n; } } @@ -793,7 +785,7 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, __asm_copy_from_user_8x64(dst, src, retn); n -= 8; if (retn) - goto copy_exception_bytes; + return retn + n; } } #endif @@ -803,7 +795,7 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, n -= 4; if (retn) - goto copy_exception_bytes; + return retn + n; } /* If we get here, there were no memory read faults. */ @@ -829,21 +821,8 @@ unsigned long __copy_user_zeroing(void *pdst, const void __user *psrc, /* If we get here, retn correctly reflects the number of failing bytes. */ return retn; - - copy_exception_bytes: - /* We already have "retn" bytes cleared, and need to clear the - remaining "n" bytes. A non-optimized simple byte-for-byte in-line - memset is preferred here, since this isn't speed-critical code and - we'd rather have this a leaf-function than calling memset. */ - { - char *endp; - for (endp = dst + n; dst < endp; dst++) - *dst = 0; - } - - return retn + n; } -EXPORT_SYMBOL(__copy_user_zeroing); +EXPORT_SYMBOL(raw_copy_from_user); #define __asm_clear_8x64(to, ret) \ asm volatile ( \ From beb0ad97ad099ac99f0354e195bd129586a60694 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Tue, 4 Apr 2017 11:43:26 +0100 Subject: [PATCH 328/521] metag/usercopy: Set flags before ADDZ commit fd40eee1290ad7add7aa665e3ce6b0f9fe9734b4 upstream. The fixup code for the copy_to_user rapf loops reads TXStatus.LSM_STEP to decide how far to rewind the source pointer. There is a special case for the last execution of an MGETL/MGETD, since it leaves LSM_STEP=0 even though the number of MGETLs/MGETDs attempted was 4. This uses ADDZ which is conditional upon the Z condition flag, but the AND instruction which masked the TXStatus.LSM_STEP field didn't set the condition flags based on the result. Fix that now by using ANDS which does set the flags, and also marking the condition codes as clobbered by the inline assembly. Fixes: 373cd784d0fc ("metag: Memory handling") Signed-off-by: James Hogan Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/metag/lib/usercopy.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/metag/lib/usercopy.c b/arch/metag/lib/usercopy.c index e1d553872fd7..4422928a1746 100644 --- a/arch/metag/lib/usercopy.c +++ b/arch/metag/lib/usercopy.c @@ -315,7 +315,7 @@ " .previous\n" \ : "=r" (to), "=r" (from), "=r" (ret), "=d" (n) \ : "0" (to), "1" (from), "2" (ret), "3" (n) \ - : "D1Ar1", "D0Ar2", "memory") + : "D1Ar1", "D0Ar2", "cc", "memory") /* rewind 'to' and 'from' pointers when a fault occurs * @@ -341,7 +341,7 @@ #define __asm_copy_to_user_64bit_rapf_loop(to, from, ret, n, id)\ __asm_copy_user_64bit_rapf_loop(to, from, ret, n, id, \ "LSR D0Ar2, D0Ar2, #8\n" \ - "AND D0Ar2, D0Ar2, #0x7\n" \ + "ANDS D0Ar2, D0Ar2, #0x7\n" \ "ADDZ D0Ar2, D0Ar2, #4\n" \ "SUB D0Ar2, D0Ar2, #1\n" \ "MOV D1Ar1, #4\n" \ @@ -486,7 +486,7 @@ " .previous\n" \ : "=r" (to), "=r" (from), "=r" (ret), "=d" (n) \ : "0" (to), "1" (from), "2" (ret), "3" (n) \ - : "D1Ar1", "D0Ar2", "memory") + : "D1Ar1", "D0Ar2", "cc", "memory") /* rewind 'to' and 'from' pointers when a fault occurs * @@ -512,7 +512,7 @@ #define __asm_copy_to_user_32bit_rapf_loop(to, from, ret, n, id)\ __asm_copy_user_32bit_rapf_loop(to, from, ret, n, id, \ "LSR D0Ar2, D0Ar2, #8\n" \ - "AND D0Ar2, D0Ar2, #0x7\n" \ + "ANDS D0Ar2, D0Ar2, #0x7\n" \ "ADDZ D0Ar2, D0Ar2, #4\n" \ "SUB D0Ar2, D0Ar2, #1\n" \ "MOV D1Ar1, #4\n" \ From 3040ecd4253a4ef996e6f940801ee4b80b01c87a Mon Sep 17 00:00:00 2001 From: James Hogan Date: Mon, 3 Apr 2017 17:41:40 +0100 Subject: [PATCH 329/521] metag/usercopy: Fix src fixup in from user rapf loops commit 2c0b1df88b987a12d95ea1d6beaf01894f3cc725 upstream. The fixup code to rewind the source pointer in __asm_copy_from_user_{32,64}bit_rapf_loop() always rewound the source by a single unit (4 or 8 bytes), however this is insufficient if the fault didn't occur on the first load in the loop, as the source pointer will have been incremented but nothing will have been stored until all 4 register [pairs] are loaded. Read the LSM_STEP field of TXSTATUS (which is already loaded into a register), a bit like the copy_to_user versions, to determine how many iterations of MGET[DL] have taken place, all of which need rewinding. Fixes: 373cd784d0fc ("metag: Memory handling") Signed-off-by: James Hogan Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/metag/lib/usercopy.c | 36 ++++++++++++++++++++++++++++-------- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/arch/metag/lib/usercopy.c b/arch/metag/lib/usercopy.c index 4422928a1746..e09c95ba028c 100644 --- a/arch/metag/lib/usercopy.c +++ b/arch/metag/lib/usercopy.c @@ -687,29 +687,49 @@ EXPORT_SYMBOL(__copy_user); * * Rationale: * A fault occurs while reading from user buffer, which is the - * source. Since the fault is at a single address, we only - * need to rewind by 8 bytes. + * source. * Since we don't write to kernel buffer until we read first, * the kernel buffer is at the right state and needn't be - * corrected. + * corrected, but the source must be rewound to the beginning of + * the block, which is LSM_STEP*8 bytes. + * LSM_STEP is bits 10:8 in TXSTATUS which is already read + * and stored in D0Ar2 + * + * NOTE: If a fault occurs at the last operation in M{G,S}ETL + * LSM_STEP will be 0. ie: we do 4 writes in our case, if + * a fault happens at the 4th write, LSM_STEP will be 0 + * instead of 4. The code copes with that. */ #define __asm_copy_from_user_64bit_rapf_loop(to, from, ret, n, id) \ __asm_copy_user_64bit_rapf_loop(to, from, ret, n, id, \ - "SUB %1, %1, #8\n") + "LSR D0Ar2, D0Ar2, #5\n" \ + "ANDS D0Ar2, D0Ar2, #0x38\n" \ + "ADDZ D0Ar2, D0Ar2, #32\n" \ + "SUB %1, %1, D0Ar2\n") /* rewind 'from' pointer when a fault occurs * * Rationale: * A fault occurs while reading from user buffer, which is the - * source. Since the fault is at a single address, we only - * need to rewind by 4 bytes. + * source. * Since we don't write to kernel buffer until we read first, * the kernel buffer is at the right state and needn't be - * corrected. + * corrected, but the source must be rewound to the beginning of + * the block, which is LSM_STEP*4 bytes. + * LSM_STEP is bits 10:8 in TXSTATUS which is already read + * and stored in D0Ar2 + * + * NOTE: If a fault occurs at the last operation in M{G,S}ETL + * LSM_STEP will be 0. ie: we do 4 writes in our case, if + * a fault happens at the 4th write, LSM_STEP will be 0 + * instead of 4. The code copes with that. */ #define __asm_copy_from_user_32bit_rapf_loop(to, from, ret, n, id) \ __asm_copy_user_32bit_rapf_loop(to, from, ret, n, id, \ - "SUB %1, %1, #4\n") + "LSR D0Ar2, D0Ar2, #6\n" \ + "ANDS D0Ar2, D0Ar2, #0x1c\n" \ + "ADDZ D0Ar2, D0Ar2, #16\n" \ + "SUB %1, %1, D0Ar2\n") /* From 435cc436a88652046b9ca89fb56acf3a4b1a44b8 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Tue, 4 Apr 2017 08:51:34 +0100 Subject: [PATCH 330/521] metag/usercopy: Add missing fixups commit b884a190afcecdbef34ca508ea5ee88bb7c77861 upstream. The rapf copy loops in the Meta usercopy code is missing some extable entries for HTP cores with unaligned access checking enabled, where faults occur on the instruction immediately after the faulting access. Add the fixup labels and extable entries for these cases so that corner case user copy failures don't cause kernel crashes. Fixes: 373cd784d0fc ("metag: Memory handling") Signed-off-by: James Hogan Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/metag/lib/usercopy.c | 84 +++++++++++++++++++++++++-------------- 1 file changed, 54 insertions(+), 30 deletions(-) diff --git a/arch/metag/lib/usercopy.c b/arch/metag/lib/usercopy.c index e09c95ba028c..2792fc621088 100644 --- a/arch/metag/lib/usercopy.c +++ b/arch/metag/lib/usercopy.c @@ -259,27 +259,31 @@ "MGETL D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ "22:\n" \ "MSETL [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ - "SUB %3, %3, #32\n" \ "23:\n" \ - "MGETL D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ + "SUB %3, %3, #32\n" \ "24:\n" \ + "MGETL D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ + "25:\n" \ "MSETL [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ + "26:\n" \ "SUB %3, %3, #32\n" \ "DCACHE [%1+#-64], D0Ar6\n" \ "BR $Lloop"id"\n" \ \ "MOV RAPF, %1\n" \ - "25:\n" \ - "MGETL D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ - "26:\n" \ - "MSETL [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ - "SUB %3, %3, #32\n" \ "27:\n" \ "MGETL D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ "28:\n" \ "MSETL [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ - "SUB %0, %0, #8\n" \ "29:\n" \ + "SUB %3, %3, #32\n" \ + "30:\n" \ + "MGETL D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ + "31:\n" \ + "MSETL [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ + "32:\n" \ + "SUB %0, %0, #8\n" \ + "33:\n" \ "SETL [%0++], D0.7, D1.7\n" \ "SUB %3, %3, #32\n" \ "1:" \ @@ -311,7 +315,11 @@ " .long 26b,3b\n" \ " .long 27b,3b\n" \ " .long 28b,3b\n" \ - " .long 29b,4b\n" \ + " .long 29b,3b\n" \ + " .long 30b,3b\n" \ + " .long 31b,3b\n" \ + " .long 32b,3b\n" \ + " .long 33b,4b\n" \ " .previous\n" \ : "=r" (to), "=r" (from), "=r" (ret), "=d" (n) \ : "0" (to), "1" (from), "2" (ret), "3" (n) \ @@ -402,47 +410,55 @@ "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ "22:\n" \ "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ - "SUB %3, %3, #16\n" \ "23:\n" \ - "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ - "24:\n" \ - "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ "SUB %3, %3, #16\n" \ - "25:\n" \ + "24:\n" \ "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ - "26:\n" \ + "25:\n" \ "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ + "26:\n" \ "SUB %3, %3, #16\n" \ "27:\n" \ "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ "28:\n" \ "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ + "29:\n" \ + "SUB %3, %3, #16\n" \ + "30:\n" \ + "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ + "31:\n" \ + "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ + "32:\n" \ "SUB %3, %3, #16\n" \ "DCACHE [%1+#-64], D0Ar6\n" \ "BR $Lloop"id"\n" \ \ "MOV RAPF, %1\n" \ - "29:\n" \ - "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ - "30:\n" \ - "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ - "SUB %3, %3, #16\n" \ - "31:\n" \ - "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ - "32:\n" \ - "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ - "SUB %3, %3, #16\n" \ "33:\n" \ "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ "34:\n" \ "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ - "SUB %3, %3, #16\n" \ "35:\n" \ - "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ + "SUB %3, %3, #16\n" \ "36:\n" \ - "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ - "SUB %0, %0, #4\n" \ + "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ "37:\n" \ + "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ + "38:\n" \ + "SUB %3, %3, #16\n" \ + "39:\n" \ + "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ + "40:\n" \ + "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ + "41:\n" \ + "SUB %3, %3, #16\n" \ + "42:\n" \ + "MGETD D0FrT, D0.5, D0.6, D0.7, [%1++]\n" \ + "43:\n" \ + "MSETD [%0++], D0FrT, D0.5, D0.6, D0.7\n" \ + "44:\n" \ + "SUB %0, %0, #4\n" \ + "45:\n" \ "SETD [%0++], D0.7\n" \ "SUB %3, %3, #16\n" \ "1:" \ @@ -482,7 +498,15 @@ " .long 34b,3b\n" \ " .long 35b,3b\n" \ " .long 36b,3b\n" \ - " .long 37b,4b\n" \ + " .long 37b,3b\n" \ + " .long 38b,3b\n" \ + " .long 39b,3b\n" \ + " .long 40b,3b\n" \ + " .long 41b,3b\n" \ + " .long 42b,3b\n" \ + " .long 43b,3b\n" \ + " .long 44b,3b\n" \ + " .long 45b,4b\n" \ " .previous\n" \ : "=r" (to), "=r" (from), "=r" (ret), "=d" (n) \ : "0" (to), "1" (from), "2" (ret), "3" (n) \ From a67004a3896eacd109a0138b5526957381fe4337 Mon Sep 17 00:00:00 2001 From: Frederic Barrat Date: Wed, 29 Mar 2017 19:19:42 +0200 Subject: [PATCH 331/521] powerpc/mm: Add missing global TLB invalidate if cxl is active commit 88b1bf7268f56887ca88eb09c6fb0f4fc970121a upstream. Commit 4c6d9acce1f4 ("powerpc/mm: Add hooks for cxl") converted local TLB invalidates to global if the cxl driver is active. This is necessary because the CAPP snoops invalidations to forward them to the PSL on the cxl adapter. However one path was forgotten. native_flush_hash_range() still does local TLB invalidates, as found out the hard way recently. This patch fixes it by following the same logic as previously: if the cxl driver is active, the local TLB invalidates are 'upgraded' to global. Fixes: 4c6d9acce1f4 ("powerpc/mm: Add hooks for cxl") Signed-off-by: Frederic Barrat Reviewed-by: Aneesh Kumar K.V Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/mm/hash_native_64.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c index c8822af10a58..19d9b2d2d212 100644 --- a/arch/powerpc/mm/hash_native_64.c +++ b/arch/powerpc/mm/hash_native_64.c @@ -645,6 +645,10 @@ static void native_flush_hash_range(unsigned long number, int local) unsigned long psize = batch->psize; int ssize = batch->ssize; int i; + unsigned int use_local; + + use_local = local && mmu_has_feature(MMU_FTR_TLBIEL) && + mmu_psize_defs[psize].tlbiel && !cxl_ctx_in_use(); local_irq_save(flags); @@ -671,8 +675,7 @@ static void native_flush_hash_range(unsigned long number, int local) } pte_iterate_hashed_end(); } - if (mmu_has_feature(MMU_FTR_TLBIEL) && - mmu_psize_defs[psize].tlbiel && local) { + if (use_local) { asm volatile("ptesync":::"memory"); for (i = 0; i < number; i++) { vpn = batch->vpn[i]; From ca9bd55235b346da89dadc1821e37bb4ec22b7eb Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Tue, 4 Apr 2017 14:56:05 +1000 Subject: [PATCH 332/521] powerpc: Don't try to fix up misaligned load-with-reservation instructions commit 48fe9e9488743eec9b7c1addd3c93f12f2123d54 upstream. In the past, there was only one load-with-reservation instruction, lwarx, and if a program attempted a lwarx on a misaligned address, it would take an alignment interrupt and the kernel handler would emulate it as though it was lwzx, which was not really correct, but benign since it is loading the right amount of data, and the lwarx should be paired with a stwcx. to the same address, which would also cause an alignment interrupt which would result in a SIGBUS being delivered to the process. We now have 5 different sizes of load-with-reservation instruction. Of those, lharx and ldarx cause an immediate SIGBUS by luck since their entries in aligninfo[] overlap instructions which were not fixed up, but lqarx overlaps with lhz and will be emulated as such. lbarx can never generate an alignment interrupt since it only operates on 1 byte. To straighten this out and fix the lqarx case, this adds code to detect the l[hwdq]arx instructions and return without fixing them up, resulting in a SIGBUS being delivered to the process. Signed-off-by: Paul Mackerras Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/align.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index 86150fbb42c3..91e5c1758b5c 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -808,14 +808,25 @@ int fix_alignment(struct pt_regs *regs) nb = aligninfo[instr].len; flags = aligninfo[instr].flags; - /* ldbrx/stdbrx overlap lfs/stfs in the DSISR unfortunately */ - if (IS_XFORM(instruction) && ((instruction >> 1) & 0x3ff) == 532) { - nb = 8; - flags = LD+SW; - } else if (IS_XFORM(instruction) && - ((instruction >> 1) & 0x3ff) == 660) { - nb = 8; - flags = ST+SW; + /* + * Handle some cases which give overlaps in the DSISR values. + */ + if (IS_XFORM(instruction)) { + switch (get_xop(instruction)) { + case 532: /* ldbrx */ + nb = 8; + flags = LD+SW; + break; + case 660: /* stdbrx */ + nb = 8; + flags = ST+SW; + break; + case 20: /* lwarx */ + case 84: /* ldarx */ + case 116: /* lharx */ + case 276: /* lqarx */ + return 0; /* not emulated ever */ + } } /* Byteswap little endian loads and stores */ From 1c47303355dc970d692f3625839da43f6b969622 Mon Sep 17 00:00:00 2001 From: Tobias Klauser Date: Sun, 2 Apr 2017 20:08:04 -0700 Subject: [PATCH 333/521] nios2: reserve boot memory for device tree commit 921d701e6f31e1ffaca3560416af1aa04edb4c4f upstream. Make sure to reserve the boot memory for the flattened device tree. Otherwise it might get overwritten, e.g. when initial_boot_params is copied, leading to a corrupted FDT and a boot hang/crash: bootconsole [early0] enabled Early console on uart16650 initialized at 0xf8001600 OF: fdt: Error -11 processing FDT Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree! ---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree! Guenter Roeck says: > I think I found the problem. In unflatten_and_copy_device_tree(), with added > debug information: > > OF: fdt: initial_boot_params=c861e400, dt=c861f000 size=28874 (0x70ca) > > ... and then initial_boot_params is copied to dt, which results in corrupted > fdt since the memory overlaps. Looks like the initial_boot_params memory > is not reserved and (re-)allocated by early_init_dt_alloc_memory_arch(). Reported-by: Guenter Roeck Reference: http://lkml.kernel.org/r/20170226210338.GA19476@roeck-us.net Tested-by: Guenter Roeck Signed-off-by: Tobias Klauser Acked-by: Ley Foon Tan Signed-off-by: Greg Kroah-Hartman --- arch/nios2/kernel/prom.c | 7 +++++++ arch/nios2/kernel/setup.c | 3 +++ 2 files changed, 10 insertions(+) diff --git a/arch/nios2/kernel/prom.c b/arch/nios2/kernel/prom.c index 718dd197909f..de73beb36910 100644 --- a/arch/nios2/kernel/prom.c +++ b/arch/nios2/kernel/prom.c @@ -48,6 +48,13 @@ void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align) return alloc_bootmem_align(size, align); } +int __init early_init_dt_reserve_memory_arch(phys_addr_t base, phys_addr_t size, + bool nomap) +{ + reserve_bootmem(base, size, BOOTMEM_DEFAULT); + return 0; +} + void __init early_init_devtree(void *params) { __be32 *dtb = (u32 *)__dtb_start; diff --git a/arch/nios2/kernel/setup.c b/arch/nios2/kernel/setup.c index a4ff86d58d5c..6c4e351a7930 100644 --- a/arch/nios2/kernel/setup.c +++ b/arch/nios2/kernel/setup.c @@ -195,6 +195,9 @@ void __init setup_arch(char **cmdline_p) } #endif /* CONFIG_BLK_DEV_INITRD */ + early_init_fdt_reserve_self(); + early_init_fdt_scan_reserved_mem(); + unflatten_and_copy_device_tree(); setup_cpuinfo(); From 765ee8ce4e3d059378aefc40666b024e4cd494f2 Mon Sep 17 00:00:00 2001 From: Marcelo Henrique Cerri Date: Mon, 13 Mar 2017 12:14:58 -0300 Subject: [PATCH 334/521] s390/decompressor: fix initrd corruption caused by bss clear commit d82c0d12c92705ef468683c9b7a8298dd61ed191 upstream. Reorder the operations in decompress_kernel() to ensure initrd is moved to a safe location before the bss section is zeroed. During decompression bss can overlap with the initrd and this can corrupt the initrd contents depending on the size of the compressed kernel (which affects where the initrd is placed by the bootloader) and the size of the bss section of the decompressor. Also use the correct initrd size when checking for overlaps with parmblock. Fixes: 06c0dd72aea3 ([S390] fix boot failures with compressed kernels) Reviewed-by: Joy Latten Reviewed-by: Vineetha HariPai Signed-off-by: Marcelo Henrique Cerri Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/boot/compressed/misc.c | 35 +++++++++++++++++--------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/arch/s390/boot/compressed/misc.c b/arch/s390/boot/compressed/misc.c index 4da604ebf6fd..ca15613eaaa4 100644 --- a/arch/s390/boot/compressed/misc.c +++ b/arch/s390/boot/compressed/misc.c @@ -141,31 +141,34 @@ static void check_ipl_parmblock(void *start, unsigned long size) unsigned long decompress_kernel(void) { - unsigned long output_addr; - unsigned char *output; + void *output, *kernel_end; - output_addr = ((unsigned long) &_end + HEAP_SIZE + 4095UL) & -4096UL; - check_ipl_parmblock((void *) 0, output_addr + SZ__bss_start); - memset(&_bss, 0, &_ebss - &_bss); - free_mem_ptr = (unsigned long)&_end; - free_mem_end_ptr = free_mem_ptr + HEAP_SIZE; - output = (unsigned char *) output_addr; + output = (void *) ALIGN((unsigned long) &_end + HEAP_SIZE, PAGE_SIZE); + kernel_end = output + SZ__bss_start; + check_ipl_parmblock((void *) 0, (unsigned long) kernel_end); #ifdef CONFIG_BLK_DEV_INITRD /* * Move the initrd right behind the end of the decompressed - * kernel image. + * kernel image. This also prevents initrd corruption caused by + * bss clearing since kernel_end will always be located behind the + * current bss section.. */ - if (INITRD_START && INITRD_SIZE && - INITRD_START < (unsigned long) output + SZ__bss_start) { - check_ipl_parmblock(output + SZ__bss_start, - INITRD_START + INITRD_SIZE); - memmove(output + SZ__bss_start, - (void *) INITRD_START, INITRD_SIZE); - INITRD_START = (unsigned long) output + SZ__bss_start; + if (INITRD_START && INITRD_SIZE && kernel_end > (void *) INITRD_START) { + check_ipl_parmblock(kernel_end, INITRD_SIZE); + memmove(kernel_end, (void *) INITRD_START, INITRD_SIZE); + INITRD_START = (unsigned long) kernel_end; } #endif + /* + * Clear bss section. free_mem_ptr and free_mem_end_ptr need to be + * initialized afterwards since they reside in bss. + */ + memset(&_bss, 0, &_ebss - &_bss); + free_mem_ptr = (unsigned long) &_end; + free_mem_end_ptr = free_mem_ptr + HEAP_SIZE; + puts("Uncompressing Linux... "); __decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error); puts("Ok, booting the kernel.\n"); From 0f5d17253b2868a3e75d623dcb2514e305bc7447 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 27 Mar 2017 09:48:04 +0200 Subject: [PATCH 335/521] s390/uaccess: get_user() should zero on failure (again) commit d09c5373e8e4eaaa09233552cbf75dc4c4f21203 upstream. Commit fd2d2b191fe7 ("s390: get_user() should zero on failure") intended to fix s390's get_user() implementation which did not zero the target operand if the read from user space faulted. Unfortunately the patch has no effect: the corresponding inline assembly specifies that the operand is only written to ("=") and the previous value is discarded. Therefore the compiler is free to and actually does omit the zero initialization. To fix this simply change the contraint modifier to "+", so the compiler cannot omit the initialization anymore. Fixes: c9ca78415ac1 ("s390/uaccess: provide inline variants of get_user/put_user") Fixes: fd2d2b191fe7 ("s390: get_user() should zero on failure") Cc: Al Viro Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h index 5c7381c5ad7f..c8d837f0fbbc 100644 --- a/arch/s390/include/asm/uaccess.h +++ b/arch/s390/include/asm/uaccess.h @@ -150,7 +150,7 @@ unsigned long __must_check __copy_to_user(void __user *to, const void *from, " jg 2b\n" \ ".popsection\n" \ EX_TABLE(0b,3b) EX_TABLE(1b,3b) \ - : "=d" (__rc), "=Q" (*(to)) \ + : "=d" (__rc), "+Q" (*(to)) \ : "d" (size), "Q" (*(from)), \ "d" (__reg0), "K" (-EFAULT) \ : "cc"); \ From 394d71b1ea24c248a8d497d10635b86dd2fccef7 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Thu, 16 Feb 2017 12:39:01 +0000 Subject: [PATCH 336/521] MIPS: Force o32 fp64 support on 32bit MIPS64r6 kernels commit 2e6c7747730296a6d4fd700894286db1132598c4 upstream. When a 32-bit kernel is configured to support MIPS64r6 (CPU_MIPS64_R6), MIPS_O32_FP64_SUPPORT won't be selected as it should be because MIPS32_O32 is disabled (o32 is already the default ABI available on 32-bit kernels). This results in userland FP breakage as CP0_Status.FR is read-only 1 since r6 (when an FPU is present) so __enable_fpu() will fail to clear FR. This causes the FPU emulator to get used which will incorrectly emulate 32-bit FPU registers. Force o32 fp64 support in this case by also selecting MIPS_O32_FP64_SUPPORT from CPU_MIPS64_R6 if 32BIT. Fixes: 4e9d324d4288 ("MIPS: Require O32 FP64 support for MIPS64 with O32 compat") Signed-off-by: James Hogan Reviewed-by: Paul Burton Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15310/ Signed-off-by: James Hogan Signed-off-by: Greg Kroah-Hartman --- arch/mips/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index db459612de44..75bfca69e418 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -1412,7 +1412,7 @@ config CPU_MIPS32_R6 select CPU_SUPPORTS_MSA select GENERIC_CSUM select HAVE_KVM - select MIPS_O32_FP64_SUPPORT + select MIPS_O32_FP64_SUPPORT if 32BIT help Choose this option to build a kernel for release 6 or later of the MIPS32 architecture. New MIPS processors, starting with the Warrior From 22665fe0a60a73734889e1cfc7f8fba4036e0b9a Mon Sep 17 00:00:00 2001 From: John Crispin Date: Sat, 25 Feb 2017 11:54:23 +0100 Subject: [PATCH 337/521] MIPS: ralink: Fix typos in rt3883 pinctrl commit 7c5a3d813050ee235817b0220dd8c42359a9efd8 upstream. There are two copy & paste errors in the definition of the 5GHz LNA and second ethernet pinmux. Fixes: f576fb6a0700 ("MIPS: ralink: cleanup the soc specific pinmux data") Signed-off-by: John Crispin Signed-off-by: Daniel Golle Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15328/ Signed-off-by: James Hogan Signed-off-by: Greg Kroah-Hartman --- arch/mips/ralink/rt3883.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/mips/ralink/rt3883.c b/arch/mips/ralink/rt3883.c index f42834c7f007..3c575093f8f1 100644 --- a/arch/mips/ralink/rt3883.c +++ b/arch/mips/ralink/rt3883.c @@ -36,7 +36,7 @@ static struct rt2880_pmx_func uartlite_func[] = { FUNC("uartlite", 0, 15, 2) }; static struct rt2880_pmx_func jtag_func[] = { FUNC("jtag", 0, 17, 5) }; static struct rt2880_pmx_func mdio_func[] = { FUNC("mdio", 0, 22, 2) }; static struct rt2880_pmx_func lna_a_func[] = { FUNC("lna a", 0, 32, 3) }; -static struct rt2880_pmx_func lna_g_func[] = { FUNC("lna a", 0, 35, 3) }; +static struct rt2880_pmx_func lna_g_func[] = { FUNC("lna g", 0, 35, 3) }; static struct rt2880_pmx_func pci_func[] = { FUNC("pci-dev", 0, 40, 32), FUNC("pci-host2", 1, 40, 32), @@ -44,7 +44,7 @@ static struct rt2880_pmx_func pci_func[] = { FUNC("pci-fnc", 3, 40, 32) }; static struct rt2880_pmx_func ge1_func[] = { FUNC("ge1", 0, 72, 12) }; -static struct rt2880_pmx_func ge2_func[] = { FUNC("ge1", 0, 84, 12) }; +static struct rt2880_pmx_func ge2_func[] = { FUNC("ge2", 0, 84, 12) }; static struct rt2880_pmx_group rt3883_pinmux_data[] = { GRP("i2c", i2c_func, 1, RT3883_GPIO_MODE_I2C), From 768019030ab58e9622caeb6c5a06553260609327 Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Thu, 23 Feb 2017 14:50:24 +0000 Subject: [PATCH 338/521] MIPS: End spinlocks with .insn commit 4b5347a24a0f2d3272032c120664b484478455de upstream. When building for microMIPS we need to ensure that the assembler always knows that there is code at the target of a branch or jump. Recent toolchains will fail to link a microMIPS kernel when this isn't the case due to what it thinks is a branch to non-microMIPS code. mips-mti-linux-gnu-ld kernel/built-in.o: .spinlock.text+0x2fc: Unsupported branch between ISA modes. mips-mti-linux-gnu-ld final link failed: Bad value This is due to inline assembly labels in spinlock.h not being followed by an instruction mnemonic, either due to a .subsection pseudo-op or the end of the inline asm block. Fix this with a .insn direction after such labels. Signed-off-by: Paul Burton Signed-off-by: James Hogan Reviewed-by: Maciej W. Rozycki Cc: Ralf Baechle Cc: Peter Zijlstra Cc: Ingo Molnar Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/15325/ Signed-off-by: James Hogan Signed-off-by: Greg Kroah-Hartman --- arch/mips/include/asm/spinlock.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h index 40196bebe849..2365ce0ad8f2 100644 --- a/arch/mips/include/asm/spinlock.h +++ b/arch/mips/include/asm/spinlock.h @@ -112,7 +112,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) " andi %[ticket], %[ticket], 0xffff \n" " bne %[ticket], %[my_ticket], 4f \n" " subu %[ticket], %[my_ticket], %[ticket] \n" - "2: \n" + "2: .insn \n" " .subsection 2 \n" "4: andi %[ticket], %[ticket], 0xffff \n" " sll %[ticket], 5 \n" @@ -187,7 +187,7 @@ static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock) " sc %[ticket], %[ticket_ptr] \n" " beqz %[ticket], 1b \n" " li %[ticket], 1 \n" - "2: \n" + "2: .insn \n" " .subsection 2 \n" "3: b 2b \n" " li %[ticket], 0 \n" @@ -367,7 +367,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw) " .set reorder \n" __WEAK_LLSC_MB " li %2, 1 \n" - "2: \n" + "2: .insn \n" : "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp), "=&r" (ret) : GCC_OFF_SMALL_ASM() (rw->lock) : "memory"); @@ -407,7 +407,7 @@ static inline int arch_write_trylock(arch_rwlock_t *rw) " lui %1, 0x8000 \n" " sc %1, %0 \n" " li %2, 1 \n" - "2: \n" + "2: .insn \n" : "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp), "=&r" (ret) : GCC_OFF_SMALL_ASM() (rw->lock) From 55f67b97ca05df00c3b61123a7e9e363819c60ee Mon Sep 17 00:00:00 2001 From: Hauke Mehrtens Date: Wed, 15 Mar 2017 23:26:42 +0100 Subject: [PATCH 339/521] MIPS: Lantiq: fix missing xbar kernel panic commit 6ef90877eee63a0d03e83183bb44b64229b624e6 upstream. Commit 08b3c894e565 ("MIPS: lantiq: Disable xbar fpi burst mode") accidentally requested the resources from the pmu address region instead of the xbar registers region, but the check for the return value of request_mem_region() was wrong. Commit 98ea51cb0c8c ("MIPS: Lantiq: Fix another request_mem_region() return code check") fixed the check of the return value of request_mem_region() which made the kernel panics. This patch now makes use of the correct memory region for the cross bar. Fixes: 08b3c894e565 ("MIPS: lantiq: Disable xbar fpi burst mode") Signed-off-by: Hauke Mehrtens Cc: John Crispin Cc: james.hogan@imgtec.com Cc: arnd@arndb.de Cc: sergei.shtylyov@cogentembedded.com Cc: john@phrozen.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15751 Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman --- arch/mips/lantiq/xway/sysctrl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/lantiq/xway/sysctrl.c b/arch/mips/lantiq/xway/sysctrl.c index 3e390a4e3897..daf580ce5ca2 100644 --- a/arch/mips/lantiq/xway/sysctrl.c +++ b/arch/mips/lantiq/xway/sysctrl.c @@ -467,7 +467,7 @@ void __init ltq_soc_init(void) if (!np_xbar) panic("Failed to load xbar nodes from devicetree"); - if (of_address_to_resource(np_pmu, 0, &res_xbar)) + if (of_address_to_resource(np_xbar, 0, &res_xbar)) panic("Failed to get xbar resources"); if (request_mem_region(res_xbar.start, resource_size(&res_xbar), res_xbar.name) < 0) From 2d1af1b7025f96c86938af7f7c54d60adb4773fe Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Thu, 16 Mar 2017 21:00:27 +0800 Subject: [PATCH 340/521] MIPS: Flush wrong invalid FTLB entry for huge page commit 0115f6cbf26663c86496bc56eeea293f85b77897 upstream. On VTLB+FTLB platforms (such as Loongson-3A R2), FTLB's pagesize is usually configured the same as PAGE_SIZE. In such a case, Huge page entry is not suitable to write in FTLB. Unfortunately, when a huge page is created, its page table entries haven't created immediately. Then the TLB refill handler will fetch an invalid page table entry which has no "HUGE" bit, and this entry may be written to FTLB. Since it is invalid, TLB load/store handler will then use tlbwi to write the valid entry at the same place. However, the valid entry is a huge page entry which isn't suitable for FTLB. Our solution is to modify build_huge_handler_tail. Flush the invalid old entry (whether it is in FTLB or VTLB, this is in order to reduce branches) and use tlbwr to write the valid new entry. Signed-off-by: Rui Wang Signed-off-by: Huacai Chen Cc: John Crispin Cc: Steven J . Hill Cc: Fuxin Zhang Cc: Zhangjin Wu Cc: Huacai Chen Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15754/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman --- arch/mips/mm/tlbex.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c index 29f73e00253d..63b7d6f82d24 100644 --- a/arch/mips/mm/tlbex.c +++ b/arch/mips/mm/tlbex.c @@ -757,7 +757,8 @@ static void build_huge_update_entries(u32 **p, unsigned int pte, static void build_huge_handler_tail(u32 **p, struct uasm_reloc **r, struct uasm_label **l, unsigned int pte, - unsigned int ptr) + unsigned int ptr, + unsigned int flush) { #ifdef CONFIG_SMP UASM_i_SC(p, pte, 0, ptr); @@ -766,6 +767,22 @@ static void build_huge_handler_tail(u32 **p, struct uasm_reloc **r, #else UASM_i_SW(p, pte, 0, ptr); #endif + if (cpu_has_ftlb && flush) { + BUG_ON(!cpu_has_tlbinv); + + UASM_i_MFC0(p, ptr, C0_ENTRYHI); + uasm_i_ori(p, ptr, ptr, MIPS_ENTRYHI_EHINV); + UASM_i_MTC0(p, ptr, C0_ENTRYHI); + build_tlb_write_entry(p, l, r, tlb_indexed); + + uasm_i_xori(p, ptr, ptr, MIPS_ENTRYHI_EHINV); + UASM_i_MTC0(p, ptr, C0_ENTRYHI); + build_huge_update_entries(p, pte, ptr); + build_huge_tlb_write_entry(p, l, r, pte, tlb_random, 0); + + return; + } + build_huge_update_entries(p, pte, ptr); build_huge_tlb_write_entry(p, l, r, pte, tlb_indexed, 0); } @@ -2082,7 +2099,7 @@ static void build_r4000_tlb_load_handler(void) uasm_l_tlbl_goaround2(&l, p); } uasm_i_ori(&p, wr.r1, wr.r1, (_PAGE_ACCESSED | _PAGE_VALID)); - build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2); + build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 1); #endif uasm_l_nopage_tlbl(&l, p); @@ -2137,7 +2154,7 @@ static void build_r4000_tlb_store_handler(void) build_tlb_probe_entry(&p); uasm_i_ori(&p, wr.r1, wr.r1, _PAGE_ACCESSED | _PAGE_MODIFIED | _PAGE_VALID | _PAGE_DIRTY); - build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2); + build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 1); #endif uasm_l_nopage_tlbs(&l, p); @@ -2193,7 +2210,7 @@ static void build_r4000_tlb_modify_handler(void) build_tlb_probe_entry(&p); uasm_i_ori(&p, wr.r1, wr.r1, _PAGE_ACCESSED | _PAGE_MODIFIED | _PAGE_VALID | _PAGE_DIRTY); - build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2); + build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 0); #endif uasm_l_nopage_tlbm(&l, p); From b73d08ce20c5cb2e0cec8c019a27b9574e2c4ec2 Mon Sep 17 00:00:00 2001 From: Chris Salls Date: Fri, 7 Apr 2017 23:48:11 -0700 Subject: [PATCH 341/521] mm/mempolicy.c: fix error handling in set_mempolicy and mbind. commit cf01fb9985e8deb25ccf0ea54d916b8871ae0e62 upstream. In the case that compat_get_bitmap fails we do not want to copy the bitmap to the user as it will contain uninitialized stack data and leak sensitive data. Signed-off-by: Chris Salls Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/mempolicy.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index a4217fe60dff..e09b1a0e2cfe 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1492,7 +1492,6 @@ COMPAT_SYSCALL_DEFINE5(get_mempolicy, int __user *, policy, COMPAT_SYSCALL_DEFINE3(set_mempolicy, int, mode, compat_ulong_t __user *, nmask, compat_ulong_t, maxnode) { - long err = 0; unsigned long __user *nm = NULL; unsigned long nr_bits, alloc_size; DECLARE_BITMAP(bm, MAX_NUMNODES); @@ -1501,14 +1500,13 @@ COMPAT_SYSCALL_DEFINE3(set_mempolicy, int, mode, compat_ulong_t __user *, nmask, alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8; if (nmask) { - err = compat_get_bitmap(bm, nmask, nr_bits); + if (compat_get_bitmap(bm, nmask, nr_bits)) + return -EFAULT; nm = compat_alloc_user_space(alloc_size); - err |= copy_to_user(nm, bm, alloc_size); + if (copy_to_user(nm, bm, alloc_size)) + return -EFAULT; } - if (err) - return -EFAULT; - return sys_set_mempolicy(mode, nm, nr_bits+1); } @@ -1516,7 +1514,6 @@ COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len, compat_ulong_t, mode, compat_ulong_t __user *, nmask, compat_ulong_t, maxnode, compat_ulong_t, flags) { - long err = 0; unsigned long __user *nm = NULL; unsigned long nr_bits, alloc_size; nodemask_t bm; @@ -1525,14 +1522,13 @@ COMPAT_SYSCALL_DEFINE6(mbind, compat_ulong_t, start, compat_ulong_t, len, alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8; if (nmask) { - err = compat_get_bitmap(nodes_addr(bm), nmask, nr_bits); + if (compat_get_bitmap(nodes_addr(bm), nmask, nr_bits)) + return -EFAULT; nm = compat_alloc_user_space(alloc_size); - err |= copy_to_user(nm, nodes_addr(bm), alloc_size); + if (copy_to_user(nm, nodes_addr(bm), alloc_size)) + return -EFAULT; } - if (err) - return -EFAULT; - return sys_mbind(start, len, mode, nm, nr_bits+1, flags); } From ec5e61608ad1919c1ff3cc0369dbf1b1ede9eb88 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 12 Apr 2017 12:38:50 +0200 Subject: [PATCH 342/521] Linux 4.4.61 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index fb7c2b40753d..ef5045b8201d 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 60 +SUBLEVEL = 61 EXTRAVERSION = NAME = Blurry Fish Butt From cb0a2cba62d58caf6668f630858acc15ed40ee23 Mon Sep 17 00:00:00 2001 From: Mika Kuoppala Date: Wed, 15 Feb 2017 15:52:59 +0200 Subject: [PATCH 343/521] drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 34dc8993eef63681b062871413a9484008a2a78f upstream. Certain Baytrails, namely the 4 cpu core variants, have been plaqued by spurious system hangs, mostly occurring with light loads. Multiple bisects by various people point to a commit which changes the reclocking strategy for Baytrail to follow its bigger brethen: commit 8fb55197e64d ("drm/i915: Agressive downclocking on Baytrail") There is also a review comment attached to this commit from Deepak S on avoiding punit access on Cherryview and thus it was excluded on common reclocking path. By taking the same approach and omitting the punit access by not tweaking the thresholds when the hardware has been asked to move into different frequency, considerable gains in stability have been observed. With J1900 box, light render/video load would end up in system hang in usually less than 12 hours. With this patch applied, the cumulative uptime has now been 34 days without issues. To provoke system hang, light loads on both render and bsd engines in parallel have been used: glxgears >/dev/null 2>/dev/null & mpv --vo=vaapi --hwdec=vaapi --loop=inf vid.mp4 So far, author has not witnessed system hang with above load and this patch applied. Reports from the tenacious people at kernel bugzilla are also promising. Considering that the punit access frequency with this patch is considerably less, there is a possibility that this will push the, still unknown, root cause past the triggering point on most loads. But as we now can reliably reproduce the hang independently, we can reduce the pain that users are having and use a static thresholds until a root cause is found. v3: don't break debugfs and simplification (Chris Wilson) References: https://bugzilla.kernel.org/show_bug.cgi?id=109051 Cc: Chris Wilson Cc: Ville Syrjälä Cc: Len Brown Cc: Daniel Vetter Cc: Jani Nikula Cc: fritsch@xbmc.org Cc: miku@iki.fi Cc: Ezequiel Garcia CC: Michal Feix Cc: Hans de Goede Cc: Deepak S Cc: Jarkko Nikula Acked-by: Daniel Vetter Acked-by: Chris Wilson Signed-off-by: Mika Kuoppala Link: http://patchwork.freedesktop.org/patch/msgid/1487166779-26945-1-git-send-email-mika.kuoppala@intel.com (cherry picked from commit 6067a27d1f0184596d51decbac1c1fdc4acb012f) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/intel_pm.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index e7c18519274a..e4031fcac4bf 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -4376,6 +4376,12 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) break; } + /* When byt can survive without system hang with dynamic + * sw freq adjustments, this restriction can be lifted. + */ + if (IS_VALLEYVIEW(dev_priv)) + goto skip_hw_write; + I915_WRITE(GEN6_RP_UP_EI, GT_INTERVAL_FROM_US(dev_priv, ei_up)); I915_WRITE(GEN6_RP_UP_THRESHOLD, @@ -4394,6 +4400,7 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) GEN6_RP_UP_BUSY_AVG | GEN6_RP_DOWN_IDLE_AVG); +skip_hw_write: dev_priv->rps.power = new_power; dev_priv->rps.up_threshold = threshold_up; dev_priv->rps.down_threshold = threshold_down; From 8cfaf0ae1f566ddfcda661bd81b625a71b16459a Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Mon, 13 Mar 2017 17:06:17 +0000 Subject: [PATCH 344/521] drm/i915: Stop using RP_DOWN_EI on Baytrail commit 8f68d591d4765b2e1ce9d916ac7bc5583285c4ad upstream. On Baytrail, we manually calculate busyness over the evaluation interval to avoid issues with miscaluations with RC6 enabled. However, it turns out that the DOWN_EI interrupt generator is completely bust - it operates in two modes, continuous or never. Neither of which are conducive to good behaviour. Stop unmask the DOWN_EI interrupt and just compute everything from the UP_EI which does seem to correspond to the desired interval. v2: Fixup gen6_rps_pm_mask() as well v3: Inline vlv_c0_above() to combine the now identical elapsed calculation for up/down and simplify the threshold testing Fixes: 43cf3bf084ba ("drm/i915: Improved w/a for rps on Baytrail") Signed-off-by: Chris Wilson Cc: Mika Kuoppala Link: http://patchwork.freedesktop.org/patch/msgid/20170309211232.28878-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala Link: http://patchwork.freedesktop.org/patch/msgid/20170313170617.31564-1-chris@chris-wilson.co.uk (cherry picked from commit e0e8c7cb6eb68e9256de2d8cbeb481d3701c05ac) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_irq.c | 75 +++++++++++++-------------------- drivers/gpu/drm/i915/intel_pm.c | 5 ++- 3 files changed, 33 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index fb9f647bb5cd..5044f2257e89 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1159,7 +1159,7 @@ struct intel_gen6_power_mgmt { struct intel_rps_client semaphores, mmioflips; /* manual wa residency calculations */ - struct intel_rps_ei up_ei, down_ei; + struct intel_rps_ei ei; /* * Protects RPS/RC6 register access and PCU communication. diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 0f42a2782afc..b7b0a38acd67 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -994,68 +994,51 @@ static void vlv_c0_read(struct drm_i915_private *dev_priv, ei->media_c0 = I915_READ(VLV_MEDIA_C0_COUNT); } -static bool vlv_c0_above(struct drm_i915_private *dev_priv, - const struct intel_rps_ei *old, - const struct intel_rps_ei *now, - int threshold) -{ - u64 time, c0; - unsigned int mul = 100; - - if (old->cz_clock == 0) - return false; - - if (I915_READ(VLV_COUNTER_CONTROL) & VLV_COUNT_RANGE_HIGH) - mul <<= 8; - - time = now->cz_clock - old->cz_clock; - time *= threshold * dev_priv->czclk_freq; - - /* Workload can be split between render + media, e.g. SwapBuffers - * being blitted in X after being rendered in mesa. To account for - * this we need to combine both engines into our activity counter. - */ - c0 = now->render_c0 - old->render_c0; - c0 += now->media_c0 - old->media_c0; - c0 *= mul * VLV_CZ_CLOCK_TO_MILLI_SEC; - - return c0 >= time; -} - void gen6_rps_reset_ei(struct drm_i915_private *dev_priv) { - vlv_c0_read(dev_priv, &dev_priv->rps.down_ei); - dev_priv->rps.up_ei = dev_priv->rps.down_ei; + memset(&dev_priv->rps.ei, 0, sizeof(dev_priv->rps.ei)); } static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir) { + const struct intel_rps_ei *prev = &dev_priv->rps.ei; struct intel_rps_ei now; u32 events = 0; - if ((pm_iir & (GEN6_PM_RP_DOWN_EI_EXPIRED | GEN6_PM_RP_UP_EI_EXPIRED)) == 0) + if ((pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) == 0) return 0; vlv_c0_read(dev_priv, &now); if (now.cz_clock == 0) return 0; - if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) { - if (!vlv_c0_above(dev_priv, - &dev_priv->rps.down_ei, &now, - dev_priv->rps.down_threshold)) - events |= GEN6_PM_RP_DOWN_THRESHOLD; - dev_priv->rps.down_ei = now; - } - - if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) { - if (vlv_c0_above(dev_priv, - &dev_priv->rps.up_ei, &now, - dev_priv->rps.up_threshold)) - events |= GEN6_PM_RP_UP_THRESHOLD; - dev_priv->rps.up_ei = now; + if (prev->cz_clock) { + u64 time, c0; + unsigned int mul; + + mul = VLV_CZ_CLOCK_TO_MILLI_SEC * 100; /* scale to threshold% */ + if (I915_READ(VLV_COUNTER_CONTROL) & VLV_COUNT_RANGE_HIGH) + mul <<= 8; + + time = now.cz_clock - prev->cz_clock; + time *= dev_priv->czclk_freq; + + /* Workload can be split between render + media, + * e.g. SwapBuffers being blitted in X after being rendered in + * mesa. To account for this we need to combine both engines + * into our activity counter. + */ + c0 = now.render_c0 - prev->render_c0; + c0 += now.media_c0 - prev->media_c0; + c0 *= mul; + + if (c0 > time * dev_priv->rps.up_threshold) + events = GEN6_PM_RP_UP_THRESHOLD; + else if (c0 < time * dev_priv->rps.down_threshold) + events = GEN6_PM_RP_DOWN_THRESHOLD; } + dev_priv->rps.ei = now; return events; } @@ -4390,7 +4373,7 @@ void intel_irq_init(struct drm_i915_private *dev_priv) /* Let's track the enabled rps events */ if (IS_VALLEYVIEW(dev_priv) && !IS_CHERRYVIEW(dev_priv)) /* WaGsvRC0ResidencyMethod:vlv */ - dev_priv->pm_rps_events = GEN6_PM_RP_DOWN_EI_EXPIRED | GEN6_PM_RP_UP_EI_EXPIRED; + dev_priv->pm_rps_events = GEN6_PM_RP_UP_EI_EXPIRED; else dev_priv->pm_rps_events = GEN6_PM_RPS_EVENTS; diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index e4031fcac4bf..fd4690ed93c0 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -4411,8 +4411,9 @@ static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val) { u32 mask = 0; + /* We use UP_EI_EXPIRED interupts for both up/down in manual mode */ if (val > dev_priv->rps.min_freq_softlimit) - mask |= GEN6_PM_RP_DOWN_EI_EXPIRED | GEN6_PM_RP_DOWN_THRESHOLD | GEN6_PM_RP_DOWN_TIMEOUT; + mask |= GEN6_PM_RP_UP_EI_EXPIRED | GEN6_PM_RP_DOWN_THRESHOLD | GEN6_PM_RP_DOWN_TIMEOUT; if (val < dev_priv->rps.max_freq_softlimit) mask |= GEN6_PM_RP_UP_EI_EXPIRED | GEN6_PM_RP_UP_THRESHOLD; @@ -4516,7 +4517,7 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv) { mutex_lock(&dev_priv->rps.hw_lock); if (dev_priv->rps.enabled) { - if (dev_priv->pm_rps_events & (GEN6_PM_RP_DOWN_EI_EXPIRED | GEN6_PM_RP_UP_EI_EXPIRED)) + if (dev_priv->pm_rps_events & GEN6_PM_RP_UP_EI_EXPIRED) gen6_rps_reset_ei(dev_priv); I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, dev_priv->rps.cur_freq)); From 297f55bcb62ad0b6b290b01177d9395305d57020 Mon Sep 17 00:00:00 2001 From: Janusz Dziedzic Date: Mon, 13 Mar 2017 14:11:32 +0200 Subject: [PATCH 345/521] usb: dwc3: gadget: delay unmap of bounced requests commit de288e36fe33f7e06fa272bc8e2f85aa386d99aa upstream. In the case of bounced ep0 requests, we must delay DMA operation until after ->complete() otherwise we might overwrite contents of req->buf. This caused problems with RNDIS gadget. Signed-off-by: Janusz Dziedzic Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/dwc3/gadget.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 210ff64857e1..ec7a50f98f57 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -235,6 +235,7 @@ void dwc3_gadget_giveback(struct dwc3_ep *dep, struct dwc3_request *req, int status) { struct dwc3 *dwc = dep->dwc; + unsigned int unmap_after_complete = false; int i; if (req->queued) { @@ -259,11 +260,19 @@ void dwc3_gadget_giveback(struct dwc3_ep *dep, struct dwc3_request *req, if (req->request.status == -EINPROGRESS) req->request.status = status; - if (dwc->ep0_bounced && dep->number <= 1) + /* + * NOTICE we don't want to unmap before calling ->complete() if we're + * dealing with a bounced ep0 request. If we unmap it here, we would end + * up overwritting the contents of req->buf and this could confuse the + * gadget driver. + */ + if (dwc->ep0_bounced && dep->number <= 1) { dwc->ep0_bounced = false; - - usb_gadget_unmap_request(&dwc->gadget, &req->request, - req->direction); + unmap_after_complete = true; + } else { + usb_gadget_unmap_request(&dwc->gadget, + &req->request, req->direction); + } dev_dbg(dwc->dev, "request %p from %s completed %d/%d ===> %d\n", req, dep->name, req->request.actual, @@ -273,6 +282,10 @@ void dwc3_gadget_giveback(struct dwc3_ep *dep, struct dwc3_request *req, spin_unlock(&dwc->lock); usb_gadget_giveback_request(&dep->endpoint, &req->request); spin_lock(&dwc->lock); + + if (unmap_after_complete) + usb_gadget_unmap_request(&dwc->gadget, + &req->request, req->direction); } int dwc3_send_gadget_generic_command(struct dwc3 *dwc, unsigned cmd, u32 param) From 5a527d80836e9ad0dc3dceee7de72f16c817fb8b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= Date: Sun, 20 Nov 2016 16:09:30 +0100 Subject: [PATCH 346/521] mtd: bcm47xxpart: fix parsing first block after aligned TRX MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit bd5d21310133921021d78995ad6346f908483124 upstream. After parsing TRX we should skip to the first block placed behind it. Our code was working only with TRX with length not aligned to the blocksize. In other cases (length aligned) it was missing the block places right after TRX. This fixes calculation and simplifies the comment. Signed-off-by: Rafał Miłecki Signed-off-by: Brian Norris Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/bcm47xxpart.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/mtd/bcm47xxpart.c b/drivers/mtd/bcm47xxpart.c index c0720c1ee4c9..5abab8800891 100644 --- a/drivers/mtd/bcm47xxpart.c +++ b/drivers/mtd/bcm47xxpart.c @@ -225,12 +225,10 @@ static int bcm47xxpart_parse(struct mtd_info *master, last_trx_part = curr_part - 1; - /* - * We have whole TRX scanned, skip to the next part. Use - * roundown (not roundup), as the loop will increase - * offset in next step. - */ - offset = rounddown(offset + trx->length, blocksize); + /* Jump to the end of TRX */ + offset = roundup(offset + trx->length, blocksize); + /* Next loop iteration will increase the offset */ + offset -= blocksize; continue; } From d8b8b5528ea5a394074a91e37571bcca081b27e1 Mon Sep 17 00:00:00 2001 From: Matt Redfearn Date: Mon, 19 Dec 2016 14:20:56 +0000 Subject: [PATCH 347/521] MIPS: Introduce irq_stack commit fe8bd18ffea5327344d4ec2bf11f47951212abd0 upstream. Allocate a per-cpu irq stack for use within interrupt handlers. Also add a utility function on_irq_stack to determine if a given stack pointer is within the irq stack for that cpu. Signed-off-by: Matt Redfearn Acked-by: Jason A. Donenfeld Cc: Thomas Gleixner Cc: Paolo Bonzini Cc: Chris Metcalf Cc: Petr Mladek Cc: James Hogan Cc: Paul Burton Cc: Aaron Tomlin Cc: Andrew Morton Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14740/ Signed-off-by: Ralf Baechle Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/mips/include/asm/irq.h | 12 ++++++++++++ arch/mips/kernel/asm-offsets.c | 1 + arch/mips/kernel/irq.c | 11 +++++++++++ 3 files changed, 24 insertions(+) diff --git a/arch/mips/include/asm/irq.h b/arch/mips/include/asm/irq.h index 15e0fecbc300..ebb9efb02502 100644 --- a/arch/mips/include/asm/irq.h +++ b/arch/mips/include/asm/irq.h @@ -17,6 +17,18 @@ #include +#define IRQ_STACK_SIZE THREAD_SIZE + +extern void *irq_stack[NR_CPUS]; + +static inline bool on_irq_stack(int cpu, unsigned long sp) +{ + unsigned long low = (unsigned long)irq_stack[cpu]; + unsigned long high = low + IRQ_STACK_SIZE; + + return (low <= sp && sp <= high); +} + #ifdef CONFIG_I8259 static inline int irq_canonicalize(int irq) { diff --git a/arch/mips/kernel/asm-offsets.c b/arch/mips/kernel/asm-offsets.c index 154e2039ea5e..ec053ce7bb38 100644 --- a/arch/mips/kernel/asm-offsets.c +++ b/arch/mips/kernel/asm-offsets.c @@ -101,6 +101,7 @@ void output_thread_info_defines(void) OFFSET(TI_REGS, thread_info, regs); DEFINE(_THREAD_SIZE, THREAD_SIZE); DEFINE(_THREAD_MASK, THREAD_MASK); + DEFINE(_IRQ_STACK_SIZE, IRQ_STACK_SIZE); BLANK(); } diff --git a/arch/mips/kernel/irq.c b/arch/mips/kernel/irq.c index 8eb5af805964..dc1180a8bfa1 100644 --- a/arch/mips/kernel/irq.c +++ b/arch/mips/kernel/irq.c @@ -25,6 +25,8 @@ #include #include +void *irq_stack[NR_CPUS]; + /* * 'what should we do if we get a hw irq event on an illegal vector'. * each architecture has to answer this themselves. @@ -55,6 +57,15 @@ void __init init_IRQ(void) irq_set_noprobe(i); arch_init_irq(); + + for_each_possible_cpu(i) { + int irq_pages = IRQ_STACK_SIZE / PAGE_SIZE; + void *s = (void *)__get_free_pages(GFP_KERNEL, irq_pages); + + irq_stack[i] = s; + pr_debug("CPU%d IRQ stack at 0x%p - 0x%p\n", i, + irq_stack[i], irq_stack[i] + IRQ_STACK_SIZE); + } } #ifdef CONFIG_DEBUG_STACKOVERFLOW From 3363653512853754fcc7592d2c68c4769a4825c9 Mon Sep 17 00:00:00 2001 From: Matt Redfearn Date: Mon, 19 Dec 2016 14:20:57 +0000 Subject: [PATCH 348/521] MIPS: Stack unwinding while on IRQ stack commit d42d8d106b0275b027c1e8992c42aecf933436ea upstream. Within unwind stack, check if the stack pointer being unwound is within the CPU's irq_stack and if so use that page rather than the task's stack page. Signed-off-by: Matt Redfearn Acked-by: Jason A. Donenfeld Cc: Thomas Gleixner Cc: Adam Buchbinder Cc: Maciej W. Rozycki Cc: Marcin Nowakowski Cc: Chris Metcalf Cc: James Hogan Cc: Paul Burton Cc: Jiri Slaby Cc: Andrew Morton Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14741/ Signed-off-by: Ralf Baechle Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/process.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c index fc537d1b649d..8c26ecac930d 100644 --- a/arch/mips/kernel/process.c +++ b/arch/mips/kernel/process.c @@ -32,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -552,7 +553,19 @@ EXPORT_SYMBOL(unwind_stack_by_address); unsigned long unwind_stack(struct task_struct *task, unsigned long *sp, unsigned long pc, unsigned long *ra) { - unsigned long stack_page = (unsigned long)task_stack_page(task); + unsigned long stack_page = 0; + int cpu; + + for_each_possible_cpu(cpu) { + if (on_irq_stack(cpu, *sp)) { + stack_page = (unsigned long)irq_stack[cpu]; + break; + } + } + + if (!stack_page) + stack_page = (unsigned long)task_stack_page(task); + return unwind_stack_by_address(stack_page, sp, pc, ra); } #endif From 93a82f8dbef8ee421fac80a1bd0564124a8ac41c Mon Sep 17 00:00:00 2001 From: Matt Redfearn Date: Mon, 19 Dec 2016 14:20:58 +0000 Subject: [PATCH 349/521] MIPS: Only change $28 to thread_info if coming from user mode commit 510d86362a27577f5ee23f46cfb354ad49731e61 upstream. The SAVE_SOME macro is used to save the execution context on all exceptions. If an exception occurs while executing user code, the stack is switched to the kernel's stack for the current task, and register $28 is switched to point to the current_thread_info, which is at the bottom of the stack region. If the exception occurs while executing kernel code, the stack is left, and this change ensures that register $28 is not updated. This is the correct behaviour when the kernel can be executing on the separate irq stack, because the thread_info will not be at the base of it. With this change, register $28 is only switched to it's kernel conventional usage of the currrent thread info pointer at the point at which execution enters kernel space. Doing it on every exception was redundant, but OK without an IRQ stack, but will be erroneous once that is introduced. Signed-off-by: Matt Redfearn Acked-by: Jason A. Donenfeld Cc: Thomas Gleixner Cc: James Hogan Cc: Paul Burton Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14742/ Signed-off-by: Ralf Baechle Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/mips/include/asm/stackframe.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/mips/include/asm/stackframe.h b/arch/mips/include/asm/stackframe.h index a71da576883c..5347f130f536 100644 --- a/arch/mips/include/asm/stackframe.h +++ b/arch/mips/include/asm/stackframe.h @@ -216,12 +216,19 @@ LONG_S $25, PT_R25(sp) LONG_S $28, PT_R28(sp) LONG_S $31, PT_R31(sp) + + /* Set thread_info if we're coming from user mode */ + mfc0 k0, CP0_STATUS + sll k0, 3 /* extract cu0 bit */ + bltz k0, 9f + ori $28, sp, _THREAD_MASK xori $28, _THREAD_MASK #ifdef CONFIG_CPU_CAVIUM_OCTEON .set mips64 pref 0, 0($28) /* Prefetch the current pointer */ #endif +9: .set pop .endm From b39b263816687fd71b10c31b3eb916defe8176f0 Mon Sep 17 00:00:00 2001 From: Matt Redfearn Date: Mon, 19 Dec 2016 14:20:59 +0000 Subject: [PATCH 350/521] MIPS: Switch to the irq_stack in interrupts commit dda45f701c9d7ad4ac0bb446e3a96f6df9a468d9 upstream. When enterring interrupt context via handle_int or except_vec_vi, switch to the irq_stack of the current CPU if it is not already in use. The current stack pointer is masked with the thread size and compared to the base or the irq stack. If it does not match then the stack pointer is set to the top of that stack, otherwise this is a nested irq being handled on the irq stack so the stack pointer should be left as it was. The in-use stack pointer is placed in the callee saved register s1. It will be saved to the stack when plat_irq_dispatch is invoked and can be restored once control returns here. Signed-off-by: Matt Redfearn Acked-by: Jason A. Donenfeld Cc: Thomas Gleixner Cc: James Hogan Cc: Paul Burton Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14743/ Signed-off-by: Ralf Baechle Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/genex.S | 81 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 76 insertions(+), 5 deletions(-) diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S index baa7b6fc0a60..2c7cd622673f 100644 --- a/arch/mips/kernel/genex.S +++ b/arch/mips/kernel/genex.S @@ -188,9 +188,44 @@ NESTED(handle_int, PT_SIZE, sp) LONG_L s0, TI_REGS($28) LONG_S sp, TI_REGS($28) - PTR_LA ra, ret_from_irq - PTR_LA v0, plat_irq_dispatch - jr v0 + + /* + * SAVE_ALL ensures we are using a valid kernel stack for the thread. + * Check if we are already using the IRQ stack. + */ + move s1, sp # Preserve the sp + + /* Get IRQ stack for this CPU */ + ASM_CPUID_MFC0 k0, ASM_SMP_CPUID_REG +#if defined(CONFIG_32BIT) || defined(KBUILD_64BIT_SYM32) + lui k1, %hi(irq_stack) +#else + lui k1, %highest(irq_stack) + daddiu k1, %higher(irq_stack) + dsll k1, 16 + daddiu k1, %hi(irq_stack) + dsll k1, 16 +#endif + LONG_SRL k0, SMP_CPUID_PTRSHIFT + LONG_ADDU k1, k0 + LONG_L t0, %lo(irq_stack)(k1) + + # Check if already on IRQ stack + PTR_LI t1, ~(_THREAD_SIZE-1) + and t1, t1, sp + beq t0, t1, 2f + + /* Switch to IRQ stack */ + li t1, _IRQ_STACK_SIZE + PTR_ADD sp, t0, t1 + +2: + jal plat_irq_dispatch + + /* Restore sp */ + move sp, s1 + + j ret_from_irq #ifdef CONFIG_CPU_MICROMIPS nop #endif @@ -263,8 +298,44 @@ NESTED(except_vec_vi_handler, 0, sp) LONG_L s0, TI_REGS($28) LONG_S sp, TI_REGS($28) - PTR_LA ra, ret_from_irq - jr v0 + + /* + * SAVE_ALL ensures we are using a valid kernel stack for the thread. + * Check if we are already using the IRQ stack. + */ + move s1, sp # Preserve the sp + + /* Get IRQ stack for this CPU */ + ASM_CPUID_MFC0 k0, ASM_SMP_CPUID_REG +#if defined(CONFIG_32BIT) || defined(KBUILD_64BIT_SYM32) + lui k1, %hi(irq_stack) +#else + lui k1, %highest(irq_stack) + daddiu k1, %higher(irq_stack) + dsll k1, 16 + daddiu k1, %hi(irq_stack) + dsll k1, 16 +#endif + LONG_SRL k0, SMP_CPUID_PTRSHIFT + LONG_ADDU k1, k0 + LONG_L t0, %lo(irq_stack)(k1) + + # Check if already on IRQ stack + PTR_LI t1, ~(_THREAD_SIZE-1) + and t1, t1, sp + beq t0, t1, 2f + + /* Switch to IRQ stack */ + li t1, _IRQ_STACK_SIZE + PTR_ADD sp, t0, t1 + +2: + jal plat_irq_dispatch + + /* Restore sp */ + move sp, s1 + + j ret_from_irq END(except_vec_vi_handler) /* From f017e58da4aba293e4a6ab62ca5d4801f79cc929 Mon Sep 17 00:00:00 2001 From: Matt Redfearn Date: Mon, 19 Dec 2016 14:21:00 +0000 Subject: [PATCH 351/521] MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK commit 3cc3434fd6307d06b53b98ce83e76bf9807689b9 upstream. Since do_IRQ is now invoked on a separate IRQ stack, we select HAVE_IRQ_EXIT_ON_IRQ_STACK so that softirq's may be invoked directly from irq_exit(), rather than requiring do_softirq_own_stack. Signed-off-by: Matt Redfearn Acked-by: Jason A. Donenfeld Cc: Thomas Gleixner Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14744/ Signed-off-by: Ralf Baechle Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/mips/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 75bfca69e418..d5cfa937d622 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -9,6 +9,7 @@ config MIPS select HAVE_CONTEXT_TRACKING select HAVE_GENERIC_DMA_COHERENT select HAVE_IDE + select HAVE_IRQ_EXIT_ON_IRQ_STACK select HAVE_OPROFILE select HAVE_PERF_EVENTS select PERF_USE_VMALLOC From ba7681e4eee6739e4f23a1ba21fb7737fe4ce4f4 Mon Sep 17 00:00:00 2001 From: Matt Redfearn Date: Wed, 25 Jan 2017 17:00:25 +0000 Subject: [PATCH 352/521] MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch commit c25f8064c1d5731a2ce5664def890140dcdd3e5c upstream. Commit dda45f701c9d ("MIPS: Switch to the irq_stack in interrupts") changed both the normal and vectored interrupt handlers. Unfortunately the vectored version, "except_vec_vi_handler", was incorrectly modified to unconditionally jal to plat_irq_dispatch, rather than doing a jalr to the vectored handler that has been set up. This is ok for many platforms which set the vectored handler to plat_irq_dispatch anyway, but will cause problems with platforms that use other handlers. Fixes: dda45f701c9d ("MIPS: Switch to the irq_stack in interrupts") Signed-off-by: Matt Redfearn Cc: Ralf Baechle Cc: Paul Burton Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15110/ Signed-off-by: James Hogan Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/genex.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S index 2c7cd622673f..619e30e2c4f0 100644 --- a/arch/mips/kernel/genex.S +++ b/arch/mips/kernel/genex.S @@ -330,7 +330,7 @@ NESTED(except_vec_vi_handler, 0, sp) PTR_ADD sp, t0, t1 2: - jal plat_irq_dispatch + jalr v0 /* Restore sp */ move sp, s1 From fd8bae310684b557c0b30ae9105420956a41494f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Horia=20Geant=C4=83?= Date: Wed, 5 Apr 2017 11:41:03 +0300 Subject: [PATCH 353/521] crypto: caam - fix RNG deinstantiation error checking MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 40c98cb57cdbc377456116ad4582c89e329721b0 upstream. RNG instantiation was previously fixed by commit 62743a4145bb9 ("crypto: caam - fix RNG init descriptor ret. code checking") while deinstantiation was not addressed. Since the descriptors used are similar, in the sense that they both end with a JUMP HALT command, checking for errors should be similar too, i.e. status code 7000_0000h should be considered successful. Fixes: 1005bccd7a4a6 ("crypto: caam - enable instantiation of all RNG4 state handles") Signed-off-by: Horia Geantă Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/caam/ctrl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c index 69d4a1326fee..53e61459c69f 100644 --- a/drivers/crypto/caam/ctrl.c +++ b/drivers/crypto/caam/ctrl.c @@ -278,7 +278,8 @@ static int deinstantiate_rng(struct device *ctrldev, int state_handle_mask) /* Try to run it through DECO0 */ ret = run_descriptor_deco0(ctrldev, desc, &status); - if (ret || status) { + if (ret || + (status && status != JRSTA_SSRC_JUMP_HALT_CC)) { dev_err(ctrldev, "Failed to deinstantiate RNG4 SH%d\n", sh_idx); From d35f8fa0b93e61dd95b8f86928a783c4d8a32d3e Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Wed, 29 Mar 2017 16:11:20 +0200 Subject: [PATCH 354/521] net/packet: fix overflow in check for priv area size commit 2b6867c2ce76c596676bec7d2d525af525fdc6e2 upstream. Subtracting tp_sizeof_priv from tp_block_size and casting to int to check whether one is less then the other doesn't always work (both of them are unsigned ints). Compare them as is instead. Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as it can overflow inside BLK_PLUS_PRIV otherwise. Signed-off-by: Andrey Konovalov Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 3975ac809934..d76800108ddb 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4138,8 +4138,8 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, if (unlikely(!PAGE_ALIGNED(req->tp_block_size))) goto out; if (po->tp_version >= TPACKET_V3 && - (int)(req->tp_block_size - - BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0) + req->tp_block_size <= + BLK_PLUS_PRIV((u64)req_u->req3.tp_sizeof_priv)) goto out; if (unlikely(req->tp_frame_size < po->tp_hdrlen + po->tp_reserve)) From f4522e36edaa9ec0cada0daa5c2628db762dd3d9 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Tue, 6 Dec 2016 13:31:44 -0200 Subject: [PATCH 355/521] blk-mq: Avoid memory reclaim when remapping queues commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream. While stressing memory and IO at the same time we changed SMT settings, we were able to consistently trigger deadlocks in the mm system, which froze the entire machine. I think that under memory stress conditions, the large allocations performed by blk_mq_init_rq_map may trigger a reclaim, which stalls waiting on the block layer remmaping completion, thus deadlocking the system. The trace below was collected after the machine stalled, waiting for the hotplug event completion. The simplest fix for this is to make allocations in this path non-reclaimable, with GFP_NOIO. With this patch, We couldn't hit the issue anymore. This should apply on top of Jens's for-next branch cleanly. Changes since v1: - Use GFP_NOIO instead of GFP_NOWAIT. Call Trace: [c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable) [c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430 [c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0 [c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0 [c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30 [c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0 [c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0 [c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs] [c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs] [c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs] [c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210 [c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0 [c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0 [c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520 [c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250 [c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0 [c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380 [c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360 [c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0 [c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250 [c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100 [c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0 [c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0 [c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250 [c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120 [c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0 [c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150 [c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0 [c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120 [c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0 [c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0 [c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0 [c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250 [c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0 [c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270 [c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110 [c000000f0160be30] [c000000000009204] system_call+0x38/0xec Signed-off-by: Gabriel Krisman Bertazi Cc: Brian King Cc: Douglas Miller Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Jens Axboe Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- block/blk-mq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index d8d63c38bf29..0d1af3e44efb 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1470,7 +1470,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set, INIT_LIST_HEAD(&tags->page_list); tags->rqs = kzalloc_node(set->queue_depth * sizeof(struct request *), - GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY, + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, set->numa_node); if (!tags->rqs) { blk_mq_free_tags(tags); @@ -1496,7 +1496,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set, do { page = alloc_pages_node(set->numa_node, - GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO, + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO, this_order); if (page) break; @@ -1517,7 +1517,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set, * Allow kmemleak to scan these pages as they contain pointers * to additional allocations like via ops->init_request(). */ - kmemleak_alloc(p, order_to_size(this_order), 1, GFP_KERNEL); + kmemleak_alloc(p, order_to_size(this_order), 1, GFP_NOIO); entries_per_page = order_to_size(this_order) / rq_size; to_do = min(entries_per_page, set->queue_depth - i); left -= to_do * rq_size; From 0a007f74b826836074de8bfcb1e197cada993718 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Thu, 1 Dec 2016 13:49:59 -0800 Subject: [PATCH 356/521] usb: hub: Wait for connection to be reestablished after port reset commit 22547c4cc4fe20698a6a85a55b8788859134b8e4 upstream. On a system with a defective USB device connected to an USB hub, an endless sequence of port connect events was observed. The sequence of events as observed is as follows: - Port reports connected event (port status=USB_PORT_STAT_CONNECTION). - Event handler debounces port and resets it by calling hub_port_reset(). - hub_port_reset() calls hub_port_wait_reset() to wait for the reset to complete. - The reset completes, but USB_PORT_STAT_CONNECTION is not immediately set in the port status register. - hub_port_wait_reset() returns -ENOTCONN. - Port initialization sequence is aborted. - A few milliseconds later, the port again reports a connected event, and the sequence repeats. This continues either forever or, randomly, stops if the connection is already re-established when the port status is read. It results in a high rate of udev events. This in turn destabilizes userspace since the above sequence holds the device mutex pretty much continuously and prevents userspace from actually reading the device status. To prevent the problem from happening, let's wait for the connection to be re-established after a port reset. If the device was actually disconnected, the code will still return an error, but it will do so only after the long reset timeout. Cc: Douglas Anderson Signed-off-by: Guenter Roeck Acked-by: Alan Stern Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hub.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 9e62c93af96e..7c2d87befb51 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -2602,8 +2602,15 @@ static int hub_port_wait_reset(struct usb_hub *hub, int port1, if (ret < 0) return ret; - /* The port state is unknown until the reset completes. */ - if (!(portstatus & USB_PORT_STAT_RESET)) + /* + * The port state is unknown until the reset completes. + * + * On top of that, some chips may require additional time + * to re-establish a connection after the reset is complete, + * so also wait for the connection to be re-established. + */ + if (!(portstatus & USB_PORT_STAT_RESET) && + (portstatus & USB_PORT_STAT_CONNECTION)) break; /* switch to the long delay after two short delay failures */ From f1e6b1149e497dc61ceff290c1d3db259ebf7938 Mon Sep 17 00:00:00 2001 From: Eugenia Emantayev Date: Thu, 29 Dec 2016 18:37:10 +0200 Subject: [PATCH 357/521] net/mlx4_en: Fix bad WQE issue commit 6496bbf0ec481966ef9ffe5b6660d8d1b55c60cc upstream. Single send WQE in RX buffer should be stamped with software ownership in order to prevent the flow of QP in error in FW once UPDATE_QP is called. Fixes: 9f519f68cfff ('mlx4_en: Not using Shared Receive Queues') Signed-off-by: Eugenia Emantayev Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 28a4b34310b2..82bf1b539d87 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -439,8 +439,14 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv) ring->cqn = priv->rx_cq[ring_ind]->mcq.cqn; ring->stride = stride; - if (ring->stride <= TXBB_SIZE) + if (ring->stride <= TXBB_SIZE) { + /* Stamp first unused send wqe */ + __be32 *ptr = (__be32 *)ring->buf; + __be32 stamp = cpu_to_be32(1 << STAMP_SHIFT); + *ptr = stamp; + /* Move pointer to start of rx section */ ring->buf += TXBB_SIZE; + } ring->log_stride = ffs(ring->stride) - 1; ring->buf_size = ring->size * ring->stride; From 710f793a15de0213d4e15f123f327b2075a0c62b Mon Sep 17 00:00:00 2001 From: Jack Morgenstein Date: Mon, 16 Jan 2017 18:31:37 +0200 Subject: [PATCH 358/521] net/mlx4_core: Fix racy CQ (Completion Queue) free commit 291c566a28910614ce42d0ffe82196eddd6346f4 upstream. In function mlx4_cq_completion() and mlx4_cq_event(), the radix_tree_lookup requires a rcu_read_lock. This is mandatory: if another core frees the CQ, it could run the radix_tree_node_rcu_free() call_rcu() callback while its being used by the radix tree lookup function. Additionally, in function mlx4_cq_event(), since we are adding the rcu lock around the radix-tree lookup, we no longer need to take the spinlock. Also, the synchronize_irq() call for the async event eliminates the need for incrementing the cq reference count in mlx4_cq_event(). Other changes: 1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock: we no longer take this spinlock in the interrupt context. The spinlock here, therefore, simply protects against different threads simultaneously invoking mlx4_cq_free() for different cq's. 2. In function mlx4_cq_free(), we move the radix tree delete to before the synchronize_irq() calls. This guarantees that we will not access this cq during any subsequent interrupts, and therefore can safely free the CQ after the synchronize_irq calls. The rcu_read_lock in the interrupt handlers only needs to protect against corrupting the radix tree; the interrupt handlers may access the cq outside the rcu_read_lock due to the synchronize_irq calls which protect against premature freeing of the cq. 3. In function mlx4_cq_event(), we change the mlx_warn message to mlx4_dbg. 4. We leave the cq reference count mechanism in place, because it is still needed for the cq completion tasklet mechanism. Fixes: 6d90aa5cf17b ("net/mlx4_core: Make sure there are no pending async events when freeing CQ") Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Signed-off-by: Jack Morgenstein Signed-off-by: Matan Barak Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx4/cq.c | 38 +++++++++++++------------ 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c index 3348e646db70..6eba58044456 100644 --- a/drivers/net/ethernet/mellanox/mlx4/cq.c +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c @@ -101,13 +101,19 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn) { struct mlx4_cq *cq; + rcu_read_lock(); cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree, cqn & (dev->caps.num_cqs - 1)); + rcu_read_unlock(); + if (!cq) { mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn); return; } + /* Acessing the CQ outside of rcu_read_lock is safe, because + * the CQ is freed only after interrupt handling is completed. + */ ++cq->arm_sn; cq->comp(cq); @@ -118,23 +124,19 @@ void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type) struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table; struct mlx4_cq *cq; - spin_lock(&cq_table->lock); - + rcu_read_lock(); cq = radix_tree_lookup(&cq_table->tree, cqn & (dev->caps.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); - - spin_unlock(&cq_table->lock); + rcu_read_unlock(); if (!cq) { - mlx4_warn(dev, "Async event for bogus CQ %08x\n", cqn); + mlx4_dbg(dev, "Async event for bogus CQ %08x\n", cqn); return; } + /* Acessing the CQ outside of rcu_read_lock is safe, because + * the CQ is freed only after interrupt handling is completed. + */ cq->event(cq, event_type); - - if (atomic_dec_and_test(&cq->refcount)) - complete(&cq->free); } static int mlx4_SW2HW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, @@ -301,9 +303,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, if (err) return err; - spin_lock_irq(&cq_table->lock); + spin_lock(&cq_table->lock); err = radix_tree_insert(&cq_table->tree, cq->cqn, cq); - spin_unlock_irq(&cq_table->lock); + spin_unlock(&cq_table->lock); if (err) goto err_icm; @@ -347,9 +349,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, return 0; err_radix: - spin_lock_irq(&cq_table->lock); + spin_lock(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); - spin_unlock_irq(&cq_table->lock); + spin_unlock(&cq_table->lock); err_icm: mlx4_cq_free_icm(dev, cq->cqn); @@ -368,15 +370,15 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); + spin_lock(&cq_table->lock); + radix_tree_delete(&cq_table->tree, cq->cqn); + spin_unlock(&cq_table->lock); + synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq); if (priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq != priv->eq_table.eq[MLX4_EQ_ASYNC].irq) synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq); - spin_lock_irq(&cq_table->lock); - radix_tree_delete(&cq_table->tree, cq->cqn); - spin_unlock_irq(&cq_table->lock); - if (atomic_dec_and_test(&cq->refcount)) complete(&cq->free); wait_for_completion(&cq->free); From ac0cbfbb1e4b84d426f210849492afadbc4b6bb9 Mon Sep 17 00:00:00 2001 From: Jack Morgenstein Date: Mon, 16 Jan 2017 18:31:38 +0200 Subject: [PATCH 359/521] net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions commit 7c3945bc2073554bb2ecf983e073dee686679c53 upstream. Save the qp context flags byte containing the flag disabling vlan stripping in the RESET to INIT qp transition, rather than in the INIT to RTR transition. Per the firmware spec, the flags in this byte are active in the RESET to INIT transition. As a result of saving the flags in the incorrect qp transition, when switching dynamically from VGT to VST and back to VGT, the vlan remained stripped (as is required for VST) and did not return to not-stripped (as is required for VGT). Fixes: f0f829bf42cd ("net/mlx4_core: Add immediate activate for VGT->VST->VGT") Signed-off-by: Jack Morgenstein Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c index d314d96dcb1c..d1fc7fa87b05 100644 --- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c +++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c @@ -2955,6 +2955,9 @@ int mlx4_RST2INIT_QP_wrapper(struct mlx4_dev *dev, int slave, put_res(dev, slave, srqn, RES_SRQ); qp->srq = srq; } + + /* Save param3 for dynamic changes from VST back to VGT */ + qp->param3 = qpc->param3; put_res(dev, slave, rcqn, RES_CQ); put_res(dev, slave, mtt_base, RES_MTT); res_end_move(dev, slave, RES_QP, qpn); @@ -3747,7 +3750,6 @@ int mlx4_INIT2RTR_QP_wrapper(struct mlx4_dev *dev, int slave, int qpn = vhcr->in_modifier & 0x7fffff; struct res_qp *qp; u8 orig_sched_queue; - __be32 orig_param3 = qpc->param3; u8 orig_vlan_control = qpc->pri_path.vlan_control; u8 orig_fvl_rx = qpc->pri_path.fvl_rx; u8 orig_pri_path_fl = qpc->pri_path.fl; @@ -3789,7 +3791,6 @@ int mlx4_INIT2RTR_QP_wrapper(struct mlx4_dev *dev, int slave, */ if (!err) { qp->sched_queue = orig_sched_queue; - qp->param3 = orig_param3; qp->vlan_control = orig_vlan_control; qp->fvl_rx = orig_fvl_rx; qp->pri_path_fl = orig_pri_path_fl; From 7d170f270a95639192cfd53dcb15e6d8530b4577 Mon Sep 17 00:00:00 2001 From: Thomas Falcon Date: Thu, 8 Dec 2016 16:40:03 -0600 Subject: [PATCH 360/521] ibmveth: set correct gso_size and gso_type commit 7b5967389f5a8dfb9d32843830f5e2717e20995d upstream. This patch is based on an earlier one submitted by Jon Maxwell with the following commit message: "We recently encountered a bug where a few customers using ibmveth on the same LPAR hit an issue where a TCP session hung when large receive was enabled. Closer analysis revealed that the session was stuck because the one side was advertising a zero window repeatedly. We narrowed this down to the fact the ibmveth driver did not set gso_size which is translated by TCP into the MSS later up the stack. The MSS is used to calculate the TCP window size and as that was abnormally large, it was calculating a zero window, even although the sockets receive buffer was completely empty." We rely on the Virtual I/O Server partition in a pseries environment to provide the MSS through the TCP header checksum field. The stipulation is that users should not disable checksum offloading if rx packet aggregation is enabled through VIOS. Some firmware offerings provide the MSS in the RX buffer. This is signalled by a bit in the RX queue descriptor. Reviewed-by: Brian King Reviewed-by: Pradeep Satyanarayana Reviewed-by: Marcelo Ricardo Leitner Reviewed-by: Jonathan Maxwell Reviewed-by: David Dai Signed-off-by: Thomas Falcon Signed-off-by: David S. Miller Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/ibm/ibmveth.c | 65 +++++++++++++++++++++++++++++- drivers/net/ethernet/ibm/ibmveth.h | 1 + 2 files changed, 64 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index 7af870a3c549..855c43d8f7e0 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -58,7 +58,7 @@ static struct kobj_type ktype_veth_pool; static const char ibmveth_driver_name[] = "ibmveth"; static const char ibmveth_driver_string[] = "IBM Power Virtual Ethernet Driver"; -#define ibmveth_driver_version "1.05" +#define ibmveth_driver_version "1.06" MODULE_AUTHOR("Santiago Leon "); MODULE_DESCRIPTION("IBM Power Virtual Ethernet Driver"); @@ -137,6 +137,11 @@ static inline int ibmveth_rxq_frame_offset(struct ibmveth_adapter *adapter) return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_OFF_MASK; } +static inline int ibmveth_rxq_large_packet(struct ibmveth_adapter *adapter) +{ + return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_LRG_PKT; +} + static inline int ibmveth_rxq_frame_length(struct ibmveth_adapter *adapter) { return be32_to_cpu(adapter->rx_queue.queue_addr[adapter->rx_queue.index].length); @@ -1172,6 +1177,45 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb, goto retry_bounce; } +static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt) +{ + int offset = 0; + + /* only TCP packets will be aggregated */ + if (skb->protocol == htons(ETH_P_IP)) { + struct iphdr *iph = (struct iphdr *)skb->data; + + if (iph->protocol == IPPROTO_TCP) { + offset = iph->ihl * 4; + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + } else { + return; + } + } else if (skb->protocol == htons(ETH_P_IPV6)) { + struct ipv6hdr *iph6 = (struct ipv6hdr *)skb->data; + + if (iph6->nexthdr == IPPROTO_TCP) { + offset = sizeof(struct ipv6hdr); + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6; + } else { + return; + } + } else { + return; + } + /* if mss is not set through Large Packet bit/mss in rx buffer, + * expect that the mss will be written to the tcp header checksum. + */ + if (lrg_pkt) { + skb_shinfo(skb)->gso_size = mss; + } else if (offset) { + struct tcphdr *tcph = (struct tcphdr *)(skb->data + offset); + + skb_shinfo(skb)->gso_size = ntohs(tcph->check); + tcph->check = 0; + } +} + static int ibmveth_poll(struct napi_struct *napi, int budget) { struct ibmveth_adapter *adapter = @@ -1180,6 +1224,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget) int frames_processed = 0; unsigned long lpar_rc; struct iphdr *iph; + u16 mss = 0; restart_poll: while (frames_processed < budget) { @@ -1197,9 +1242,21 @@ static int ibmveth_poll(struct napi_struct *napi, int budget) int length = ibmveth_rxq_frame_length(adapter); int offset = ibmveth_rxq_frame_offset(adapter); int csum_good = ibmveth_rxq_csum_good(adapter); + int lrg_pkt = ibmveth_rxq_large_packet(adapter); skb = ibmveth_rxq_get_buffer(adapter); + /* if the large packet bit is set in the rx queue + * descriptor, the mss will be written by PHYP eight + * bytes from the start of the rx buffer, which is + * skb->data at this stage + */ + if (lrg_pkt) { + __be64 *rxmss = (__be64 *)(skb->data + 8); + + mss = (u16)be64_to_cpu(*rxmss); + } + new_skb = NULL; if (length < rx_copybreak) new_skb = netdev_alloc_skb(netdev, length); @@ -1233,11 +1290,15 @@ static int ibmveth_poll(struct napi_struct *napi, int budget) if (iph->check == 0xffff) { iph->check = 0; iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl); - adapter->rx_large_packets++; } } } + if (length > netdev->mtu + ETH_HLEN) { + ibmveth_rx_mss_helper(skb, mss, lrg_pkt); + adapter->rx_large_packets++; + } + napi_gro_receive(napi, skb); /* send it up */ netdev->stats.rx_packets++; diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h index 4eade67fe30c..7acda04d034e 100644 --- a/drivers/net/ethernet/ibm/ibmveth.h +++ b/drivers/net/ethernet/ibm/ibmveth.h @@ -209,6 +209,7 @@ struct ibmveth_rx_q_entry { #define IBMVETH_RXQ_TOGGLE 0x80000000 #define IBMVETH_RXQ_TOGGLE_SHIFT 31 #define IBMVETH_RXQ_VALID 0x40000000 +#define IBMVETH_RXQ_LRG_PKT 0x04000000 #define IBMVETH_RXQ_NO_CSUM 0x02000000 #define IBMVETH_RXQ_CSUM_GOOD 0x01000000 #define IBMVETH_RXQ_OFF_MASK 0x0000FFFF From a80c068fbf43e22f099c0587b9e1a2337378a505 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 18 Apr 2017 07:15:37 +0200 Subject: [PATCH 361/521] Linux 4.4.62 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ef5045b8201d..0309acc34472 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 61 +SUBLEVEL = 62 EXTRAVERSION = NAME = Blurry Fish Butt From 3144d81a77352a3934ff0f60dccb38dbf462da39 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 16 Mar 2017 16:54:24 -0400 Subject: [PATCH 362/521] cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups commit 77f88796cee819b9c4562b0b6b44691b3b7755b1 upstream. Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo Suggested-by: Oleg Nesterov Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra (Intel) Cc: Thomas Gleixner Reported-and-debugged-by: Chris Mason Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman --- include/linux/cgroup.h | 21 +++++++++++++++++++++ include/linux/sched.h | 4 ++++ kernel/cgroup.c | 9 +++++---- kernel/kthread.c | 3 +++ 4 files changed, 33 insertions(+), 4 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index cb91b44f5f78..ad2bcf647b9a 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -528,6 +528,25 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp) pr_cont_kernfs_path(cgrp->kn); } +static inline void cgroup_init_kthreadd(void) +{ + /* + * kthreadd is inherited by all kthreads, keep it in the root so + * that the new kthreads are guaranteed to stay in the root until + * initialization is finished. + */ + current->no_cgroup_migration = 1; +} + +static inline void cgroup_kthread_ready(void) +{ + /* + * This kthread finished initialization. The creator should have + * set PF_NO_SETAFFINITY if this kthread should stay in the root. + */ + current->no_cgroup_migration = 0; +} + #else /* !CONFIG_CGROUPS */ struct cgroup_subsys_state; @@ -551,6 +570,8 @@ static inline void cgroup_free(struct task_struct *p) {} static inline int cgroup_init_early(void) { return 0; } static inline int cgroup_init(void) { return 0; } +static inline void cgroup_init_kthreadd(void) {} +static inline void cgroup_kthread_ready(void) {} #endif /* !CONFIG_CGROUPS */ diff --git a/include/linux/sched.h b/include/linux/sched.h index ce0f61dcd887..352213b360d7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1475,6 +1475,10 @@ struct task_struct { #ifdef CONFIG_COMPAT_BRK unsigned brk_randomized:1; #endif +#ifdef CONFIG_CGROUPS + /* disallow userland-initiated cgroup migration */ + unsigned no_cgroup_migration:1; +#endif unsigned long atomic_flags; /* Flags needing atomic access. */ diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 127c63e02d52..4cb94b678e9f 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2752,11 +2752,12 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf, tsk = tsk->group_leader; /* - * Workqueue threads may acquire PF_NO_SETAFFINITY and become - * trapped in a cpuset, or RT worker may be born in a cgroup - * with no rt_runtime allocated. Just say no. + * kthreads may acquire PF_NO_SETAFFINITY during initialization. + * If userland migrates such a kthread to a non-root cgroup, it can + * become trapped in a cpuset, or RT kthread may be born in a + * cgroup with no rt_runtime allocated. Just say no. */ - if (tsk == kthreadd_task || (tsk->flags & PF_NO_SETAFFINITY)) { + if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) { ret = -EINVAL; goto out_unlock_rcu; } diff --git a/kernel/kthread.c b/kernel/kthread.c index 9ff173dca1ae..850b255649a2 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -18,6 +18,7 @@ #include #include #include +#include #include static DEFINE_SPINLOCK(kthread_create_lock); @@ -205,6 +206,7 @@ static int kthread(void *_create) ret = -EINTR; if (!test_bit(KTHREAD_SHOULD_STOP, &self.flags)) { + cgroup_kthread_ready(); __kthread_parkme(&self); ret = threadfn(data); } @@ -510,6 +512,7 @@ int kthreadd(void *unused) set_mems_allowed(node_states[N_MEMORY]); current->flags |= PF_NOFREEZE; + cgroup_init_kthreadd(); for (;;) { set_current_state(TASK_INTERRUPTIBLE); From ef4c962825c08609d8077c00cf73f26fbdc638cc Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Thu, 13 Apr 2017 14:56:28 -0700 Subject: [PATCH 363/521] thp: fix MADV_DONTNEED vs clear soft dirty race commit 5b7abeae3af8c08c577e599dd0578b9e3ee6687b upstream. Yet another instance of the same race. Fix is identical to change_huge_pmd(). See "thp: fix MADV_DONTNEED vs. numa balancing race" for more details. Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Hillf Danton Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/proc/task_mmu.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index d598b9c809c1..db1a1427c27a 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -803,7 +803,14 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma, static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp) { - pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, addr, pmdp); + pmd_t pmd = *pmdp; + + /* See comment in change_huge_pmd() */ + pmdp_invalidate(vma, addr, pmdp); + if (pmd_dirty(*pmdp)) + pmd = pmd_mkdirty(pmd); + if (pmd_young(*pmdp)) + pmd = pmd_mkyoung(pmd); pmd = pmd_wrprotect(pmd); pmd = pmd_clear_soft_dirty(pmd); From a737abe4d09af3d461f0661ccde8ccec007a2db9 Mon Sep 17 00:00:00 2001 From: Ilia Mirkin Date: Sat, 18 Mar 2017 21:53:05 -0400 Subject: [PATCH 364/521] drm/nouveau/mpeg: mthd returns true on success now commit 83bce9c2baa51e439480a713119a73d3c8b61083 upstream. Signed-off-by: Ilia Mirkin Fixes: 590801c1a3 ("drm/nouveau/mpeg: remove dependence on namedb/engctx lookup") Signed-off-by: Ben Skeggs Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c | 2 +- drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv44.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c index d4d8942b1347..e55f8302d08a 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c @@ -198,7 +198,7 @@ nv31_mpeg_intr(struct nvkm_engine *engine) } if (type == 0x00000010) { - if (!nv31_mpeg_mthd(mpeg, mthd, data)) + if (nv31_mpeg_mthd(mpeg, mthd, data)) show &= ~0x01000000; } } diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv44.c b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv44.c index d433cfa4a8ab..36af0a8927fc 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv44.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv44.c @@ -172,7 +172,7 @@ nv44_mpeg_intr(struct nvkm_engine *engine) } if (type == 0x00000010) { - if (!nv44_mpeg_mthd(subdev->device, mthd, data)) + if (nv44_mpeg_mthd(subdev->device, mthd, data)) show &= ~0x01000000; } } From a11ab9dd4b789f5b7ecfc069a73cde2bd826f6ec Mon Sep 17 00:00:00 2001 From: Ilia Mirkin Date: Sat, 18 Mar 2017 16:23:10 -0400 Subject: [PATCH 365/521] drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one commit f94773b9f5ecd1df7c88c2e921924dd41d2020cc upstream. The NV4A (aka NV44A) is an oddity in the family. It only comes in AGP and PCI varieties, rather than a core PCIE chip with a bridge for AGP/PCI as necessary. As a result, it appears that the MMU is also non-functional. For AGP cards, the vast majority of the NV4A lineup, this worked out since we force AGP cards to use the nv04 mmu. However for PCI variants, this did not work. Switching to the NV04 MMU makes it work like a charm. Thanks to mwk for the suggestion. This should be a no-op for NV4A AGP boards, as they were using it already. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70388 Signed-off-by: Ilia Mirkin Signed-off-by: Ben Skeggs Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/nouveau/nvkm/engine/device/base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c index ece9f4102c0e..7f8acb3ebfcd 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c @@ -714,7 +714,7 @@ nv4a_chipset = { .i2c = nv04_i2c_new, .imem = nv40_instmem_new, .mc = nv44_mc_new, - .mmu = nv44_mmu_new, + .mmu = nv04_mmu_new, .pci = nv40_pci_new, .therm = nv40_therm_new, .timer = nv41_timer_new, From f0899d0e1e9ea7b71a3b05889c047d74b729dbf6 Mon Sep 17 00:00:00 2001 From: Germano Percossi Date: Fri, 7 Apr 2017 12:29:38 +0100 Subject: [PATCH 366/521] CIFS: store results of cifs_reopen_file to avoid infinite wait commit 1fa839b4986d648b907d117275869a0e46c324b9 upstream. This fixes Continuous Availability when errors during file reopen are encountered. cifs_user_readv and cifs_user_writev would wait for ever if results of cifs_reopen_file are not stored and for later inspection. In fact, results are checked and, in case of errors, a chain of function calls leading to reads and writes to be scheduled in a separate thread is skipped. These threads will wake up the corresponding waiters once reads and writes are done. However, given the return value is not stored, when rc is checked for errors a previous one (always zero) is inspected instead. This leads to pending reads/writes added to the list, making cifs_user_readv and cifs_user_writev wait for ever. Signed-off-by: Germano Percossi Reviewed-by: Pavel Shilovsky Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/file.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 72f270d4bd17..a0c0a49b6620 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2545,7 +2545,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, wdata->credits = credits; if (!wdata->cfile->invalidHandle || - !cifs_reopen_file(wdata->cfile, false)) + !(rc = cifs_reopen_file(wdata->cfile, false))) rc = server->ops->async_writev(wdata, cifs_uncached_writedata_release); if (rc) { @@ -2958,7 +2958,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, rdata->credits = credits; if (!rdata->cfile->invalidHandle || - !cifs_reopen_file(rdata->cfile, true)) + !(rc = cifs_reopen_file(rdata->cfile, true))) rc = server->ops->async_readv(rdata); error: if (rc) { @@ -3544,7 +3544,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, } if (!rdata->cfile->invalidHandle || - !cifs_reopen_file(rdata->cfile, true)) + !(rc = cifs_reopen_file(rdata->cfile, true))) rc = server->ops->async_readv(rdata); if (rc) { add_credits_and_wake_if(server, rdata->credits, 0); From a5e2f803b891f00d6019d727a6eb548c91a70b62 Mon Sep 17 00:00:00 2001 From: Cameron Gutman Date: Mon, 10 Apr 2017 20:44:25 -0700 Subject: [PATCH 367/521] Input: xpad - add support for Razer Wildcat gamepad commit 5376366886251e2f8f248704adb620a4bc4c0937 upstream. Signed-off-by: Cameron Gutman Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/joystick/xpad.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/input/joystick/xpad.c b/drivers/input/joystick/xpad.c index 16f000a76de5..3258baf3282e 100644 --- a/drivers/input/joystick/xpad.c +++ b/drivers/input/joystick/xpad.c @@ -189,6 +189,7 @@ static const struct xpad_device { { 0x1430, 0x8888, "TX6500+ Dance Pad (first generation)", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX }, { 0x146b, 0x0601, "BigBen Interactive XBOX 360 Controller", 0, XTYPE_XBOX360 }, { 0x1532, 0x0037, "Razer Sabertooth", 0, XTYPE_XBOX360 }, + { 0x1532, 0x0a03, "Razer Wildcat", 0, XTYPE_XBOXONE }, { 0x15e4, 0x3f00, "Power A Mini Pro Elite", 0, XTYPE_XBOX360 }, { 0x15e4, 0x3f0a, "Xbox Airflo wired controller", 0, XTYPE_XBOX360 }, { 0x15e4, 0x3f10, "Batarang Xbox 360 controller", 0, XTYPE_XBOX360 }, @@ -310,6 +311,7 @@ static struct usb_device_id xpad_table[] = { XPAD_XBOX360_VENDOR(0x1689), /* Razer Onza */ XPAD_XBOX360_VENDOR(0x24c6), /* PowerA Controllers */ XPAD_XBOX360_VENDOR(0x1532), /* Razer Sabertooth */ + XPAD_XBOXONE_VENDOR(0x1532), /* Razer Wildcat */ XPAD_XBOX360_VENDOR(0x15e4), /* Numark X-Box 360 controllers */ XPAD_XBOX360_VENDOR(0x162e), /* Joytech X-Box 360 controllers */ { } From f42be33fe976d4c0812fb2f697543e1b5ac073be Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 11 Apr 2017 10:10:28 +0200 Subject: [PATCH 368/521] perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32() commit f2200ac311302fcdca6556fd0c5127eab6c65a3e upstream. When the perf_branch_entry::{in_tx,abort,cycles} fields were added, intel_pmu_lbr_read_32() wasn't updated to initialize them. Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Fixes: 135c5612c460 ("perf/x86/intel: Support Haswell/v4 LBR format") Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/perf_event_intel_lbr.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c index 659f01e165d5..8900400230c6 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c @@ -410,6 +410,9 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc) cpuc->lbr_entries[i].to = msr_lastbranch.to; cpuc->lbr_entries[i].mispred = 0; cpuc->lbr_entries[i].predicted = 0; + cpuc->lbr_entries[i].in_tx = 0; + cpuc->lbr_entries[i].abort = 0; + cpuc->lbr_entries[i].cycles = 0; cpuc->lbr_entries[i].reserved = 0; } cpuc->lbr_stack.nr = i; From f1c5d01635862fb7b8b394fc1b6a1187751ce98e Mon Sep 17 00:00:00 2001 From: Mathias Krause Date: Mon, 10 Apr 2017 17:14:27 +0200 Subject: [PATCH 369/521] x86/vdso: Ensure vdso32_enabled gets set to valid values only commit c06989da39cdb10604d572c8c7ea8c8c97f3c483 upstream. vdso_enabled can be set to arbitrary integer values via the kernel command line 'vdso32=' parameter or via 'sysctl abi.vsyscall32'. load_vdso32() only maps VDSO if vdso_enabled == 1, but ARCH_DLINFO_IA32 merily checks for vdso_enabled != 0. As a consequence the AT_SYSINFO_EHDR auxiliary vector for the VDSO_ENTRY is emitted with a NULL pointer which causes a segfault when the application tries to use the VDSO. Restrict the valid arguments on the command line and the sysctl to 0 and 1. Fixes: b0b49f2673f0 ("x86, vdso: Remove compat vdso support") Signed-off-by: Mathias Krause Acked-by: Andy Lutomirski Cc: Peter Zijlstra Cc: Roland McGrath Link: http://lkml.kernel.org/r/1491424561-7187-1-git-send-email-minipli@googlemail.com Link: http://lkml.kernel.org/r/20170410151723.518412863@linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/vdso/vdso32-setup.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/vdso/vdso32-setup.c b/arch/x86/entry/vdso/vdso32-setup.c index 08a317a9ae4b..a7508d7e20b7 100644 --- a/arch/x86/entry/vdso/vdso32-setup.c +++ b/arch/x86/entry/vdso/vdso32-setup.c @@ -31,8 +31,10 @@ static int __init vdso32_setup(char *s) { vdso32_enabled = simple_strtoul(s, NULL, 0); - if (vdso32_enabled > 1) + if (vdso32_enabled > 1) { pr_warn("vdso32 values other than 0 and 1 are no longer allowed; vdso disabled\n"); + vdso32_enabled = 0; + } return 1; } @@ -63,13 +65,18 @@ subsys_initcall(sysenter_setup); /* Register vsyscall32 into the ABI table */ #include +static const int zero; +static const int one = 1; + static struct ctl_table abi_table2[] = { { .procname = "vsyscall32", .data = &vdso32_enabled, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dointvec_minmax, + .extra1 = (int *)&zero, + .extra2 = (int *)&one, }, {} }; From ec3978e10ecc61834c9f57dd9d00492b137fa01c Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 10 Apr 2017 17:14:28 +0200 Subject: [PATCH 370/521] x86/vdso: Plug race between mapping and ELF header setup commit 6fdc6dd90272ce7e75d744f71535cfbd8d77da81 upstream. The vsyscall32 sysctl can racy against a concurrent fork when it switches from disabled to enabled: arch_setup_additional_pages() if (vdso32_enabled) --> No mapping sysctl.vsysscall32() --> vdso32_enabled = true create_elf_tables() ARCH_DLINFO_IA32 if (vdso32_enabled) { --> Add VDSO entry with NULL pointer Make ARCH_DLINFO_IA32 check whether the VDSO mapping has been set up for the newly forked process or not. Signed-off-by: Thomas Gleixner Acked-by: Andy Lutomirski Cc: Peter Zijlstra Cc: Mathias Krause Link: http://lkml.kernel.org/r/20170410151723.602367196@linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/elf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 1514753fd435..d262f985bbc8 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -278,7 +278,7 @@ struct task_struct; #define ARCH_DLINFO_IA32 \ do { \ - if (vdso32_enabled) { \ + if (VDSO_CURRENT_BASE) { \ NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \ NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_CURRENT_BASE); \ } \ From 074bcc1302fd4357fa30c167bb20f684998b025f Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 27 Mar 2017 21:53:38 -0700 Subject: [PATCH 371/521] acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison) commit b03b99a329a14b7302f37c3ea6da3848db41c8c5 upstream. While reviewing the -stable patch for commit 86ef58a4e35e "nfit, libnvdimm: fix interleave set cookie calculation" Ben noted: "This is returning an int, thus it's effectively doing a 32-bit comparison and not the 64-bit comparison you say is needed." Update the compare operation to be immune to this integer demotion problem. Cc: Nicholas Moulin Fixes: 86ef58a4e35e ("nfit, libnvdimm: fix interleave set cookie calculation") Reported-by: Ben Hutchings Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/nfit.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c index 14c2a07c9f3f..67d7489ced01 100644 --- a/drivers/acpi/nfit.c +++ b/drivers/acpi/nfit.c @@ -979,7 +979,11 @@ static int cmp_map(const void *m0, const void *m1) const struct nfit_set_info_map *map0 = m0; const struct nfit_set_info_map *map1 = m1; - return map0->region_offset - map1->region_offset; + if (map0->region_offset < map1->region_offset) + return -1; + else if (map0->region_offset > map1->region_offset) + return 1; + return 0; } /* Retrieve the nth entry referencing this spa */ From 05c5dd75d77c8176c2beef56471868e29c19b47c Mon Sep 17 00:00:00 2001 From: Nicholas Bellinger Date: Thu, 23 Mar 2017 17:19:24 -0700 Subject: [PATCH 372/521] iscsi-target: Fix TMR reference leak during session shutdown commit efb2ea770bb3b0f40007530bc8b0c22f36e1c5eb upstream. This patch fixes a iscsi-target specific TMR reference leak during session shutdown, that could occur when a TMR was quiesced before the hand-off back to iscsi-target code via transport_cmd_check_stop_to_fabric(). The reference leak happens because iscsit_free_cmd() was incorrectly skipping the final target_put_sess_cmd() for TMRs when transport_generic_free_cmd() returned zero because the se_cmd->cmd_kref did not reach zero, due to the missing se_cmd assignment in original code. The result was iscsi_cmd and it's associated se_cmd memory would be freed once se_sess->sess_cmd_map where released, but the associated se_tmr_req was leaked and remained part of se_device->dev_tmr_list. This bug would manfiest itself as kernel paging request OOPsen in core_tmr_lun_reset(), when a left-over se_tmr_req attempted to dereference it's se_cmd pointer that had already been released during normal session shutdown. To address this bug, go ahead and treat ISCSI_OP_SCSI_CMD and ISCSI_OP_SCSI_TMFUNC the same when there is an extra se_cmd->cmd_kref to drop in iscsit_free_cmd(), and use op_scsi to signal __iscsit_free_cmd() when the former needs to clear any further iscsi related I/O state. Reported-by: Rob Millner Cc: Rob Millner Reported-by: Chu Yuan Lin Cc: Chu Yuan Lin Tested-by: Chu Yuan Lin Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman --- drivers/target/iscsi/iscsi_target_util.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c index 428b0d9e3dba..93590521ae33 100644 --- a/drivers/target/iscsi/iscsi_target_util.c +++ b/drivers/target/iscsi/iscsi_target_util.c @@ -731,21 +731,23 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown) { struct se_cmd *se_cmd = NULL; int rc; + bool op_scsi = false; /* * Determine if a struct se_cmd is associated with * this struct iscsi_cmd. */ switch (cmd->iscsi_opcode) { case ISCSI_OP_SCSI_CMD: - se_cmd = &cmd->se_cmd; - __iscsit_free_cmd(cmd, true, shutdown); + op_scsi = true; /* * Fallthrough */ case ISCSI_OP_SCSI_TMFUNC: - rc = transport_generic_free_cmd(&cmd->se_cmd, shutdown); - if (!rc && shutdown && se_cmd && se_cmd->se_sess) { - __iscsit_free_cmd(cmd, true, shutdown); + se_cmd = &cmd->se_cmd; + __iscsit_free_cmd(cmd, op_scsi, shutdown); + rc = transport_generic_free_cmd(se_cmd, shutdown); + if (!rc && shutdown && se_cmd->se_sess) { + __iscsit_free_cmd(cmd, op_scsi, shutdown); target_put_sess_cmd(se_cmd); } break; From 1e1de2e841e141991250d7c0ac79c5877a43b6a4 Mon Sep 17 00:00:00 2001 From: Nicholas Bellinger Date: Sun, 2 Apr 2017 13:36:44 -0700 Subject: [PATCH 373/521] iscsi-target: Drop work-around for legacy GlobalSAN initiator commit 1c99de981f30b3e7868b8d20ce5479fa1c0fea46 upstream. Once upon a time back in 2009, a work-around was added to support the GlobalSAN iSCSI initiator v3.3 for MacOSX, which during login did not propose nor respond to MaxBurstLength, FirstBurstLength, DefaultTime2Wait and DefaultTime2Retain keys. The work-around in iscsi_check_proposer_for_optional_reply() allowed the missing keys to be proposed, but did not require waiting for a response before moving to full feature phase operation. This allowed GlobalSAN v3.3 to work out-of-the box, and for many years we didn't run into login interopt issues with any other initiators.. Until recently, when Martin tried a QLogic 57840S iSCSI Offload HBA on Windows 2016 which completed login, but subsequently failed with: Got unknown iSCSI OpCode: 0x43 The issue was QLogic MSFT side did not propose DefaultTime2Wait + DefaultTime2Retain, so LIO proposes them itself, and immediately transitions to full feature phase because of the GlobalSAN hack. However, the QLogic MSFT side still attempts to respond to DefaultTime2Retain + DefaultTime2Wait, even though LIO has set ISCSI_FLAG_LOGIN_NEXT_STAGE3 + ISCSI_FLAG_LOGIN_TRANSIT in last login response. So while the QLogic MSFT side should have been proposing these two keys to start, it was doing the correct thing per RFC-3720 attempting to respond to proposed keys before transitioning to full feature phase. All that said, recent versions of GlobalSAN iSCSI (v5.3.0.541) does correctly propose the four keys during login, making the original work-around moot. So in order to allow QLogic MSFT to run unmodified as-is, go ahead and drop this long standing work-around. Reported-by: Martin Svec Cc: Martin Svec Cc: Himanshu Madhani Cc: Arun Easi Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman --- drivers/target/iscsi/iscsi_target_parameters.c | 16 ---------------- 1 file changed, 16 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_parameters.c b/drivers/target/iscsi/iscsi_target_parameters.c index 2cbea2af7cd0..6d1b0acbc5b3 100644 --- a/drivers/target/iscsi/iscsi_target_parameters.c +++ b/drivers/target/iscsi/iscsi_target_parameters.c @@ -780,22 +780,6 @@ static void iscsi_check_proposer_for_optional_reply(struct iscsi_param *param) } else if (IS_TYPE_NUMBER(param)) { if (!strcmp(param->name, MAXRECVDATASEGMENTLENGTH)) SET_PSTATE_REPLY_OPTIONAL(param); - /* - * The GlobalSAN iSCSI Initiator for MacOSX does - * not respond to MaxBurstLength, FirstBurstLength, - * DefaultTime2Wait or DefaultTime2Retain parameter keys. - * So, we set them to 'reply optional' here, and assume the - * the defaults from iscsi_parameters.h if the initiator - * is not RFC compliant and the keys are not negotiated. - */ - if (!strcmp(param->name, MAXBURSTLENGTH)) - SET_PSTATE_REPLY_OPTIONAL(param); - if (!strcmp(param->name, FIRSTBURSTLENGTH)) - SET_PSTATE_REPLY_OPTIONAL(param); - if (!strcmp(param->name, DEFAULTTIME2WAIT)) - SET_PSTATE_REPLY_OPTIONAL(param); - if (!strcmp(param->name, DEFAULTTIME2RETAIN)) - SET_PSTATE_REPLY_OPTIONAL(param); /* * Required for gPXE iSCSI boot client */ From 925adae6664c0b9f5193876e9aeb2640a7e977d5 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Fri, 17 Mar 2017 08:47:14 -0400 Subject: [PATCH 374/521] scsi: sr: Sanity check returned mode data commit a00a7862513089f17209b732f230922f1942e0b9 upstream. Kefeng Wang discovered that old versions of the QEMU CD driver would return mangled mode data causing us to walk off the end of the buffer in an attempt to parse it. Sanity check the returned mode sense data. Reported-by: Kefeng Wang Tested-by: Kefeng Wang Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sr.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 64c867405ad4..804586aeaffe 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -834,6 +834,7 @@ static void get_capabilities(struct scsi_cd *cd) unsigned char *buffer; struct scsi_mode_data data; struct scsi_sense_hdr sshdr; + unsigned int ms_len = 128; int rc, n; static const char *loadmech[] = @@ -860,10 +861,11 @@ static void get_capabilities(struct scsi_cd *cd) scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr); /* ask for mode page 0x2a */ - rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, 128, + rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, ms_len, SR_TIMEOUT, 3, &data, NULL); - if (!scsi_status_is_good(rc)) { + if (!scsi_status_is_good(rc) || data.length > ms_len || + data.header_length + data.block_descriptor_length > data.length) { /* failed, drive doesn't have capabilities mode page */ cd->cdi.speed = 1; cd->cdi.mask |= (CDC_CD_R | CDC_CD_RW | CDC_DVD_R | From 448961955592c46f1490fb6ca8d3e52ce17e6222 Mon Sep 17 00:00:00 2001 From: Fam Zheng Date: Tue, 28 Mar 2017 12:41:26 +0800 Subject: [PATCH 375/521] scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable commit 6780414519f91c2a84da9baa963a940ac916f803 upstream. If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size may get error. [mkp: tweaked to avoid setting rw_max twice and added typecast] Fixes: ca369d51b3e ("block/sd: Fix device-imposed transfer length limits") Signed-off-by: Fam Zheng Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 78430ef28ea4..d2877d713b62 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2888,7 +2888,8 @@ static int sd_revalidate_disk(struct gendisk *disk) q->limits.io_opt = logical_to_bytes(sdp, sdkp->opt_xfer_blocks); rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks); } else - rw_max = BLK_DEF_MAX_SECTORS; + rw_max = min_not_zero(logical_to_sectors(sdp, dev_max), + (sector_t)BLK_DEF_MAX_SECTORS); /* Combine with controller limits */ q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); From b689dfbed8c8432a18c73fc261c030d8b3e24e00 Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Tue, 4 Apr 2017 10:42:30 -0400 Subject: [PATCH 376/521] scsi: sd: Fix capacity calculation with 32-bit sector_t commit 7c856152cb92f8eee2df29ef325a1b1f43161aff upstream. We previously made sure that the reported disk capacity was less than 0xffffffff blocks when the kernel was not compiled with large sector_t support (CONFIG_LBDAF). However, this check assumed that the capacity was reported in units of 512 bytes. Add a sanity check function to ensure that we only enable disks if the entire reported capacity can be expressed in terms of sector_t. Reported-by: Steve Magnani Cc: Bart Van Assche Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sd.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index d2877d713b62..4d5207dff960 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2051,6 +2051,22 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp, #define READ_CAPACITY_RETRIES_ON_RESET 10 +/* + * Ensure that we don't overflow sector_t when CONFIG_LBDAF is not set + * and the reported logical block size is bigger than 512 bytes. Note + * that last_sector is a u64 and therefore logical_to_sectors() is not + * applicable. + */ +static bool sd_addressable_capacity(u64 lba, unsigned int sector_size) +{ + u64 last_sector = (lba + 1ULL) << (ilog2(sector_size) - 9); + + if (sizeof(sector_t) == 4 && last_sector > U32_MAX) + return false; + + return true; +} + static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp, unsigned char *buffer) { @@ -2116,7 +2132,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp, return -ENODEV; } - if ((sizeof(sdkp->capacity) == 4) && (lba >= 0xffffffffULL)) { + if (!sd_addressable_capacity(lba, sector_size)) { sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a " "kernel compiled with support for large block " "devices.\n"); @@ -2202,7 +2218,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp, return sector_size; } - if ((sizeof(sdkp->capacity) == 4) && (lba == 0xffffffff)) { + if (!sd_addressable_capacity(lba, sector_size)) { sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a " "kernel compiled with support for large block " "devices.\n"); From 6058cf9929d9bbeb4a781c51e91866716cb5277f Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Fri, 7 Apr 2017 17:28:23 +0200 Subject: [PATCH 377/521] xen, fbfront: fix connecting to backend commit 9121b15b5628b38b4695282dc18c553440e0f79b upstream. Connecting to the backend isn't working reliably in xen-fbfront: in case XenbusStateInitWait of the backend has been missed the backend transition to XenbusStateConnected will trigger the connected state only without doing the actions required when the backend has connected. Signed-off-by: Juergen Gross Reviewed-by: Boris Ostrovsky Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Greg Kroah-Hartman --- drivers/video/fbdev/xen-fbfront.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/video/fbdev/xen-fbfront.c b/drivers/video/fbdev/xen-fbfront.c index 0567d517eed3..ea2f19f5fbde 100644 --- a/drivers/video/fbdev/xen-fbfront.c +++ b/drivers/video/fbdev/xen-fbfront.c @@ -644,7 +644,6 @@ static void xenfb_backend_changed(struct xenbus_device *dev, break; case XenbusStateInitWait: -InitWait: xenbus_switch_state(dev, XenbusStateConnected); break; @@ -655,7 +654,8 @@ static void xenfb_backend_changed(struct xenbus_device *dev, * get Connected twice here. */ if (dev->state != XenbusStateConnected) - goto InitWait; /* no InitWait seen yet, fudge it */ + /* no InitWait seen yet, fudge it */ + xenbus_switch_state(dev, XenbusStateConnected); if (xenbus_scanf(XBT_NIL, info->xbdev->otherend, "request-update", "%d", &val) < 0) From 66b531d3ff113d1e440bde5cd167ed49063fd070 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 7 Apr 2017 09:47:24 -0700 Subject: [PATCH 378/521] libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat commit 0beb2012a1722633515c8aaa263c73449636c893 upstream. Holding the reconfig_mutex over a potential userspace fault sets up a lockdep dependency chain between filesystem-DAX and the libnvdimm ioctl path. Move the user access outside of the lock. [ INFO: possible circular locking dependency detected ] 4.11.0-rc3+ #13 Tainted: G W O ------------------------------------------------------- fallocate/16656 is trying to acquire lock: (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [] nvdimm_bus_lock+0x21/0x30 [libnvdimm] but task is already holding lock: (jbd2_handle){++++..}, at: [] start_this_handle+0x104/0x460 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (jbd2_handle){++++..}: lock_acquire+0xbd/0x200 start_this_handle+0x16a/0x460 jbd2__journal_start+0xe9/0x2d0 __ext4_journal_start_sb+0x89/0x1c0 ext4_dirty_inode+0x32/0x70 __mark_inode_dirty+0x235/0x670 generic_update_time+0x87/0xd0 touch_atime+0xa9/0xd0 ext4_file_mmap+0x90/0xb0 mmap_region+0x370/0x5b0 do_mmap+0x415/0x4f0 vm_mmap_pgoff+0xd7/0x120 SyS_mmap_pgoff+0x1c5/0x290 SyS_mmap+0x22/0x30 entry_SYSCALL_64_fastpath+0x1f/0xc2 -> #1 (&mm->mmap_sem){++++++}: lock_acquire+0xbd/0x200 __might_fault+0x70/0xa0 __nd_ioctl+0x683/0x720 [libnvdimm] nvdimm_ioctl+0x8b/0xe0 [libnvdimm] do_vfs_ioctl+0xa8/0x740 SyS_ioctl+0x79/0x90 do_syscall_64+0x6c/0x200 return_from_SYSCALL_64+0x0/0x7a -> #0 (&nvdimm_bus->reconfig_mutex){+.+.+.}: __lock_acquire+0x16b6/0x1730 lock_acquire+0xbd/0x200 __mutex_lock+0x88/0x9b0 mutex_lock_nested+0x1b/0x20 nvdimm_bus_lock+0x21/0x30 [libnvdimm] nvdimm_forget_poison+0x25/0x50 [libnvdimm] nvdimm_clear_poison+0x106/0x140 [libnvdimm] pmem_do_bvec+0x1c2/0x2b0 [nd_pmem] pmem_make_request+0xf9/0x270 [nd_pmem] generic_make_request+0x118/0x3b0 submit_bio+0x75/0x150 Fixes: 62232e45f4a2 ("libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices") Cc: Dave Jiang Reported-by: Vishal Verma Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- drivers/nvdimm/bus.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index 5f47356d6942..254b0ee37039 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -590,8 +590,14 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm, rc = nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len); if (rc < 0) goto out_unlock; + nvdimm_bus_unlock(&nvdimm_bus->dev); + if (copy_to_user(p, buf, buf_len)) rc = -EFAULT; + + vfree(buf); + return rc; + out_unlock: nvdimm_bus_unlock(&nvdimm_bus->dev); out: From c51451e43bf19bb36fa50d81ae736dd9e7d66d4a Mon Sep 17 00:00:00 2001 From: Tyler Baker Date: Thu, 13 Apr 2017 15:27:31 -0700 Subject: [PATCH 379/521] irqchip/irq-imx-gpcv2: Fix spinlock initialization commit 75eb5e1e7b4edbc8e8f930de59004d21cb46961f upstream. The raw_spinlock in the IMX GPCV2 interupt chip is not initialized before usage. That results in a lockdep splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. Add the missing raw_spin_lock_init() to the setup code. Fixes: e324c4dc4a59 ("irqchip/imx-gpcv2: IMX GPCv2 driver for wakeup sources") Signed-off-by: Tyler Baker Reviewed-by: Fabio Estevam Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: shawnguo@kernel.org Cc: andrew.smirnov@gmail.com Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20170413222731.5917-1-tyler.baker@linaro.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- drivers/irqchip/irq-imx-gpcv2.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c index 15af9a9753e5..2d203b422129 100644 --- a/drivers/irqchip/irq-imx-gpcv2.c +++ b/drivers/irqchip/irq-imx-gpcv2.c @@ -230,6 +230,8 @@ static int __init imx_gpcv2_irqchip_init(struct device_node *node, return -ENOMEM; } + raw_spin_lock_init(&cd->rlock); + cd->gpc_base = of_iomap(node, 0); if (!cd->gpc_base) { pr_err("fsl-gpcv2: unable to map gpc registers\n"); From 7fe57118a7c002c59e4087806f8f5a9f4a0b037f Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 14 Apr 2017 17:45:45 -0400 Subject: [PATCH 380/521] ftrace: Fix removing of second function probe commit 82cc4fc2e70ec5baeff8f776f2773abc8b2cc0ae upstream. When two function probes are added to set_ftrace_filter, and then one of them is removed, the update to the function locations is not performed, and the record keeping of the function states are corrupted, and causes an ftrace_bug() to occur. This is easily reproducable by adding two probes, removing one, and then adding it back again. # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter Causes: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220 Modules linked in: [...] CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 Call Trace: dump_stack+0x68/0x9f __warn+0x111/0x130 ? trace_irq_work_interrupt+0xa0/0xa0 warn_slowpath_null+0x1d/0x20 ftrace_get_addr_curr+0x143/0x220 ? __fentry__+0x10/0x10 ftrace_replace_code+0xe3/0x4f0 ? ftrace_int3_handler+0x90/0x90 ? printk+0x99/0xb5 ? 0xffffffff81000000 ftrace_modify_all_code+0x97/0x110 arch_ftrace_update_code+0x10/0x20 ftrace_run_update_code+0x1c/0x60 ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0 register_ftrace_function_probe+0x4b6/0x590 ? ftrace_startup+0x310/0x310 ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30 ? update_stack_state+0x88/0x110 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? preempt_count_sub+0x18/0xd0 ? mutex_lock_nested+0x104/0x800 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? __unwind_start+0x1c0/0x1c0 ? _mutex_lock_nest_lock+0x800/0x800 ftrace_trace_probe_callback.isra.3+0xc0/0x130 ? func_set_flag+0xe0/0xe0 ? __lock_acquire+0x642/0x1790 ? __might_fault+0x1e/0x20 ? trace_get_user+0x398/0x470 ? strcmp+0x35/0x60 ftrace_trace_onoff_callback+0x48/0x70 ftrace_regex_write.isra.43.part.44+0x251/0x320 ? match_records+0x420/0x420 ftrace_filter_write+0x2b/0x30 __vfs_write+0xd7/0x330 ? do_loop_readv_writev+0x120/0x120 ? locks_remove_posix+0x90/0x2f0 ? do_lock_file_wait+0x160/0x160 ? __lock_is_held+0x93/0x100 ? rcu_read_lock_sched_held+0x5c/0xb0 ? preempt_count_sub+0x18/0xd0 ? __sb_start_write+0x10a/0x230 ? vfs_write+0x222/0x240 vfs_write+0xef/0x240 SyS_write+0xab/0x130 ? SyS_read+0x130/0x130 ? trace_hardirqs_on_caller+0x182/0x280 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fe61c157c30 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c ? trace_hardirqs_off_caller+0xc0/0x110 ---[ end trace 99fa09b3d9869c2c ]--- Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150) Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/ftrace.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 3f743b147247..34b2a0d5cf1a 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -3677,23 +3677,24 @@ static void __enable_ftrace_function_probe(struct ftrace_ops_hash *old_hash) ftrace_probe_registered = 1; } -static void __disable_ftrace_function_probe(void) +static bool __disable_ftrace_function_probe(void) { int i; if (!ftrace_probe_registered) - return; + return false; for (i = 0; i < FTRACE_FUNC_HASHSIZE; i++) { struct hlist_head *hhd = &ftrace_func_hash[i]; if (hhd->first) - return; + return false; } /* no more funcs left */ ftrace_shutdown(&trace_probe_ops, 0); ftrace_probe_registered = 0; + return true; } @@ -3820,6 +3821,7 @@ static void __unregister_ftrace_function_probe(char *glob, struct ftrace_probe_ops *ops, void *data, int flags) { + struct ftrace_ops_hash old_hash_ops; struct ftrace_func_entry *rec_entry; struct ftrace_func_probe *entry; struct ftrace_func_probe *p; @@ -3831,6 +3833,7 @@ __unregister_ftrace_function_probe(char *glob, struct ftrace_probe_ops *ops, struct hlist_node *tmp; char str[KSYM_SYMBOL_LEN]; int i, ret; + bool disabled; if (glob && (strcmp(glob, "*") == 0 || !strlen(glob))) func_g.search = NULL; @@ -3849,6 +3852,10 @@ __unregister_ftrace_function_probe(char *glob, struct ftrace_probe_ops *ops, mutex_lock(&trace_probe_ops.func_hash->regex_lock); + old_hash_ops.filter_hash = old_hash; + /* Probes only have filters */ + old_hash_ops.notrace_hash = NULL; + hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS, *orig_hash); if (!hash) /* Hmm, should report this somehow */ @@ -3886,12 +3893,17 @@ __unregister_ftrace_function_probe(char *glob, struct ftrace_probe_ops *ops, } } mutex_lock(&ftrace_lock); - __disable_ftrace_function_probe(); + disabled = __disable_ftrace_function_probe(); /* * Remove after the disable is called. Otherwise, if the last * probe is removed, a null hash means *all enabled*. */ ret = ftrace_hash_move(&trace_probe_ops, 1, orig_hash, hash); + + /* still need to update the function call sites */ + if (ftrace_enabled && !disabled) + ftrace_run_modify_code(&trace_probe_ops, FTRACE_UPDATE_CALLS, + &old_hash_ops); synchronize_sched(); if (!ret) free_ftrace_hash_rcu(old_hash); From 0a6aa0d1cf27e9ca7b309cc86aa6b100754f88a4 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 11 Apr 2016 10:40:55 +0200 Subject: [PATCH 381/521] char: Drop bogus dependency of DEVPORT on !M68K commit 309124e2648d668a0c23539c5078815660a4a850 upstream. According to full-history-linux commit d3794f4fa7c3edc3 ("[PATCH] M68k update (part 25)"), port operations are allowed on m68k if CONFIG_ISA is defined. However, commit 153dcc54df826d2f ("[PATCH] mem driver: fix conditional on isa i/o support") accidentally changed an "||" into an "&&", disabling it completely on m68k. This logic was retained when introducing the DEVPORT symbol in commit 4f911d64e04a44c4 ("Make /dev/port conditional on config symbol"). Drop the bogus dependency on !M68K to fix this. Fixes: 153dcc54df826d2f ("[PATCH] mem driver: fix conditional on isa i/o support") Signed-off-by: Geert Uytterhoeven Tested-by: Al Stone Signed-off-by: Greg Kroah-Hartman --- drivers/char/Kconfig | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig index a043107da2af..b130d38cf55c 100644 --- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -584,7 +584,6 @@ config TELCLOCK config DEVPORT bool - depends on !M68K depends on ISA || PCI default y From a32c5331b462670093ec809ec063ad7d28f47126 Mon Sep 17 00:00:00 2001 From: Max Bires Date: Tue, 3 Jan 2017 08:18:07 -0800 Subject: [PATCH 382/521] char: lack of bool string made CONFIG_DEVPORT always on commit f2cfa58b136e4b06a9b9db7af5ef62fbb5992f62 upstream. Without a bool string present, using "# CONFIG_DEVPORT is not set" in defconfig files would not actually unset devport. This esnured that /dev/port was always on, but there are reasons a user may wish to disable it (smaller kernel, attack surface reduction) if it's not being used. Adding a message here in order to make this user visible. Signed-off-by: Max Bires Acked-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- drivers/char/Kconfig | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig index b130d38cf55c..3143db57ce44 100644 --- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -583,9 +583,12 @@ config TELCLOCK controlling the behavior of this hardware. config DEVPORT - bool + bool "/dev/port character device" depends on ISA || PCI default y + help + Say Y here if you want to support the /dev/port device. The /dev/port + device is similar to /dev/mem, but for I/O ports. source "drivers/s390/char/Kconfig" From 98c953a0a51fffa0904e143694222b213fa3c68f Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 18 Apr 2017 16:16:57 +0200 Subject: [PATCH 383/521] Revert "MIPS: Lantiq: Fix cascaded IRQ setup" This reverts commit 6280ac931a23d3fa40cd26057576abcf90a4f22d which is commit 6c356eda225e3ee134ed4176b9ae3a76f793f4dd upstream. It shouldn't have been included in a stable release. Reported-by: Amit Pundir Cc: Felix Fietkau Cc: John Crispin Cc: James Hogan Signed-off-by: Greg Kroah-Hartman --- arch/mips/lantiq/irq.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/arch/mips/lantiq/irq.c b/arch/mips/lantiq/irq.c index 51cdc46a87e2..2e7f60c9fc5d 100644 --- a/arch/mips/lantiq/irq.c +++ b/arch/mips/lantiq/irq.c @@ -269,11 +269,6 @@ static void ltq_hw5_irqdispatch(void) DEFINE_HWx_IRQDISPATCH(5) #endif -static void ltq_hw_irq_handler(struct irq_desc *desc) -{ - ltq_hw_irqdispatch(irq_desc_get_irq(desc) - 2); -} - #ifdef CONFIG_MIPS_MT_SMP void __init arch_init_ipiirq(int irq, struct irqaction *action) { @@ -318,19 +313,23 @@ static struct irqaction irq_call = { asmlinkage void plat_irq_dispatch(void) { unsigned int pending = read_c0_status() & read_c0_cause() & ST0_IM; - int irq; + unsigned int i; - if (!pending) { - spurious_interrupt(); - return; + if ((MIPS_CPU_TIMER_IRQ == 7) && (pending & CAUSEF_IP7)) { + do_IRQ(MIPS_CPU_TIMER_IRQ); + goto out; + } else { + for (i = 0; i < MAX_IM; i++) { + if (pending & (CAUSEF_IP2 << i)) { + ltq_hw_irqdispatch(i); + goto out; + } + } } + pr_alert("Spurious IRQ: CAUSE=0x%08x\n", read_c0_status()); - pending >>= CAUSEB_IP; - while (pending) { - irq = fls(pending) - 1; - do_IRQ(MIPS_CPU_IRQ_BASE + irq); - pending &= ~BIT(irq); - } +out: + return; } static int icu_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hw) @@ -355,6 +354,11 @@ static const struct irq_domain_ops irq_domain_ops = { .map = icu_map, }; +static struct irqaction cascade = { + .handler = no_action, + .name = "cascade", +}; + int __init icu_of_init(struct device_node *node, struct device_node *parent) { struct device_node *eiu_node; @@ -386,7 +390,7 @@ int __init icu_of_init(struct device_node *node, struct device_node *parent) mips_cpu_irq_init(); for (i = 0; i < MAX_IM; i++) - irq_set_chained_handler(i + 2, ltq_hw_irq_handler); + setup_irq(i + 2, &cascade); if (cpu_has_vint) { pr_info("Setting up vectored interrupts\n"); From c1fc1d2f214e33f91565a65ad1b4c09dae618d84 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Tue, 24 Jan 2017 11:56:21 +0100 Subject: [PATCH 384/521] kvm: fix page struct leak in handle_vmon commit 06ce521af9558814b8606c0476c54497cf83a653 upstream. handle_vmon gets a reference on VMXON region page, but does not release it. Release the reference. Found by syzkaller; based on a patch by Dmitry. Reported-by: Dmitry Vyukov Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: use skip_emulated_instruction()] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3a7ae80dc49d..0a472e9865c5 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6678,14 +6678,20 @@ static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason, } page = nested_get_page(vcpu, vmptr); - if (page == NULL || - *(u32 *)kmap(page) != VMCS12_REVISION) { + if (page == NULL) { nested_vmx_failInvalid(vcpu); + skip_emulated_instruction(vcpu); + return 1; + } + if (*(u32 *)kmap(page) != VMCS12_REVISION) { kunmap(page); + nested_release_page_clean(page); + nested_vmx_failInvalid(vcpu); skip_emulated_instruction(vcpu); return 1; } kunmap(page); + nested_release_page_clean(page); vmx->nested.vmxon_ptr = vmptr; break; case EXIT_REASON_VMCLEAR: From 9286385a3452d7eeb01bfb94676389bba6f59ebd Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Thu, 13 Apr 2017 14:56:37 -0700 Subject: [PATCH 385/521] zram: do not use copy_page with non-page aligned address commit d72e9a7a93e4f8e9e52491921d99e0c8aa89eb4e upstream. The copy_page is optimized memcpy for page-alinged address. If it is used with non-page aligned address, it can corrupt memory which means system corruption. With zram, it can happen with 1. 64K architecture 2. partial IO 3. slub debug Partial IO need to allocate a page and zram allocates it via kmalloc. With slub debug, kmalloc(PAGE_SIZE) doesn't return page-size aligned address. And finally, copy_page(mem, cmem) corrupts memory. So, this patch changes it to memcpy. Actuaully, we don't need to change zram_bvec_write part because zsmalloc returns page-aligned address in case of PAGE_SIZE class but it's not good to rely on the internal of zsmalloc. Note: When this patch is merged to stable, clear_page should be fixed, too. Unfortunately, recent zram removes it by "same page merge" feature so it's hard to backport this patch to -stable tree. I will handle it when I receive the mail from stable tree maintainer to merge this patch to backport. Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()") Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim Cc: Sergey Senozhatsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- drivers/block/zram/zram_drv.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 1648de80e230..62a93b685c54 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -574,13 +574,13 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index) if (!handle || zram_test_flag(meta, index, ZRAM_ZERO)) { bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); - clear_page(mem); + memset(mem, 0, PAGE_SIZE); return 0; } cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO); if (size == PAGE_SIZE) - copy_page(mem, cmem); + memcpy(mem, cmem, PAGE_SIZE); else ret = zcomp_decompress(zram->comp, cmem, size, mem); zs_unmap_object(meta->mem_pool, handle); @@ -738,7 +738,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { src = kmap_atomic(page); - copy_page(cmem, src); + memcpy(cmem, src, PAGE_SIZE); kunmap_atomic(src); } else { memcpy(cmem, src, clen); From 70e55aaf9f8cb4a74ca2744457b1d817353090e3 Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt Date: Mon, 20 Mar 2017 17:49:03 +1100 Subject: [PATCH 386/521] powerpc: Disable HFSCR[TM] if TM is not supported commit 7ed23e1bae8bf7e37fd555066550a00b95a3a98b upstream. On Power8 & Power9 the early CPU inititialisation in __init_HFSCR() turns on HFSCR[TM] (Hypervisor Facility Status and Control Register [Transactional Memory]), but that doesn't take into account that TM might be disabled by CPU features, or disabled by the kernel being built with CONFIG_PPC_TRANSACTIONAL_MEM=n. So later in boot, when we have setup the CPU features, clear HSCR[TM] if the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account for the CONFIG_PPC_TRANSACTIONAL_MEM=n case. Without this a KVM guest might try use TM, even if told not to, and cause an oops in the host kernel. Typically the oops is seen in __kvmppc_vcore_entry() and may or may not be fatal to the host, but is always bad news. In practice all shipping CPU revisions do support TM, and all host kernels we are aware of build with TM support enabled, so no one should actually be able to hit this in the wild. Fixes: 2a3563b023e5 ("powerpc: Setup in HFSCR for POWER8") Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Benjamin Herrenschmidt Tested-by: Sam Bobroff [mpe: Rewrite change log with input from Sam, add Fixes/stable] Signed-off-by: Michael Ellerman [sb: Backported to linux-4.4.y: adjusted context] Signed-off-by: Sam Bobroff Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/setup_64.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 5c03a6a9b054..a20823210ac0 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -220,6 +220,15 @@ static void cpu_ready_for_interrupts(void) unsigned long lpcr = mfspr(SPRN_LPCR); mtspr(SPRN_LPCR, lpcr | LPCR_AIL_3); } + + /* + * Fixup HFSCR:TM based on CPU features. The bit is set by our + * early asm init because at that point we haven't updated our + * CPU features from firmware and device-tree. Here we have, + * so let's do it. + */ + if (cpu_has_feature(CPU_FTR_HVMODE) && !cpu_has_feature(CPU_FTR_TM_COMP)) + mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) & ~HFSCR_TM); } /* From 2673d1c5122ee2492e24d9a135e230b2d0b2e630 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Mon, 10 Apr 2017 17:27:57 +0800 Subject: [PATCH 387/521] crypto: ahash - Fix EINPROGRESS notification callback commit ef0579b64e93188710d48667cb5e014926af9f1b upstream. The ahash API modifies the request's callback function in order to clean up after itself in some corner cases (unaligned final and missing finup). When the request is complete ahash will restore the original callback and everything is fine. However, when the request gets an EBUSY on a full queue, an EINPROGRESS callback is made while the request is still ongoing. In this case the ahash API will incorrectly call its own callback. This patch fixes the problem by creating a temporary request object on the stack which is used to relay EINPROGRESS back to the original completion function. This patch also adds code to preserve the original flags value. Fixes: ab6bf4e5e5e4 ("crypto: hash - Fix the pointer voodoo in...") Reported-by: Sabrina Dubroca Tested-by: Sabrina Dubroca Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/ahash.c | 81 +++++++++++++++++++++------------- include/crypto/internal/hash.h | 10 +++++ 2 files changed, 61 insertions(+), 30 deletions(-) diff --git a/crypto/ahash.c b/crypto/ahash.c index dac1c24e9c3e..f9caf0f74199 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -31,6 +31,7 @@ struct ahash_request_priv { crypto_completion_t complete; void *data; u8 *result; + u32 flags; void *ubuf[] CRYPTO_MINALIGN_ATTR; }; @@ -270,6 +271,8 @@ static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt) priv->result = req->result; priv->complete = req->base.complete; priv->data = req->base.data; + priv->flags = req->base.flags; + /* * WARNING: We do not backup req->priv here! The req->priv * is for internal use of the Crypto API and the @@ -284,38 +287,44 @@ static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt) return 0; } -static void ahash_restore_req(struct ahash_request *req) +static void ahash_restore_req(struct ahash_request *req, int err) { struct ahash_request_priv *priv = req->priv; + if (!err) + memcpy(priv->result, req->result, + crypto_ahash_digestsize(crypto_ahash_reqtfm(req))); + /* Restore the original crypto request. */ req->result = priv->result; - req->base.complete = priv->complete; - req->base.data = priv->data; + + ahash_request_set_callback(req, priv->flags, + priv->complete, priv->data); req->priv = NULL; /* Free the req->priv.priv from the ADJUSTED request. */ kzfree(priv); } -static void ahash_op_unaligned_finish(struct ahash_request *req, int err) +static void ahash_notify_einprogress(struct ahash_request *req) { struct ahash_request_priv *priv = req->priv; + struct crypto_async_request oreq; - if (err == -EINPROGRESS) - return; + oreq.data = priv->data; - if (!err) - memcpy(priv->result, req->result, - crypto_ahash_digestsize(crypto_ahash_reqtfm(req))); - - ahash_restore_req(req); + priv->complete(&oreq, -EINPROGRESS); } static void ahash_op_unaligned_done(struct crypto_async_request *req, int err) { struct ahash_request *areq = req->data; + if (err == -EINPROGRESS) { + ahash_notify_einprogress(areq); + return; + } + /* * Restore the original request, see ahash_op_unaligned() for what * goes where. @@ -326,7 +335,7 @@ static void ahash_op_unaligned_done(struct crypto_async_request *req, int err) */ /* First copy req->result into req->priv.result */ - ahash_op_unaligned_finish(areq, err); + ahash_restore_req(areq, err); /* Complete the ORIGINAL request. */ areq->base.complete(&areq->base, err); @@ -342,7 +351,12 @@ static int ahash_op_unaligned(struct ahash_request *req, return err; err = op(req); - ahash_op_unaligned_finish(req, err); + if (err == -EINPROGRESS || + (err == -EBUSY && (ahash_request_flags(req) & + CRYPTO_TFM_REQ_MAY_BACKLOG))) + return err; + + ahash_restore_req(req, err); return err; } @@ -377,25 +391,14 @@ int crypto_ahash_digest(struct ahash_request *req) } EXPORT_SYMBOL_GPL(crypto_ahash_digest); -static void ahash_def_finup_finish2(struct ahash_request *req, int err) -{ - struct ahash_request_priv *priv = req->priv; - - if (err == -EINPROGRESS) - return; - - if (!err) - memcpy(priv->result, req->result, - crypto_ahash_digestsize(crypto_ahash_reqtfm(req))); - - ahash_restore_req(req); -} - static void ahash_def_finup_done2(struct crypto_async_request *req, int err) { struct ahash_request *areq = req->data; - ahash_def_finup_finish2(areq, err); + if (err == -EINPROGRESS) + return; + + ahash_restore_req(areq, err); areq->base.complete(&areq->base, err); } @@ -406,11 +409,15 @@ static int ahash_def_finup_finish1(struct ahash_request *req, int err) goto out; req->base.complete = ahash_def_finup_done2; - req->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; + err = crypto_ahash_reqtfm(req)->final(req); + if (err == -EINPROGRESS || + (err == -EBUSY && (ahash_request_flags(req) & + CRYPTO_TFM_REQ_MAY_BACKLOG))) + return err; out: - ahash_def_finup_finish2(req, err); + ahash_restore_req(req, err); return err; } @@ -418,7 +425,16 @@ static void ahash_def_finup_done1(struct crypto_async_request *req, int err) { struct ahash_request *areq = req->data; + if (err == -EINPROGRESS) { + ahash_notify_einprogress(areq); + return; + } + + areq->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; + err = ahash_def_finup_finish1(areq, err); + if (areq->priv) + return; areq->base.complete(&areq->base, err); } @@ -433,6 +449,11 @@ static int ahash_def_finup(struct ahash_request *req) return err; err = tfm->update(req); + if (err == -EINPROGRESS || + (err == -EBUSY && (ahash_request_flags(req) & + CRYPTO_TFM_REQ_MAY_BACKLOG))) + return err; + return ahash_def_finup_finish1(req, err); } diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h index 3b4af1d7c7e9..a25414ce2898 100644 --- a/include/crypto/internal/hash.h +++ b/include/crypto/internal/hash.h @@ -173,6 +173,16 @@ static inline struct ahash_instance *ahash_alloc_instance( return crypto_alloc_instance2(name, alg, ahash_instance_headroom()); } +static inline void ahash_request_complete(struct ahash_request *req, int err) +{ + req->base.complete(&req->base, err); +} + +static inline u32 ahash_request_flags(struct ahash_request *req) +{ + return req->base.flags; +} + static inline struct crypto_ahash *crypto_spawn_ahash( struct crypto_ahash_spawn *spawn) { From ea6d8d67001a40c74f4a732f897c28440a5e8dfd Mon Sep 17 00:00:00 2001 From: Miaoqing Pan Date: Wed, 16 Nov 2016 17:23:08 +0800 Subject: [PATCH 388/521] ath9k: fix NULL pointer dereference commit 40bea976c72b9ee60f8d097852deb53ccbeaffbe upstream. relay_open() may return NULL, check the return value to avoid the crash. BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: [] ath_cmn_process_fft+0xd5/0x700 [ath9k_common] PGD 41cf28067 PUD 41be92067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6+ #35 Hardware name: Hewlett-Packard h8-1080t/2A86, BIOS 6.15 07/04/2011 task: ffffffff81e0c4c0 task.stack: ffffffff81e00000 RIP: 0010:[] [] ath_cmn_process_fft+0xd5/0x700 [ath9k_common] RSP: 0018:ffff88041f203ca0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 000000000000059f RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffffffff81f0ca98 RBP: ffff88041f203dc8 R08: ffffffffffffffff R09: 00000000000000ff R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffffffff81f0ca98 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88041f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 000000041b6ec000 CR4: 00000000000006f0 Stack: 0000000000000363 00000000000003f3 00000000000003f3 00000000000001f9 000000000000049a 0000000001252c04 ffff88041f203e44 ffff880417b4bfd0 0000000000000008 ffff88041785b9c0 0000000000000002 ffff88041613dc60 Call Trace: [] ath9k_tasklet+0x1b1/0x220 [ath9k] [] tasklet_action+0x4d/0xf0 [] __do_softirq+0x92/0x2a0 Reported-by: Devin Tuchsen Tested-by: Devin Tuchsen Signed-off-by: Miaoqing Pan Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/ath9k/common-spectral.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath9k/common-spectral.c b/drivers/net/wireless/ath/ath9k/common-spectral.c index a8762711ad74..03945731eb65 100644 --- a/drivers/net/wireless/ath/ath9k/common-spectral.c +++ b/drivers/net/wireless/ath/ath9k/common-spectral.c @@ -528,6 +528,9 @@ int ath_cmn_process_fft(struct ath_spec_scan_priv *spec_priv, struct ieee80211_h if (!(radar_info->pulse_bw_info & SPECTRAL_SCAN_BITMASK)) return 0; + if (!spec_priv->rfs_chan_spec_scan) + return 1; + /* Output buffers are full, no need to process anything * since there is no space to put the result anyway */ @@ -1072,7 +1075,7 @@ static struct rchan_callbacks rfs_spec_scan_cb = { void ath9k_cmn_spectral_deinit_debug(struct ath_spec_scan_priv *spec_priv) { - if (config_enabled(CONFIG_ATH9K_DEBUGFS)) { + if (config_enabled(CONFIG_ATH9K_DEBUGFS) && spec_priv->rfs_chan_spec_scan) { relay_close(spec_priv->rfs_chan_spec_scan); spec_priv->rfs_chan_spec_scan = NULL; } @@ -1086,6 +1089,9 @@ void ath9k_cmn_spectral_init_debug(struct ath_spec_scan_priv *spec_priv, debugfs_phy, 1024, 256, &rfs_spec_scan_cb, NULL); + if (!spec_priv->rfs_chan_spec_scan) + return; + debugfs_create_file("spectral_scan_ctl", S_IRUSR | S_IWUSR, debugfs_phy, spec_priv, From 0cb03b6e7086e59647cf6eb79fec646cdec69691 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 2 Feb 2017 12:36:01 -0200 Subject: [PATCH 389/521] dvb-usb-v2: avoid use-after-free commit 005145378c9ad7575a01b6ce1ba118fb427f583a upstream. I ran into a stack frame size warning because of the on-stack copy of the USB device structure: drivers/media/usb/dvb-usb-v2/dvb_usb_core.c: In function 'dvb_usbv2_disconnect': drivers/media/usb/dvb-usb-v2/dvb_usb_core.c:1029:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Copying a device structure like this is wrong for a number of other reasons too aside from the possible stack overflow. One of them is that the dev_info() call will print the name of the device later, but AFAICT we have only copied a pointer to the name earlier and the actual name has been freed by the time it gets printed. This removes the on-stack copy of the device and instead copies the device name using kstrdup(). I'm ignoring the possible failure here as both printk() and kfree() are able to deal with NULL pointers. Signed-off-by: Arnd Bergmann Signed-off-by: Mauro Carvalho Chehab Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/media/usb/dvb-usb-v2/dvb_usb_core.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/media/usb/dvb-usb-v2/dvb_usb_core.c b/drivers/media/usb/dvb-usb-v2/dvb_usb_core.c index f5df9eaba04f..9757f35cd5f5 100644 --- a/drivers/media/usb/dvb-usb-v2/dvb_usb_core.c +++ b/drivers/media/usb/dvb-usb-v2/dvb_usb_core.c @@ -1010,8 +1010,8 @@ EXPORT_SYMBOL(dvb_usbv2_probe); void dvb_usbv2_disconnect(struct usb_interface *intf) { struct dvb_usb_device *d = usb_get_intfdata(intf); - const char *name = d->name; - struct device dev = d->udev->dev; + const char *devname = kstrdup(dev_name(&d->udev->dev), GFP_KERNEL); + const char *drvname = d->name; dev_dbg(&d->udev->dev, "%s: bInterfaceNumber=%d\n", __func__, intf->cur_altsetting->desc.bInterfaceNumber); @@ -1021,8 +1021,9 @@ void dvb_usbv2_disconnect(struct usb_interface *intf) dvb_usbv2_exit(d); - dev_info(&dev, "%s: '%s' successfully deinitialized and disconnected\n", - KBUILD_MODNAME, name); + pr_info("%s: '%s:%s' successfully deinitialized and disconnected\n", + KBUILD_MODNAME, drvname, devname); + kfree(devname); } EXPORT_SYMBOL(dvb_usbv2_disconnect); From 51f8d95c89b4aa74e591015d076d2775d9286704 Mon Sep 17 00:00:00 2001 From: Daeho Jeong Date: Thu, 1 Dec 2016 11:49:12 -0500 Subject: [PATCH 390/521] ext4: fix inode checksum calculation problem if i_extra_size is small commit 05ac5aa18abd7db341e54df4ae2b4c98ea0e43b7 upstream. We've fixed the race condition problem in calculating ext4 checksum value in commit b47820edd163 ("ext4: avoid modifying checksum fields directly during checksum veficationon"). However, by this change, when calculating the checksum value of inode whose i_extra_size is less than 4, we couldn't calculate the checksum value in a proper way. This problem was found and reported by Nix, Thank you. Reported-by: Nix Signed-off-by: Daeho Jeong Signed-off-by: Youngjin Gil Signed-off-by: Darrick J. Wong Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inode.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 7dcc97eadb12..817a937de733 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -71,10 +71,9 @@ static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw, csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, csum_size); offset += csum_size; - csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset, - EXT4_INODE_SIZE(inode->i_sb) - - offset); } + csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset, + EXT4_INODE_SIZE(inode->i_sb) - offset); } return csum; From ccf0904c49b1bc2234dbde4978eaf1c384da11bf Mon Sep 17 00:00:00 2001 From: "Lee, Chun-Yi" Date: Thu, 3 Nov 2016 08:18:52 +0800 Subject: [PATCH 391/521] platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 98d610c3739ac354319a6590b915f4624d9151e6 upstream. The accelerometer event relies on the ACERWMID_EVENT_GUID notify. So, this patch changes the codes to setup accelerometer input device when detected ACERWMID_EVENT_GUID. It avoids that the accel input device created on every Acer machines. In addition, patch adds a clearly parsing logic of accelerometer hid to acer_wmi_get_handle_cb callback function. It is positive matching the "SENR" name with "BST0001" device to avoid non-supported hardware. Reported-by: Bjørn Mork Cc: Darren Hart Signed-off-by: Lee, Chun-Yi [andy: slightly massage commit message] Signed-off-by: Andy Shevchenko Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/platform/x86/acer-wmi.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/platform/x86/acer-wmi.c b/drivers/platform/x86/acer-wmi.c index 1062fa42ff26..b2cdc1a1ad4f 100644 --- a/drivers/platform/x86/acer-wmi.c +++ b/drivers/platform/x86/acer-wmi.c @@ -1816,11 +1816,24 @@ static int __init acer_wmi_enable_lm(void) return status; } +#define ACER_WMID_ACCEL_HID "BST0001" + static acpi_status __init acer_wmi_get_handle_cb(acpi_handle ah, u32 level, void *ctx, void **retval) { + struct acpi_device *dev; + + if (!strcmp(ctx, "SENR")) { + if (acpi_bus_get_device(ah, &dev)) + return AE_OK; + if (!strcmp(ACER_WMID_ACCEL_HID, acpi_device_hid(dev))) + return AE_OK; + } else + return AE_OK; + *(acpi_handle *)retval = ah; - return AE_OK; + + return AE_CTRL_TERMINATE; } static int __init acer_wmi_get_handle(const char *name, const char *prop, @@ -1847,7 +1860,7 @@ static int __init acer_wmi_accel_setup(void) { int err; - err = acer_wmi_get_handle("SENR", "BST0001", &gsensor_handle); + err = acer_wmi_get_handle("SENR", ACER_WMID_ACCEL_HID, &gsensor_handle); if (err) return err; @@ -2185,10 +2198,11 @@ static int __init acer_wmi_init(void) err = acer_wmi_input_setup(); if (err) return err; + err = acer_wmi_accel_setup(); + if (err) + return err; } - acer_wmi_accel_setup(); - err = platform_driver_register(&acer_platform_driver); if (err) { pr_err("Unable to register platform driver\n"); From ba02781392fa1b934e41785a5301ca21ad44708b Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Thu, 12 Jan 2017 17:07:43 +0100 Subject: [PATCH 392/521] rtc: tegra: Implement clock handling commit 5fa4086987506b2ab8c92f8f99f2295db9918856 upstream. Accessing the registers of the RTC block on Tegra requires the module clock to be enabled. This only works because the RTC module clock will be enabled by default during early boot. However, because the clock is unused, the CCF will disable it at late_init time. This causes the RTC to become unusable afterwards. This can easily be reproduced by trying to use the RTC: $ hwclock --rtc /dev/rtc1 This will hang the system. I ran into this by following up on a report by Martin Michlmayr that reboot wasn't working on Tegra210 systems. It turns out that the rtc-tegra driver's ->shutdown() implementation will hang the CPU, because of the disabled clock, before the system can be rebooted. What confused me for a while is that the same driver is used on prior Tegra generations where the hang can not be observed. However, as Peter De Schrijver pointed out, this is because on 32-bit Tegra chips the RTC clock is enabled by the tegra20_timer.c clocksource driver, which uses the RTC to provide a persistent clock. This code is never enabled on 64-bit Tegra because the persistent clock infrastructure does not exist on 64-bit ARM. The proper fix for this is to add proper clock handling to the RTC driver in order to ensure that the clock is enabled when the driver requires it. All device trees contain the clock already, therefore no additional changes are required. Reported-by: Martin Michlmayr Acked-By Peter De Schrijver Signed-off-by: Thierry Reding Signed-off-by: Alexandre Belloni [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-tegra.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/drivers/rtc/rtc-tegra.c b/drivers/rtc/rtc-tegra.c index 60232bd366ef..71216aa68905 100644 --- a/drivers/rtc/rtc-tegra.c +++ b/drivers/rtc/rtc-tegra.c @@ -18,6 +18,7 @@ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #include +#include #include #include #include @@ -59,6 +60,7 @@ struct tegra_rtc_info { struct platform_device *pdev; struct rtc_device *rtc_dev; void __iomem *rtc_base; /* NULL if not initialized. */ + struct clk *clk; int tegra_rtc_irq; /* alarm and periodic irq */ spinlock_t tegra_rtc_lock; }; @@ -332,6 +334,14 @@ static int __init tegra_rtc_probe(struct platform_device *pdev) if (info->tegra_rtc_irq <= 0) return -EBUSY; + info->clk = devm_clk_get(&pdev->dev, NULL); + if (IS_ERR(info->clk)) + return PTR_ERR(info->clk); + + ret = clk_prepare_enable(info->clk); + if (ret < 0) + return ret; + /* set context info. */ info->pdev = pdev; spin_lock_init(&info->tegra_rtc_lock); @@ -352,7 +362,7 @@ static int __init tegra_rtc_probe(struct platform_device *pdev) ret = PTR_ERR(info->rtc_dev); dev_err(&pdev->dev, "Unable to register device (err=%d).\n", ret); - return ret; + goto disable_clk; } ret = devm_request_irq(&pdev->dev, info->tegra_rtc_irq, @@ -362,11 +372,24 @@ static int __init tegra_rtc_probe(struct platform_device *pdev) dev_err(&pdev->dev, "Unable to request interrupt for device (err=%d).\n", ret); - return ret; + goto disable_clk; } dev_notice(&pdev->dev, "Tegra internal Real Time Clock\n"); + return 0; + +disable_clk: + clk_disable_unprepare(info->clk); + return ret; +} + +static int tegra_rtc_remove(struct platform_device *pdev) +{ + struct tegra_rtc_info *info = platform_get_drvdata(pdev); + + clk_disable_unprepare(info->clk); + return 0; } @@ -419,6 +442,7 @@ static void tegra_rtc_shutdown(struct platform_device *pdev) MODULE_ALIAS("platform:tegra_rtc"); static struct platform_driver tegra_rtc_driver = { + .remove = tegra_rtc_remove, .shutdown = tegra_rtc_shutdown, .driver = { .name = "tegra_rtc", From 6739cc12f3dbd7e4b3795f6e809d44ea6b490bb6 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 5 Apr 2017 09:39:08 -0700 Subject: [PATCH 393/521] mm: Tighten x86 /dev/mem with zeroing reads commit a4866aa812518ed1a37d8ea0c881dc946409de94 upstream. Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is disallowed. However, on x86, the first 1MB was always allowed for BIOS and similar things, regardless of it actually being System RAM. It was possible for heap to end up getting allocated in low 1MB RAM, and then read by things like x86info or dd, which would trip hardened usercopy: usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes) This changes the x86 exception for the low 1MB by reading back zeros for System RAM areas instead of blindly allowing them. More work is needed to extend this to mmap, but currently mmap doesn't go through usercopy, so hardened usercopy won't Oops the kernel. Reported-by: Tommi Rantala Tested-by: Tommi Rantala Signed-off-by: Kees Cook Cc: Brad Spengler Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init.c | 41 ++++++++++++++++------- drivers/char/mem.c | 82 +++++++++++++++++++++++++++++----------------- 2 files changed, 82 insertions(+), 41 deletions(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 493f54172b4a..3aebbd6c6f5f 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -628,21 +628,40 @@ void __init init_mem_mapping(void) * devmem_is_allowed() checks to see if /dev/mem access to a certain address * is valid. The argument is a physical page number. * - * - * On x86, access has to be given to the first megabyte of ram because that area - * contains BIOS code and data regions used by X and dosemu and similar apps. - * Access has to be given to non-kernel-ram areas as well, these contain the PCI - * mmio resources as well as potential bios/acpi data regions. + * On x86, access has to be given to the first megabyte of RAM because that + * area traditionally contains BIOS code and data regions used by X, dosemu, + * and similar apps. Since they map the entire memory range, the whole range + * must be allowed (for mapping), but any areas that would otherwise be + * disallowed are flagged as being "zero filled" instead of rejected. + * Access has to be given to non-kernel-ram areas as well, these contain the + * PCI mmio resources as well as potential bios/acpi data regions. */ int devmem_is_allowed(unsigned long pagenr) { - if (pagenr < 256) - return 1; - if (iomem_is_exclusive(pagenr << PAGE_SHIFT)) + if (page_is_ram(pagenr)) { + /* + * For disallowed memory regions in the low 1MB range, + * request that the page be shown as all zeros. + */ + if (pagenr < 256) + return 2; + return 0; - if (!page_is_ram(pagenr)) - return 1; - return 0; + } + + /* + * This must follow RAM test, since System RAM is considered a + * restricted resource under CONFIG_STRICT_IOMEM. + */ + if (iomem_is_exclusive(pagenr << PAGE_SHIFT)) { + /* Low 1MB bypasses iomem restrictions. */ + if (pagenr < 256) + return 1; + + return 0; + } + + return 1; } void free_init_pages(char *what, unsigned long begin, unsigned long end) diff --git a/drivers/char/mem.c b/drivers/char/mem.c index 6b1721f978c2..e901463d4972 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -59,6 +59,10 @@ static inline int valid_mmap_phys_addr_range(unsigned long pfn, size_t size) #endif #ifdef CONFIG_STRICT_DEVMEM +static inline int page_is_allowed(unsigned long pfn) +{ + return devmem_is_allowed(pfn); +} static inline int range_is_allowed(unsigned long pfn, unsigned long size) { u64 from = ((u64)pfn) << PAGE_SHIFT; @@ -78,6 +82,10 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size) return 1; } #else +static inline int page_is_allowed(unsigned long pfn) +{ + return 1; +} static inline int range_is_allowed(unsigned long pfn, unsigned long size) { return 1; @@ -125,23 +133,31 @@ static ssize_t read_mem(struct file *file, char __user *buf, while (count > 0) { unsigned long remaining; + int allowed; sz = size_inside_page(p, count); - if (!range_is_allowed(p >> PAGE_SHIFT, count)) + allowed = page_is_allowed(p >> PAGE_SHIFT); + if (!allowed) return -EPERM; + if (allowed == 2) { + /* Show zeros for restricted memory. */ + remaining = clear_user(buf, sz); + } else { + /* + * On ia64 if a page has been mapped somewhere as + * uncached, then it must also be accessed uncached + * by the kernel or data corruption may occur. + */ + ptr = xlate_dev_mem_ptr(p); + if (!ptr) + return -EFAULT; - /* - * On ia64 if a page has been mapped somewhere as uncached, then - * it must also be accessed uncached by the kernel or data - * corruption may occur. - */ - ptr = xlate_dev_mem_ptr(p); - if (!ptr) - return -EFAULT; + remaining = copy_to_user(buf, ptr, sz); + + unxlate_dev_mem_ptr(p, ptr); + } - remaining = copy_to_user(buf, ptr, sz); - unxlate_dev_mem_ptr(p, ptr); if (remaining) return -EFAULT; @@ -184,30 +200,36 @@ static ssize_t write_mem(struct file *file, const char __user *buf, #endif while (count > 0) { + int allowed; + sz = size_inside_page(p, count); - if (!range_is_allowed(p >> PAGE_SHIFT, sz)) + allowed = page_is_allowed(p >> PAGE_SHIFT); + if (!allowed) return -EPERM; - /* - * On ia64 if a page has been mapped somewhere as uncached, then - * it must also be accessed uncached by the kernel or data - * corruption may occur. - */ - ptr = xlate_dev_mem_ptr(p); - if (!ptr) { - if (written) - break; - return -EFAULT; - } + /* Skip actual writing when a page is marked as restricted. */ + if (allowed == 1) { + /* + * On ia64 if a page has been mapped somewhere as + * uncached, then it must also be accessed uncached + * by the kernel or data corruption may occur. + */ + ptr = xlate_dev_mem_ptr(p); + if (!ptr) { + if (written) + break; + return -EFAULT; + } - copied = copy_from_user(ptr, buf, sz); - unxlate_dev_mem_ptr(p, ptr); - if (copied) { - written += sz - copied; - if (written) - break; - return -EFAULT; + copied = copy_from_user(ptr, buf, sz); + unxlate_dev_mem_ptr(p, ptr); + if (copied) { + written += sz - copied; + if (written) + break; + return -EFAULT; + } } buf += sz; From 502157457f52654595d28a555327e84b3e35c268 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 24 Jan 2017 08:13:11 -0200 Subject: [PATCH 394/521] dvb-usb: don't use stack for firmware load commit 43fab9793c1f44e665b4f98035a14942edf03ddc upstream. As reported by Marc Duponcheel , firmware load on dvb-usb is using the stack, with is not allowed anymore on default Kernel configurations: [ 1025.958836] dvb-usb: found a 'WideView WT-220U PenType Receiver (based on ZL353)' in cold state, will try to load a firmware [ 1025.958853] dvb-usb: downloading firmware from file 'dvb-usb-wt220u-zl0353-01.fw' [ 1025.958855] dvb-usb: could not stop the USB controller CPU. [ 1025.958856] dvb-usb: error while transferring firmware (transferred size: -11, block size: 3) [ 1025.958856] dvb-usb: firmware download failed at 8 with -22 [ 1025.958867] usbcore: registered new interface driver dvb_usb_dtt200u [ 2.789902] dvb-usb: downloading firmware from file 'dvb-usb-wt220u-zl0353-01.fw' [ 2.789905] ------------[ cut here ]------------ [ 2.789911] WARNING: CPU: 3 PID: 2196 at drivers/usb/core/hcd.c:1584 usb_hcd_map_urb_for_dma+0x430/0x560 [usbcore] [ 2.789912] transfer buffer not dma capable [ 2.789912] Modules linked in: btusb dvb_usb_dtt200u(+) dvb_usb_af9035(+) btrtl btbcm dvb_usb dvb_usb_v2 btintel dvb_core bluetooth rc_core rfkill x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd drm_kms_helper syscopyarea sysfillrect pcspkr i2c_i801 sysimgblt fb_sys_fops drm i2c_smbus i2c_core r8169 lpc_ich mfd_core mii thermal fan rtc_cmos video button acpi_cpufreq processor snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd crc32c_intel ahci libahci libata xhci_pci ehci_pci xhci_hcd ehci_hcd usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod [ 2.789936] CPU: 3 PID: 2196 Comm: systemd-udevd Not tainted 4.9.0-gentoo #1 [ 2.789937] Hardware name: ASUS All Series/H81I-PLUS, BIOS 0401 07/23/2013 [ 2.789938] ffffc9000339b690 ffffffff812bd397 ffffc9000339b6e0 0000000000000000 [ 2.789939] ffffc9000339b6d0 ffffffff81055c86 000006300339b6a0 ffff880116c0c000 [ 2.789941] 0000000000000000 0000000000000000 0000000000000001 ffff880116c08000 [ 2.789942] Call Trace: [ 2.789945] [] dump_stack+0x4d/0x66 [ 2.789947] [] __warn+0xc6/0xe0 [ 2.789948] [] warn_slowpath_fmt+0x4a/0x50 [ 2.789952] [] usb_hcd_map_urb_for_dma+0x430/0x560 [usbcore] [ 2.789954] [] ? io_schedule_timeout+0xd8/0x110 [ 2.789956] [] usb_hcd_submit_urb+0x9c/0x980 [usbcore] [ 2.789958] [] ? copy_page_to_iter+0x14f/0x2b0 [ 2.789960] [] ? pagecache_get_page+0x28/0x240 [ 2.789962] [] ? touch_atime+0x20/0xa0 [ 2.789964] [] usb_submit_urb+0x2c4/0x520 [usbcore] [ 2.789967] [] usb_start_wait_urb+0x5a/0xe0 [usbcore] [ 2.789969] [] usb_control_msg+0xbc/0xf0 [usbcore] [ 2.789970] [] usb_cypress_writemem+0x3d/0x40 [dvb_usb] [ 2.789972] [] usb_cypress_load_firmware+0x4f/0x130 [dvb_usb] [ 2.789973] [] ? console_unlock+0x2fe/0x5d0 [ 2.789974] [] ? vprintk_emit+0x27c/0x410 [ 2.789975] [] ? vprintk_default+0x1a/0x20 [ 2.789976] [] ? printk+0x43/0x4b [ 2.789977] [] dvb_usb_download_firmware+0x60/0xd0 [dvb_usb] [ 2.789979] [] dvb_usb_device_init+0x3d8/0x610 [dvb_usb] [ 2.789981] [] dtt200u_usb_probe+0x92/0xd0 [dvb_usb_dtt200u] [ 2.789984] [] usb_probe_interface+0xfc/0x270 [usbcore] [ 2.789985] [] driver_probe_device+0x215/0x2d0 [ 2.789986] [] __driver_attach+0x96/0xa0 [ 2.789987] [] ? driver_probe_device+0x2d0/0x2d0 [ 2.789988] [] bus_for_each_dev+0x5b/0x90 [ 2.789989] [] driver_attach+0x19/0x20 [ 2.789990] [] bus_add_driver+0x11c/0x220 [ 2.789991] [] driver_register+0x5b/0xd0 [ 2.789994] [] usb_register_driver+0x7c/0x130 [usbcore] [ 2.789994] [] ? 0xffffffffa06a5000 [ 2.789996] [] dtt200u_usb_driver_init+0x1e/0x20 [dvb_usb_dtt200u] [ 2.789997] [] do_one_initcall+0x38/0x140 [ 2.789998] [] ? __vunmap+0x7c/0xc0 [ 2.789999] [] ? do_init_module+0x22/0x1d2 [ 2.790000] [] do_init_module+0x5a/0x1d2 [ 2.790002] [] load_module+0x1e11/0x2580 [ 2.790003] [] ? show_taint+0x30/0x30 [ 2.790004] [] ? kernel_read_file+0x100/0x190 [ 2.790005] [] SyS_finit_module+0xba/0xc0 [ 2.790007] [] entry_SYSCALL_64_fastpath+0x13/0x94 [ 2.790008] ---[ end trace c78a74e78baec6fc ]--- So, allocate the structure dynamically. Signed-off-by: Mauro Carvalho Chehab [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/media/usb/dvb-usb/dvb-usb-firmware.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c index 733a7ff7b207..0cd9b02739c6 100644 --- a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c +++ b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c @@ -35,29 +35,34 @@ static int usb_cypress_writemem(struct usb_device *udev,u16 addr,u8 *data, u8 le int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw, int type) { - struct hexline hx; + struct hexline *hx; u8 reset; int ret,pos=0; + hx = kmalloc(sizeof(*hx), GFP_KERNEL); + if (!hx) + return -ENOMEM; + /* stop the CPU */ reset = 1; if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,&reset,1)) != 1) err("could not stop the USB controller CPU."); - while ((ret = dvb_usb_get_hexline(fw,&hx,&pos)) > 0) { - deb_fw("writing to address 0x%04x (buffer: 0x%02x %02x)\n",hx.addr,hx.len,hx.chk); - ret = usb_cypress_writemem(udev,hx.addr,hx.data,hx.len); + while ((ret = dvb_usb_get_hexline(fw, hx, &pos)) > 0) { + deb_fw("writing to address 0x%04x (buffer: 0x%02x %02x)\n", hx->addr, hx->len, hx->chk); + ret = usb_cypress_writemem(udev, hx->addr, hx->data, hx->len); - if (ret != hx.len) { + if (ret != hx->len) { err("error while transferring firmware " "(transferred size: %d, block size: %d)", - ret,hx.len); + ret, hx->len); ret = -EINVAL; break; } } if (ret < 0) { err("firmware download failed at %d with %d",pos,ret); + kfree(hx); return ret; } @@ -71,6 +76,8 @@ int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw } else ret = -EIO; + kfree(hx); + return ret; } EXPORT_SYMBOL(usb_cypress_load_firmware); From 6be431f91632504f269b6e8ffcd552a5ca3fd84d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stefan=20Br=C3=BCns?= Date: Sun, 12 Feb 2017 13:02:13 -0200 Subject: [PATCH 395/521] dvb-usb-firmware: don't do DMA on stack MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 67b0503db9c29b04eadfeede6bebbfe5ddad94ef upstream. The buffer allocation for the firmware data was changed in commit 43fab9793c1f ("[media] dvb-usb: don't use stack for firmware load") but the same applies for the reset value. Fixes: 43fab9793c1f ("[media] dvb-usb: don't use stack for firmware load") Signed-off-by: Stefan Brüns Signed-off-by: Mauro Carvalho Chehab Cc: Ben Hutchings Cc: Brad Spengler Signed-off-by: Greg Kroah-Hartman --- drivers/media/usb/dvb-usb/dvb-usb-firmware.c | 22 +++++++++++--------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c index 0cd9b02739c6..caad3b5c01ad 100644 --- a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c +++ b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c @@ -36,16 +36,18 @@ static int usb_cypress_writemem(struct usb_device *udev,u16 addr,u8 *data, u8 le int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw, int type) { struct hexline *hx; - u8 reset; - int ret,pos=0; + u8 *buf; + int ret, pos = 0; + u16 cpu_cs_register = cypress[type].cpu_cs_register; - hx = kmalloc(sizeof(*hx), GFP_KERNEL); - if (!hx) + buf = kmalloc(sizeof(*hx), GFP_KERNEL); + if (!buf) return -ENOMEM; + hx = (struct hexline *)buf; /* stop the CPU */ - reset = 1; - if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,&reset,1)) != 1) + buf[0] = 1; + if (usb_cypress_writemem(udev, cpu_cs_register, buf, 1) != 1) err("could not stop the USB controller CPU."); while ((ret = dvb_usb_get_hexline(fw, hx, &pos)) > 0) { @@ -62,21 +64,21 @@ int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw } if (ret < 0) { err("firmware download failed at %d with %d",pos,ret); - kfree(hx); + kfree(buf); return ret; } if (ret == 0) { /* restart the CPU */ - reset = 0; - if (ret || usb_cypress_writemem(udev,cypress[type].cpu_cs_register,&reset,1) != 1) { + buf[0] = 0; + if (usb_cypress_writemem(udev, cpu_cs_register, buf, 1) != 1) { err("could not restart the USB controller CPU."); ret = -EINVAL; } } else ret = -EIO; - kfree(hx); + kfree(buf); return ret; } From eb5267657d85bfcbb60803dd88fa82c7dede6aab Mon Sep 17 00:00:00 2001 From: Omar Sandoval Date: Wed, 1 Feb 2017 00:02:27 -0800 Subject: [PATCH 396/521] virtio-console: avoid DMA from stack commit c4baad50297d84bde1a7ad45e50c73adae4a2192 upstream. put_chars() stuffs the buffer it gets into an sg, but that buffer may be on the stack. This breaks with CONFIG_VMAP_STACK=y (for me, it manifested as printks getting turned into NUL bytes). Signed-off-by: Omar Sandoval Signed-off-by: Michael S. Tsirkin Reviewed-by: Amit Shah Cc: Ben Hutchings Cc: Brad Spengler Signed-off-by: Greg Kroah-Hartman --- drivers/char/virtio_console.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c index 090183f812be..31e8ae916ba0 100644 --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -1130,6 +1130,8 @@ static int put_chars(u32 vtermno, const char *buf, int count) { struct port *port; struct scatterlist sg[1]; + void *data; + int ret; if (unlikely(early_put_chars)) return early_put_chars(vtermno, buf, count); @@ -1138,8 +1140,14 @@ static int put_chars(u32 vtermno, const char *buf, int count) if (!port) return -EPIPE; - sg_init_one(sg, buf, count); - return __send_to_port(port, sg, 1, count, (void *)buf, false); + data = kmemdup(buf, count, GFP_ATOMIC); + if (!data) + return -ENOMEM; + + sg_init_one(sg, data, count); + ret = __send_to_port(port, sg, 1, count, data, false); + kfree(data); + return ret; } /* From be570e556deec7466d74a579129671185501a456 Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Sat, 4 Feb 2017 16:56:03 +0000 Subject: [PATCH 397/521] pegasus: Use heap buffers for all register access MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 5593523f968bc86d42a035c6df47d5e0979b5ace upstream. Allocating USB buffers on the stack is not portable, and no longer works on x86_64 (with VMAP_STACK enabled as per default). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") References: https://bugs.debian.org/852556 Reported-by: Lisandro Damián Nicanor Pérez Meyer Tested-by: Lisandro Damián Nicanor Pérez Meyer Signed-off-by: Ben Hutchings Signed-off-by: David S. Miller Cc: Brad Spengler Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/pegasus.c | 29 +++++++++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c index f84080215915..17fac0121e56 100644 --- a/drivers/net/usb/pegasus.c +++ b/drivers/net/usb/pegasus.c @@ -126,40 +126,61 @@ static void async_ctrl_callback(struct urb *urb) static int get_registers(pegasus_t *pegasus, __u16 indx, __u16 size, void *data) { + u8 *buf; int ret; + buf = kmalloc(size, GFP_NOIO); + if (!buf) + return -ENOMEM; + ret = usb_control_msg(pegasus->usb, usb_rcvctrlpipe(pegasus->usb, 0), PEGASUS_REQ_GET_REGS, PEGASUS_REQT_READ, 0, - indx, data, size, 1000); + indx, buf, size, 1000); if (ret < 0) netif_dbg(pegasus, drv, pegasus->net, "%s returned %d\n", __func__, ret); + else if (ret <= size) + memcpy(data, buf, ret); + kfree(buf); return ret; } -static int set_registers(pegasus_t *pegasus, __u16 indx, __u16 size, void *data) +static int set_registers(pegasus_t *pegasus, __u16 indx, __u16 size, + const void *data) { + u8 *buf; int ret; + buf = kmemdup(data, size, GFP_NOIO); + if (!buf) + return -ENOMEM; + ret = usb_control_msg(pegasus->usb, usb_sndctrlpipe(pegasus->usb, 0), PEGASUS_REQ_SET_REGS, PEGASUS_REQT_WRITE, 0, - indx, data, size, 100); + indx, buf, size, 100); if (ret < 0) netif_dbg(pegasus, drv, pegasus->net, "%s returned %d\n", __func__, ret); + kfree(buf); return ret; } static int set_register(pegasus_t *pegasus, __u16 indx, __u8 data) { + u8 *buf; int ret; + buf = kmemdup(&data, 1, GFP_NOIO); + if (!buf) + return -ENOMEM; + ret = usb_control_msg(pegasus->usb, usb_sndctrlpipe(pegasus->usb, 0), PEGASUS_REQ_SET_REG, PEGASUS_REQT_WRITE, data, - indx, &data, 1, 1000); + indx, buf, 1, 1000); if (ret < 0) netif_dbg(pegasus, drv, pegasus->net, "%s returned %d\n", __func__, ret); + kfree(buf); return ret; } From a90604be51de4e63f916261a91edd4f67e8b0b2b Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Sat, 4 Feb 2017 16:56:32 +0000 Subject: [PATCH 398/521] rtl8150: Use heap buffers for all register access commit 7926aff5c57b577ab0f43364ff0c59d968f6a414 upstream. Allocating USB buffers on the stack is not portable, and no longer works on x86_64 (with VMAP_STACK enabled as per default). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ben Hutchings Signed-off-by: David S. Miller Cc: Brad Spengler Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/rtl8150.c | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/net/usb/rtl8150.c b/drivers/net/usb/rtl8150.c index d37b7dce2d40..39672984dde1 100644 --- a/drivers/net/usb/rtl8150.c +++ b/drivers/net/usb/rtl8150.c @@ -155,16 +155,36 @@ static const char driver_name [] = "rtl8150"; */ static int get_registers(rtl8150_t * dev, u16 indx, u16 size, void *data) { - return usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0), - RTL8150_REQ_GET_REGS, RTL8150_REQT_READ, - indx, 0, data, size, 500); + void *buf; + int ret; + + buf = kmalloc(size, GFP_NOIO); + if (!buf) + return -ENOMEM; + + ret = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0), + RTL8150_REQ_GET_REGS, RTL8150_REQT_READ, + indx, 0, buf, size, 500); + if (ret > 0 && ret <= size) + memcpy(data, buf, ret); + kfree(buf); + return ret; } -static int set_registers(rtl8150_t * dev, u16 indx, u16 size, void *data) +static int set_registers(rtl8150_t * dev, u16 indx, u16 size, const void *data) { - return usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0), - RTL8150_REQ_SET_REGS, RTL8150_REQT_WRITE, - indx, 0, data, size, 500); + void *buf; + int ret; + + buf = kmemdup(data, size, GFP_NOIO); + if (!buf) + return -ENOMEM; + + ret = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0), + RTL8150_REQ_SET_REGS, RTL8150_REQT_WRITE, + indx, 0, buf, size, 500); + kfree(buf); + return ret; } static void async_set_reg_cb(struct urb *urb) From 40531b26bade950cf9c815d8238be27b009aa197 Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Sat, 4 Feb 2017 16:56:56 +0000 Subject: [PATCH 399/521] catc: Combine failure cleanup code in catc_probe() commit d41149145f98fe26dcd0bfd1d6cc095e6e041418 upstream. Signed-off-by: Ben Hutchings Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/catc.c | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/drivers/net/usb/catc.c b/drivers/net/usb/catc.c index 4e2b26a88b15..298885f81aad 100644 --- a/drivers/net/usb/catc.c +++ b/drivers/net/usb/catc.c @@ -777,7 +777,7 @@ static int catc_probe(struct usb_interface *intf, const struct usb_device_id *id struct net_device *netdev; struct catc *catc; u8 broadcast[ETH_ALEN]; - int i, pktsz; + int i, pktsz, ret; if (usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 1)) { @@ -812,12 +812,8 @@ static int catc_probe(struct usb_interface *intf, const struct usb_device_id *id if ((!catc->ctrl_urb) || (!catc->tx_urb) || (!catc->rx_urb) || (!catc->irq_urb)) { dev_err(&intf->dev, "No free urbs available.\n"); - usb_free_urb(catc->ctrl_urb); - usb_free_urb(catc->tx_urb); - usb_free_urb(catc->rx_urb); - usb_free_urb(catc->irq_urb); - free_netdev(netdev); - return -ENOMEM; + ret = -ENOMEM; + goto fail_free; } /* The F5U011 has the same vendor/product as the netmate but a device version of 0x130 */ @@ -914,16 +910,21 @@ static int catc_probe(struct usb_interface *intf, const struct usb_device_id *id usb_set_intfdata(intf, catc); SET_NETDEV_DEV(netdev, &intf->dev); - if (register_netdev(netdev) != 0) { - usb_set_intfdata(intf, NULL); - usb_free_urb(catc->ctrl_urb); - usb_free_urb(catc->tx_urb); - usb_free_urb(catc->rx_urb); - usb_free_urb(catc->irq_urb); - free_netdev(netdev); - return -EIO; - } + ret = register_netdev(netdev); + if (ret) + goto fail_clear_intfdata; + return 0; + +fail_clear_intfdata: + usb_set_intfdata(intf, NULL); +fail_free: + usb_free_urb(catc->ctrl_urb); + usb_free_urb(catc->tx_urb); + usb_free_urb(catc->rx_urb); + usb_free_urb(catc->irq_urb); + free_netdev(netdev); + return ret; } static void catc_disconnect(struct usb_interface *intf) From 65596042c3af1c3578f5e478f512f595d7fa31d0 Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Sat, 4 Feb 2017 16:57:04 +0000 Subject: [PATCH 400/521] catc: Use heap buffer for memory size test commit 2d6a0e9de03ee658a9adc3bfb2f0ca55dff1e478 upstream. Allocating USB buffers on the stack is not portable, and no longer works on x86_64 (with VMAP_STACK enabled as per default). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ben Hutchings Signed-off-by: David S. Miller Cc: Brad Spengler Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/catc.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/drivers/net/usb/catc.c b/drivers/net/usb/catc.c index 298885f81aad..2aa1a1d29cb4 100644 --- a/drivers/net/usb/catc.c +++ b/drivers/net/usb/catc.c @@ -777,7 +777,7 @@ static int catc_probe(struct usb_interface *intf, const struct usb_device_id *id struct net_device *netdev; struct catc *catc; u8 broadcast[ETH_ALEN]; - int i, pktsz, ret; + int pktsz, ret; if (usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 1)) { @@ -841,15 +841,24 @@ static int catc_probe(struct usb_interface *intf, const struct usb_device_id *id catc->irq_buf, 2, catc_irq_done, catc, 1); if (!catc->is_f5u011) { + u32 *buf; + int i; + dev_dbg(dev, "Checking memory size\n"); - i = 0x12345678; - catc_write_mem(catc, 0x7a80, &i, 4); - i = 0x87654321; - catc_write_mem(catc, 0xfa80, &i, 4); - catc_read_mem(catc, 0x7a80, &i, 4); + buf = kmalloc(4, GFP_KERNEL); + if (!buf) { + ret = -ENOMEM; + goto fail_free; + } + + *buf = 0x12345678; + catc_write_mem(catc, 0x7a80, buf, 4); + *buf = 0x87654321; + catc_write_mem(catc, 0xfa80, buf, 4); + catc_read_mem(catc, 0x7a80, buf, 4); - switch (i) { + switch (*buf) { case 0x12345678: catc_set_reg(catc, TxBufCount, 8); catc_set_reg(catc, RxBufCount, 32); @@ -864,6 +873,8 @@ static int catc_probe(struct usb_interface *intf, const struct usb_device_id *id dev_dbg(dev, "32k Memory\n"); break; } + + kfree(buf); dev_dbg(dev, "Getting MAC from SEEROM.\n"); From 403a728d1a35111103669aa125dcecfbe04e6872 Mon Sep 17 00:00:00 2001 From: Thomas Falcon Date: Tue, 13 Dec 2016 18:15:09 -0600 Subject: [PATCH 401/521] ibmveth: calculate gso_segs for large packets commit 94acf164dc8f1184e8d0737be7125134c2701dbe upstream. Include calculations to compute the number of segments that comprise an aggregated large packet. Signed-off-by: Thomas Falcon Reviewed-by: Marcelo Ricardo Leitner Reviewed-by: Jonathan Maxwell Signed-off-by: David S. Miller Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/ibm/ibmveth.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index 855c43d8f7e0..f9e4988ea30e 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1179,7 +1179,9 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb, static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt) { + struct tcphdr *tcph; int offset = 0; + int hdr_len; /* only TCP packets will be aggregated */ if (skb->protocol == htons(ETH_P_IP)) { @@ -1206,14 +1208,20 @@ static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt) /* if mss is not set through Large Packet bit/mss in rx buffer, * expect that the mss will be written to the tcp header checksum. */ + tcph = (struct tcphdr *)(skb->data + offset); if (lrg_pkt) { skb_shinfo(skb)->gso_size = mss; } else if (offset) { - struct tcphdr *tcph = (struct tcphdr *)(skb->data + offset); - skb_shinfo(skb)->gso_size = ntohs(tcph->check); tcph->check = 0; } + + if (skb_shinfo(skb)->gso_size) { + hdr_len = offset + tcph->doff * 4; + skb_shinfo(skb)->gso_segs = + DIV_ROUND_UP(skb->len - hdr_len, + skb_shinfo(skb)->gso_size); + } } static int ibmveth_poll(struct napi_struct *napi, int budget) From 8dc821b9f67d9abf2d5baca3eb92a70d91c0dbe0 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 5 Dec 2016 15:10:11 +1100 Subject: [PATCH 402/521] SUNRPC: fix refcounting problems with auth_gss messages. commit 1cded9d2974fe4fe339fc0ccd6638b80d465ab2c upstream. There are two problems with refcounting of auth_gss messages. First, the reference on the pipe->pipe list (taken by a call to rpc_queue_upcall()) is not counted. It seems to be assumed that a message in pipe->pipe will always also be in pipe->in_downcall, where it is correctly reference counted. However there is no guaranty of this. I have a report of a NULL dereferences in rpc_pipe_read() which suggests a msg that has been freed is still on the pipe->pipe list. One way I imagine this might happen is: - message is queued for uid=U and auth->service=S1 - rpc.gssd reads this message and starts processing. This removes the message from pipe->pipe - message is queued for uid=U and auth->service=S2 - rpc.gssd replies to the first message. gss_pipe_downcall() calls __gss_find_upcall(pipe, U, NULL) and it finds the *second* message, as new messages are placed at the head of ->in_downcall, and the service type is not checked. - This second message is removed from ->in_downcall and freed by gss_release_msg() (even though it is still on pipe->pipe) - rpc.gssd tries to read another message, and dereferences a pointer to this message that has just been freed. I fix this by incrementing the reference count before calling rpc_queue_upcall(), and decrementing it if that fails, or normally in gss_pipe_destroy_msg(). It seems strange that the reply doesn't target the message more precisely, but I don't know all the details. In any case, I think the reference counting irregularity became a measureable bug when the extra arg was added to __gss_find_upcall(), hence the Fixes: line below. The second problem is that if rpc_queue_upcall() fails, the new message is not freed. gss_alloc_msg() set the ->count to 1, gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1, then the pointer is discarded so the memory never gets freed. Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service") Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250 Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- net/sunrpc/auth_gss/auth_gss.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 06095cc8815e..1f0687d8e3d7 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -541,9 +541,13 @@ gss_setup_upcall(struct gss_auth *gss_auth, struct rpc_cred *cred) return gss_new; gss_msg = gss_add_msg(gss_new); if (gss_msg == gss_new) { - int res = rpc_queue_upcall(gss_new->pipe, &gss_new->msg); + int res; + atomic_inc(&gss_msg->count); + res = rpc_queue_upcall(gss_new->pipe, &gss_new->msg); if (res) { gss_unhash_msg(gss_new); + atomic_dec(&gss_msg->count); + gss_release_msg(gss_new); gss_msg = ERR_PTR(res); } } else @@ -836,6 +840,7 @@ gss_pipe_destroy_msg(struct rpc_pipe_msg *msg) warn_gssd(); gss_release_msg(gss_msg); } + gss_release_msg(gss_msg); } static void gss_pipe_dentry_destroy(struct dentry *dir, From 990a142ee0d3b504a0a3c23a16e2cda41c5d45cf Mon Sep 17 00:00:00 2001 From: Richard Genoud Date: Tue, 6 Dec 2016 13:05:33 +0100 Subject: [PATCH 403/521] tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done commit b389f173aaa1204d6dc1f299082a162eb0491545 upstream. When using RS485 in half duplex, RX should be enabled when TX is finished, and stopped when TX starts. Before commit 0058f0871efe7b01c6 ("tty/serial: atmel: fix RS485 half duplex with DMA"), RX was not disabled in atmel_start_tx() if the DMA was used. So, collisions could happened. But disabling RX in atmel_start_tx() uncovered another bug: RX was enabled again in the wrong place (in atmel_tx_dma) instead of being enabled when TX is finished (in atmel_complete_tx_dma), so the transmission simply stopped. This bug was not triggered before commit 0058f0871efe7b01c6 ("tty/serial: atmel: fix RS485 half duplex with DMA") because RX was never disabled before. Moving atmel_start_rx() in atmel_complete_tx_dma() corrects the problem. Reported-by: Gil Weber Fixes: 0058f0871efe7b01c6 Tested-by: Gil Weber Signed-off-by: Richard Genoud Acked-by: Alexandre Belloni Signed-off-by: Alexandre Belloni Tested-by: Bryan Evenson Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/atmel_serial.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c index a15070a7fcd6..53e4d5056db7 100644 --- a/drivers/tty/serial/atmel_serial.c +++ b/drivers/tty/serial/atmel_serial.c @@ -810,6 +810,11 @@ static void atmel_complete_tx_dma(void *arg) */ if (!uart_circ_empty(xmit)) tasklet_schedule(&atmel_port->tasklet); + else if ((port->rs485.flags & SER_RS485_ENABLED) && + !(port->rs485.flags & SER_RS485_RX_DURING_TX)) { + /* DMA done, stop TX, start RX for RS485 */ + atmel_start_rx(port); + } spin_unlock_irqrestore(&port->lock, flags); } @@ -912,12 +917,6 @@ static void atmel_tx_dma(struct uart_port *port) desc->callback = atmel_complete_tx_dma; desc->callback_param = atmel_port; atmel_port->cookie_tx = dmaengine_submit(desc); - - } else { - if (port->rs485.flags & SER_RS485_ENABLED) { - /* DMA done, stop TX, start RX for RS485 */ - atmel_start_rx(port); - } } if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) From f00f18ebb3b23134012a020faad85f33cd5d2e8f Mon Sep 17 00:00:00 2001 From: Mantas M Date: Fri, 16 Dec 2016 10:30:59 +0200 Subject: [PATCH 404/521] net: ipv6: check route protocol when deleting routes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit c2ed1880fd61a998e3ce40254a99a2ad000f1a7d upstream. The protocol field is checked when deleting IPv4 routes, but ignored for IPv6, which causes problems with routing daemons accidentally deleting externally set routes (observed by multiple bird6 users). This can be verified using `ip -6 route del proto something`. Signed-off-by: Mantas Mikulėnas Signed-off-by: David S. Miller Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 36bf4c3fe4f5..9f0aa255e288 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2084,6 +2084,8 @@ static int ip6_route_del(struct fib6_config *cfg) continue; if (cfg->fc_metric && cfg->fc_metric != rt->rt6i_metric) continue; + if (cfg->fc_protocol && cfg->fc_protocol != rt->rt6i_protocol) + continue; dst_hold(&rt->dst); read_unlock_bh(&table->tb6_lock); From e2f5fb9207a6bd7101ad94e73264ac8bb9e3b87a Mon Sep 17 00:00:00 2001 From: Marcelo Ricardo Leitner Date: Thu, 23 Feb 2017 09:31:18 -0300 Subject: [PATCH 405/521] sctp: deny peeloff operation on asocs with threads sleeping on it commit dfcb9f4f99f1e9a49e43398a7bfbf56927544af1 upstream. commit 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf") attempted to avoid a BUG_ON call when the association being used for a sendmsg() is blocked waiting for more sndbuf and another thread did a peeloff operation on such asoc, moving it to another socket. As Ben Hutchings noticed, then in such case it would return without locking back the socket and would cause two unlocks in a row. Further analysis also revealed that it could allow a double free if the application managed to peeloff the asoc that is created during the sendmsg call, because then sctp_sendmsg() would try to free the asoc that was created only for that call. This patch takes another approach. It will deny the peeloff operation if there is a thread sleeping on the asoc, so this situation doesn't exist anymore. This avoids the issues described above and also honors the syscalls that are already being handled (it can be multiple sendmsg calls). Joint work with Xin Long. Fixes: 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf") Cc: Alexander Popov Cc: Ben Hutchings Signed-off-by: Marcelo Ricardo Leitner Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sctp/socket.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 138f2d667212..5758818435f3 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4422,6 +4422,12 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp) if (!asoc) return -EINVAL; + /* If there is a thread waiting on more sndbuf space for + * sending on this asoc, it cannot be peeled. + */ + if (waitqueue_active(&asoc->wait)) + return -EBUSY; + /* An association cannot be branched off from an already peeled-off * socket, nor is this supported for tcp style sockets. */ @@ -6960,8 +6966,6 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, */ release_sock(sk); current_timeo = schedule_timeout(current_timeo); - if (sk != asoc->base.sk) - goto do_error; lock_sock(sk); *timeo_p = current_timeo; From d005579766761216526caa8345d1a1993eff8e24 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 19 Apr 2017 15:14:54 +0200 Subject: [PATCH 406/521] MIPS: fix Select HAVE_IRQ_EXIT_ON_IRQ_STACK patch. Commit f017e58da4aba293e4a6ab62ca5d4801f79cc929 which was commit 3cc3434fd6307d06b53b98ce83e76bf9807689b9 upstream, was misapplied to the 4.4 stable kernel. This patch fixes this and moves the chunk to the proper Kconfig area. Reported-by: "Maciej W. Rozycki" Cc: Matt Redfearn Cc: Jason A. Donenfeld Cc: Thomas Gleixner Cc: Ralf Baechle Cc: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/mips/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index d5cfa937d622..8b0424abc84c 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -1413,7 +1413,7 @@ config CPU_MIPS32_R6 select CPU_SUPPORTS_MSA select GENERIC_CSUM select HAVE_KVM - select MIPS_O32_FP64_SUPPORT if 32BIT + select MIPS_O32_FP64_SUPPORT help Choose this option to build a kernel for release 6 or later of the MIPS32 architecture. New MIPS processors, starting with the Warrior @@ -1464,7 +1464,7 @@ config CPU_MIPS64_R6 select CPU_SUPPORTS_HIGHMEM select CPU_SUPPORTS_MSA select GENERIC_CSUM - select MIPS_O32_FP64_SUPPORT if MIPS32_O32 + select MIPS_O32_FP64_SUPPORT if 32BIT || MIPS32_O32 help Choose this option to build a kernel for release 6 or later of the MIPS64 architecture. New MIPS processors, starting with the Warrior From 81af21fe95ba45261c7894b471e5d7698c4db8f1 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 21 Apr 2017 09:30:24 +0200 Subject: [PATCH 407/521] Linux 4.4.63 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 0309acc34472..ec52973043f6 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 62 +SUBLEVEL = 63 EXTRAVERSION = NAME = Blurry Fish Butt From b5737b92560efcb956d2def4dcd3f4b6d4118e58 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 18 Apr 2017 15:31:07 +0100 Subject: [PATCH 408/521] KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings commit ee8f844e3c5a73b999edf733df1c529d6503ec2f upstream. This fixes CVE-2016-9604. Keyrings whose name begin with a '.' are special internal keyrings and so userspace isn't allowed to create keyrings by this name to prevent shadowing. However, the patch that added the guard didn't fix KEYCTL_JOIN_SESSION_KEYRING. Not only can that create dot-named keyrings, it can also subscribe to them as a session keyring if they grant SEARCH permission to the user. This, for example, allows a root process to set .builtin_trusted_keys as its session keyring, at which point it has full access because now the possessor permissions are added. This permits root to add extra public keys, thereby bypassing module verification. This also affects kexec and IMA. This can be tested by (as root): keyctl session .builtin_trusted_keys keyctl add user a a @s keyctl list @s which on my test box gives me: 2 keys in keyring: 180010936: ---lswrv 0 0 asymmetric: Build time autogenerated kernel key: ae3d4a31b82daa8e1a75b49dc2bba949fd992a05 801382539: --alswrv 0 0 user: a Fix this by rejecting names beginning with a '.' in the keyctl. Signed-off-by: David Howells Acked-by: Mimi Zohar cc: linux-ima-devel@lists.sourceforge.net Signed-off-by: Greg Kroah-Hartman --- security/keys/keyctl.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 1c3872aeed14..4ffb51ff0a61 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -271,7 +271,8 @@ long keyctl_get_keyring_ID(key_serial_t id, int create) * Create and join an anonymous session keyring or join a named session * keyring, creating it if necessary. A named session keyring must have Search * permission for it to be joined. Session keyrings without this permit will - * be skipped over. + * be skipped over. It is not permitted for userspace to create or join + * keyrings whose name begin with a dot. * * If successful, the ID of the joined session keyring will be returned. */ @@ -288,12 +289,16 @@ long keyctl_join_session_keyring(const char __user *_name) ret = PTR_ERR(name); goto error; } + + ret = -EPERM; + if (name[0] == '.') + goto error_name; } /* join the session */ ret = join_session_keyring(name); +error_name: kfree(name); - error: return ret; } From eb78d987757967749d0b2e82fce0314697937ee5 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 18 Apr 2017 15:31:08 +0100 Subject: [PATCH 409/521] KEYS: Change the name of the dead type to ".dead" to prevent user access commit c1644fe041ebaf6519f6809146a77c3ead9193af upstream. This fixes CVE-2017-6951. Userspace should not be able to do things with the "dead" key type as it doesn't have some of the helper functions set upon it that the kernel needs. Attempting to use it may cause the kernel to crash. Fix this by changing the name of the type to ".dead" so that it's rejected up front on userspace syscalls by key_get_type_from_user(). Though this doesn't seem to affect recent kernels, it does affect older ones, certainly those prior to: commit c06cfb08b88dfbe13be44a69ae2fdc3a7c902d81 Author: David Howells Date: Tue Sep 16 17:36:06 2014 +0100 KEYS: Remove key_type::match in favour of overriding default by match_preparse which went in before 3.18-rc1. Signed-off-by: David Howells Signed-off-by: Greg Kroah-Hartman --- security/keys/gc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/keys/gc.c b/security/keys/gc.c index addf060399e0..9cb4fe4478a1 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -46,7 +46,7 @@ static unsigned long key_gc_flags; * immediately unlinked. */ struct key_type key_type_dead = { - .name = "dead", + .name = ".dead", }; /* From c9460fbceb2f3efa1d20050cdbffa51ec025745a Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 18 Apr 2017 15:31:09 +0100 Subject: [PATCH 410/521] KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings commit c9f838d104fed6f2f61d68164712e3204bf5271b upstream. This fixes CVE-2017-7472. Running the following program as an unprivileged user exhausts kernel memory by leaking thread keyrings: #include int main() { for (;;) keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING); } Fix it by only creating a new thread keyring if there wasn't one before. To make things more consistent, make install_thread_keyring_to_cred() and install_process_keyring_to_cred() both return 0 if the corresponding keyring is already present. Fixes: d84f4f992cbd ("CRED: Inaugurate COW credentials") Signed-off-by: Eric Biggers Signed-off-by: David Howells Signed-off-by: Greg Kroah-Hartman --- security/keys/keyctl.c | 11 ++++----- security/keys/process_keys.c | 44 ++++++++++++++++++++++-------------- 2 files changed, 31 insertions(+), 24 deletions(-) diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 4ffb51ff0a61..442e350c209d 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -1228,8 +1228,8 @@ long keyctl_reject_key(key_serial_t id, unsigned timeout, unsigned error, * Read or set the default keyring in which request_key() will cache keys and * return the old setting. * - * If a process keyring is specified then this will be created if it doesn't - * yet exist. The old setting will be returned if successful. + * If a thread or process keyring is specified then it will be created if it + * doesn't yet exist. The old setting will be returned if successful. */ long keyctl_set_reqkey_keyring(int reqkey_defl) { @@ -1254,11 +1254,8 @@ long keyctl_set_reqkey_keyring(int reqkey_defl) case KEY_REQKEY_DEFL_PROCESS_KEYRING: ret = install_process_keyring_to_cred(new); - if (ret < 0) { - if (ret != -EEXIST) - goto error; - ret = 0; - } + if (ret < 0) + goto error; goto set; case KEY_REQKEY_DEFL_DEFAULT: diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c index e6d50172872f..4ed909142956 100644 --- a/security/keys/process_keys.c +++ b/security/keys/process_keys.c @@ -125,13 +125,18 @@ int install_user_keyrings(void) } /* - * Install a fresh thread keyring directly to new credentials. This keyring is - * allowed to overrun the quota. + * Install a thread keyring to the given credentials struct if it didn't have + * one already. This is allowed to overrun the quota. + * + * Return: 0 if a thread keyring is now present; -errno on failure. */ int install_thread_keyring_to_cred(struct cred *new) { struct key *keyring; + if (new->thread_keyring) + return 0; + keyring = keyring_alloc("_tid", new->uid, new->gid, new, KEY_POS_ALL | KEY_USR_VIEW, KEY_ALLOC_QUOTA_OVERRUN, NULL); @@ -143,7 +148,9 @@ int install_thread_keyring_to_cred(struct cred *new) } /* - * Install a fresh thread keyring, discarding the old one. + * Install a thread keyring to the current task if it didn't have one already. + * + * Return: 0 if a thread keyring is now present; -errno on failure. */ static int install_thread_keyring(void) { @@ -154,8 +161,6 @@ static int install_thread_keyring(void) if (!new) return -ENOMEM; - BUG_ON(new->thread_keyring); - ret = install_thread_keyring_to_cred(new); if (ret < 0) { abort_creds(new); @@ -166,17 +171,17 @@ static int install_thread_keyring(void) } /* - * Install a process keyring directly to a credentials struct. + * Install a process keyring to the given credentials struct if it didn't have + * one already. This is allowed to overrun the quota. * - * Returns -EEXIST if there was already a process keyring, 0 if one installed, - * and other value on any other error + * Return: 0 if a process keyring is now present; -errno on failure. */ int install_process_keyring_to_cred(struct cred *new) { struct key *keyring; if (new->process_keyring) - return -EEXIST; + return 0; keyring = keyring_alloc("_pid", new->uid, new->gid, new, KEY_POS_ALL | KEY_USR_VIEW, @@ -189,11 +194,9 @@ int install_process_keyring_to_cred(struct cred *new) } /* - * Make sure a process keyring is installed for the current process. The - * existing process keyring is not replaced. + * Install a process keyring to the current task if it didn't have one already. * - * Returns 0 if there is a process keyring by the end of this function, some - * error otherwise. + * Return: 0 if a process keyring is now present; -errno on failure. */ static int install_process_keyring(void) { @@ -207,14 +210,18 @@ static int install_process_keyring(void) ret = install_process_keyring_to_cred(new); if (ret < 0) { abort_creds(new); - return ret != -EEXIST ? ret : 0; + return ret; } return commit_creds(new); } /* - * Install a session keyring directly to a credentials struct. + * Install the given keyring as the session keyring of the given credentials + * struct, replacing the existing one if any. If the given keyring is NULL, + * then install a new anonymous session keyring. + * + * Return: 0 on success; -errno on failure. */ int install_session_keyring_to_cred(struct cred *cred, struct key *keyring) { @@ -249,8 +256,11 @@ int install_session_keyring_to_cred(struct cred *cred, struct key *keyring) } /* - * Install a session keyring, discarding the old one. If a keyring is not - * supplied, an empty one is invented. + * Install the given keyring as the session keyring of the current task, + * replacing the existing one if any. If the given keyring is NULL, then + * install a new anonymous session keyring. + * + * Return: 0 on success; -errno on failure. */ static int install_session_keyring(struct key *keyring) { From 1dfb1c7bd63f7ea8975f32ddd04a9c4406f8d64d Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 19 Apr 2017 12:07:08 -0400 Subject: [PATCH 411/521] tracing: Allocate the snapshot buffer before enabling probe commit df62db5be2e5f070ecd1a5ece5945b590ee112e0 upstream. Currently the snapshot trigger enables the probe and then allocates the snapshot. If the probe triggers before the allocation, it could cause the snapshot to fail and turn tracing off. It's best to allocate the snapshot buffer first, and then enable the trigger. If something goes wrong in the enabling of the trigger, the snapshot buffer is still allocated, but it can also be freed by the user by writting zero into the snapshot buffer file. Also add a check of the return status of alloc_snapshot(). Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 059233abcfcf..4c21c0b7dc91 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6060,11 +6060,13 @@ ftrace_trace_snapshot_callback(struct ftrace_hash *hash, return ret; out_reg: + ret = alloc_snapshot(&global_trace); + if (ret < 0) + goto out; + ret = register_ftrace_function_probe(glob, ops, count); - if (ret >= 0) - alloc_snapshot(&global_trace); - + out: return ret < 0 ? ret : 0; } From a2a67e53f92f9a3b5d94671b17b043eb56ec79ab Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 19 Apr 2017 14:29:46 -0400 Subject: [PATCH 412/521] ring-buffer: Have ring_buffer_iter_empty() return true when empty commit 78f7a45dac2a2d2002f98a3a95f7979867868d73 upstream. I noticed that reading the snapshot file when it is empty no longer gives a status. It suppose to show the status of the snapshot buffer as well as how to allocate and use it. For example: ># cat snapshot # tracer: nop # # # * Snapshot is allocated * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') But instead it just showed an empty buffer: ># cat snapshot # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | What happened was that it was using the ring_buffer_iter_empty() function to see if it was empty, and if it was, it showed the status. But that function was returning false when it was empty. The reason was that the iter header page was on the reader page, and the reader page was empty, but so was the buffer itself. The check only tested to see if the iter was on the commit page, but the commit page was no longer pointing to the reader page, but as all pages were empty, the buffer is also. Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/ring_buffer.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 7d7f99b0db47..1275175b0946 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3440,11 +3440,23 @@ EXPORT_SYMBOL_GPL(ring_buffer_iter_reset); int ring_buffer_iter_empty(struct ring_buffer_iter *iter) { struct ring_buffer_per_cpu *cpu_buffer; + struct buffer_page *reader; + struct buffer_page *head_page; + struct buffer_page *commit_page; + unsigned commit; cpu_buffer = iter->cpu_buffer; - return iter->head_page == cpu_buffer->commit_page && - iter->head == rb_commit_index(cpu_buffer); + /* Remember, trace recording is off when iterator is in use */ + reader = cpu_buffer->reader_page; + head_page = cpu_buffer->head_page; + commit_page = cpu_buffer->commit_page; + commit = rb_page_commit(commit_page); + + return ((iter->head_page == commit_page && iter->head == commit) || + (iter->head_page == reader && commit_page == head_page && + head_page->read == commit && + iter->head == rb_page_commit(cpu_buffer->reader_page))); } EXPORT_SYMBOL_GPL(ring_buffer_iter_empty); From f8fe51c86583ccb38263277ee471f77053a9482a Mon Sep 17 00:00:00 2001 From: Sachin Prabhu Date: Sun, 16 Apr 2017 20:37:24 +0100 Subject: [PATCH 413/521] cifs: Do not send echoes before Negotiate is complete commit 62a6cfddcc0a5313e7da3e8311ba16226fe0ac10 upstream. commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect long after socket reconnect") added support for Negotiate requests to be initiated by echo calls. To avoid delays in calling echo after a reconnect, I added the patch introduced by the commit b8c600120fc8 ("Call echo service immediately after socket reconnect"). This has however caused a regression with cifs shares which do not have support for echo calls to trigger Negotiate requests. On connections which need to call Negotiation, the echo calls trigger an error which triggers a reconnect which in turn triggers another echo call. This results in a loop which is only broken when an operation is performed on the cifs share. For an idle share, it can DOS a server. The patch uses the smb_operation can_echo() for cifs so that it is called only if connection has been already been setup. kernel bz: 194531 Signed-off-by: Sachin Prabhu Tested-by: Jonathan Liu Acked-by: Pavel Shilovsky Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/smb1ops.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/cifs/smb1ops.c b/fs/cifs/smb1ops.c index fc537c29044e..87b87e091e8e 100644 --- a/fs/cifs/smb1ops.c +++ b/fs/cifs/smb1ops.c @@ -1015,6 +1015,15 @@ cifs_dir_needs_close(struct cifsFileInfo *cfile) return !cfile->srch_inf.endOfSearch && !cfile->invalidHandle; } +static bool +cifs_can_echo(struct TCP_Server_Info *server) +{ + if (server->tcpStatus == CifsGood) + return true; + + return false; +} + struct smb_version_operations smb1_operations = { .send_cancel = send_nt_cancel, .compare_fids = cifs_compare_fids, @@ -1049,6 +1058,7 @@ struct smb_version_operations smb1_operations = { .get_dfs_refer = CIFSGetDFSRefer, .qfs_tcon = cifs_qfs_tcon, .is_path_accessible = cifs_is_path_accessible, + .can_echo = cifs_can_echo, .query_path_info = cifs_query_path_info, .query_file_info = cifs_query_file_info, .get_srv_inum = cifs_get_srv_inum, From 859d615b5be1a6123b68f08233e548ee7f4e4298 Mon Sep 17 00:00:00 2001 From: Germano Percossi Date: Fri, 7 Apr 2017 12:29:37 +0100 Subject: [PATCH 414/521] CIFS: remove bad_network_name flag commit a0918f1ce6a43ac980b42b300ec443c154970979 upstream. STATUS_BAD_NETWORK_NAME can be received during node failover, causing the flag to be set and making the reconnect thread always unsuccessful, thereafter. Once the only place where it is set is removed, the remaining bits are rendered moot. Removing it does not prevent "mount" from failing when a non existent share is passed. What happens when the share really ceases to exist while the share is mounted is undefined now as much as it was before. Signed-off-by: Germano Percossi Reviewed-by: Pavel Shilovsky Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/cifsglob.h | 1 - fs/cifs/smb2pdu.c | 5 ----- 2 files changed, 6 deletions(-) diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index b76883606e4b..94906aaa9b7c 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -906,7 +906,6 @@ struct cifs_tcon { bool use_persistent:1; /* use persistent instead of durable handles */ #ifdef CONFIG_CIFS_SMB2 bool print:1; /* set if connection to printer share */ - bool bad_network_name:1; /* set if ret status STATUS_BAD_NETWORK_NAME */ __le32 capabilities; __u32 share_flags; __u32 maximal_access; diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 6cb5c4b30e78..6cb2603f8a5c 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -932,9 +932,6 @@ SMB2_tcon(const unsigned int xid, struct cifs_ses *ses, const char *tree, else return -EIO; - if (tcon && tcon->bad_network_name) - return -ENOENT; - if ((tcon && tcon->seal) && ((ses->server->capabilities & SMB2_GLOBAL_CAP_ENCRYPTION) == 0)) { cifs_dbg(VFS, "encryption requested but no server support"); @@ -1036,8 +1033,6 @@ SMB2_tcon(const unsigned int xid, struct cifs_ses *ses, const char *tree, tcon_error_exit: if (rsp->hdr.Status == STATUS_BAD_NETWORK_NAME) { cifs_dbg(VFS, "BAD_NETWORK_NAME: %s\n", tree); - if (tcon) - tcon->bad_network_name = true; } goto tcon_exit; } From 702db976b8574745b20492fc2a89abfa4ec2bef1 Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Sun, 9 Apr 2017 22:09:38 +0200 Subject: [PATCH 415/521] s390/mm: fix CMMA vs KSM vs others commit a8f60d1fadf7b8b54449fcc9d6b15248917478ba upstream. On heavy paging with KSM I see guest data corruption. Turns out that KSM will add pages to its tree, where the mapping return true for pte_unused (or might become as such later). KSM will unmap such pages and reinstantiate with different attributes (e.g. write protected or special, e.g. in replace_page or write_protect_page)). This uncovered a bug in our pagetable handling: We must remove the unused flag as soon as an entry becomes present again. Signed-of-by: Christian Borntraeger Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/pgtable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 024f85f947ae..e2c0e4eab037 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -829,6 +829,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, { pgste_t pgste; + if (pte_present(entry)) + pte_val(entry) &= ~_PAGE_UNUSED; if (mm_has_pgste(mm)) { pgste = pgste_get_lock(ptep); pgste_val(pgste) &= ~_PGSTE_GPS_ZERO; From 5ab982a01201749f49b5a6a23b45b20a03490ce5 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Fri, 3 Jun 2016 17:09:24 -0700 Subject: [PATCH 416/521] Drivers: hv: don't leak memory in vmbus_establish_gpadl() commit 7cc80c98070ccc7940fc28811c92cca0a681015d upstream. In some cases create_gpadl_header() allocates submessages but we never free them. [sumits] Note for stable: Upstream commit 4d63763296ab7865a98bc29cc7d77145815ef89f: (Drivers: hv: get rid of redundant messagecount in create_gpadl_header()) changes the list usage to initialize list header in all cases; that patch isn't added to stable, so the current patch is modified a little bit from the upstream commit to check if the list is valid or not. Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/hv/channel.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index 1ef37c727572..49d244929c4d 100644 --- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -375,7 +375,7 @@ int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer, struct vmbus_channel_gpadl_header *gpadlmsg; struct vmbus_channel_gpadl_body *gpadl_body; struct vmbus_channel_msginfo *msginfo = NULL; - struct vmbus_channel_msginfo *submsginfo; + struct vmbus_channel_msginfo *submsginfo, *tmp; u32 msgcount; struct list_head *curr; u32 next_gpadl_handle; @@ -437,6 +437,13 @@ int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer, list_del(&msginfo->msglistentry); spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags); + if (msgcount > 1) { + list_for_each_entry_safe(submsginfo, tmp, &msginfo->submsglist, + msglistentry) { + kfree(submsginfo); + } + } + kfree(msginfo); return ret; } From 567dd48c4e71a8d6d3014adb153993ef8608722c Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Thu, 9 Jun 2016 17:08:56 -0700 Subject: [PATCH 417/521] Drivers: hv: get rid of timeout in vmbus_open() commit 396e287fa2ff46e83ae016cdcb300c3faa3b02f6 upstream. vmbus_teardown_gpadl() can result in infinite wait when it is called on 5 second timeout in vmbus_open(). The issue is caused by the fact that gpadl teardown operation won't ever succeed for an opened channel and the timeout isn't always enough. As a guest, we can always trust the host to respond to our request (and there is nothing we can do if it doesn't). Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/hv/channel.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index 49d244929c4d..d037454fe7b8 100644 --- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -73,7 +73,6 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 send_ringbuffer_size, void *in, *out; unsigned long flags; int ret, err = 0; - unsigned long t; struct page *page; spin_lock_irqsave(&newchannel->lock, flags); @@ -183,11 +182,7 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 send_ringbuffer_size, goto error1; } - t = wait_for_completion_timeout(&open_info->waitevent, 5*HZ); - if (t == 0) { - err = -ETIMEDOUT; - goto error1; - } + wait_for_completion(&open_info->waitevent); spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags); list_del(&open_info->msglistentry); From f803416632b5c31e647cf18861e4c379173a02e2 Mon Sep 17 00:00:00 2001 From: "K. Y. Srinivasan" Date: Fri, 1 Jul 2016 16:26:36 -0700 Subject: [PATCH 418/521] Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg() commit 8de0d7e951826d7592e0ba1da655b175c4aa0923 upstream. The current delay between retries is unnecessarily high and is negatively affecting the time it takes to boot the system. Signed-off-by: K. Y. Srinivasan Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/hv/connection.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c index 4fc2e8836e60..2bbc53025549 100644 --- a/drivers/hv/connection.c +++ b/drivers/hv/connection.c @@ -429,7 +429,7 @@ int vmbus_post_msg(void *buffer, size_t buflen) union hv_connection_id conn_id; int ret = 0; int retries = 0; - u32 msec = 1; + u32 usec = 1; conn_id.asu32 = 0; conn_id.u.id = VMBUS_MESSAGE_CONNECTION_ID; @@ -462,9 +462,9 @@ int vmbus_post_msg(void *buffer, size_t buflen) } retries++; - msleep(msec); - if (msec < 2048) - msec *= 2; + udelay(usec); + if (usec < 2048) + usec *= 2; } return ret; } From 8d5ed79fb2d766e57b5220a3e6054d99f1c985d0 Mon Sep 17 00:00:00 2001 From: Jorgen Hansen Date: Tue, 5 Apr 2016 01:59:32 -0700 Subject: [PATCH 419/521] VSOCK: Detach QP check should filter out non matching QPs. commit 8ab18d71de8b07d2c4d6f984b718418c09ea45c5 upstream. The check in vmci_transport_peer_detach_cb should only allow a detach when the qp handle of the transport matches the one in the detach message. Testing: Before this change, a detach from a peer on a different socket would cause an active stream socket to register a detach. Reviewed-by: George Zhang Signed-off-by: Jorgen Hansen Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/vmw_vsock/vmci_transport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index 0a369bb440e7..662bdd20a748 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -842,7 +842,7 @@ static void vmci_transport_peer_detach_cb(u32 sub_id, * qp_handle. */ if (vmci_handle_is_invalid(e_payload->handle) || - vmci_handle_is_equal(trans->qp_handle, e_payload->handle)) + !vmci_handle_is_equal(trans->qp_handle, e_payload->handle)) return; /* We don't ask for delayed CBs when we subscribe to this event (we @@ -2154,7 +2154,7 @@ module_exit(vmci_transport_exit); MODULE_AUTHOR("VMware, Inc."); MODULE_DESCRIPTION("VMCI transport for Virtual Sockets"); -MODULE_VERSION("1.0.2.0-k"); +MODULE_VERSION("1.0.3.0-k"); MODULE_LICENSE("GPL v2"); MODULE_ALIAS("vmware_vsock"); MODULE_ALIAS_NETPROTO(PF_VSOCK); From cdede60d6a308a311b0999b826fa4cc5261632c8 Mon Sep 17 00:00:00 2001 From: Thorsten Leemhuis Date: Tue, 18 Apr 2017 11:14:28 -0700 Subject: [PATCH 420/521] Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled commit 704de489e0e3640a2ee2d0daf173e9f7375582ba upstream. Temporary got a Lifebook E547 into my hands and noticed the touchpad only works after running: echo "1" > /sys/devices/platform/i8042/serio2/crc_enabled Add it to the list of machines that need this workaround. Signed-off-by: Thorsten Leemhuis Reviewed-by: Ulrik De Bie Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elantech.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c index 43482ae1e049..1a2b2620421e 100644 --- a/drivers/input/mouse/elantech.c +++ b/drivers/input/mouse/elantech.c @@ -1122,6 +1122,7 @@ static int elantech_get_resolution_v4(struct psmouse *psmouse, * Asus UX32VD 0x361f02 00, 15, 0e clickpad * Avatar AVIU-145A2 0x361f00 ? clickpad * Fujitsu LIFEBOOK E544 0x470f00 d0, 12, 09 2 hw buttons + * Fujitsu LIFEBOOK E547 0x470f00 50, 12, 09 2 hw buttons * Fujitsu LIFEBOOK E554 0x570f01 40, 14, 0c 2 hw buttons * Fujitsu T725 0x470f01 05, 12, 09 2 hw buttons * Fujitsu H730 0x570f00 c0, 14, 0c 3 hw buttons (**) @@ -1527,6 +1528,13 @@ static const struct dmi_system_id elantech_dmi_force_crc_enabled[] = { DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK E544"), }, }, + { + /* Fujitsu LIFEBOOK E547 does not work with crc_enabled == 0 */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK E547"), + }, + }, { /* Fujitsu LIFEBOOK E554 does not work with crc_enabled == 0 */ .matches = { From 6986d0d29f3cda9f558461202d86464403454574 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 19 Apr 2017 19:47:04 +0200 Subject: [PATCH 421/521] ACPI / power: Avoid maybe-uninitialized warning commit fe8c470ab87d90e4b5115902dd94eced7e3305c3 upstream. gcc -O2 cannot always prove that the loop in acpi_power_get_inferred_state() is enterered at least once, so it assumes that cur_state might not get initialized: drivers/acpi/power.c: In function 'acpi_power_get_inferred_state': drivers/acpi/power.c:222:9: error: 'cur_state' may be used uninitialized in this function [-Werror=maybe-uninitialized] This sets the variable to zero at the start of the loop, to ensure that there is well-defined behavior even for an empty list. This gets rid of the warning. The warning first showed up when the -Os flag got removed in a bug fix patch in linux-4.11-rc5. I would suggest merging this addon patch on top of that bug fix to avoid introducing a new warning in the stable kernels. Fixes: 61b79e16c68d (ACPI: Fix incompatibility with mcount-based function graph tracing) Signed-off-by: Arnd Bergmann Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/power.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/acpi/power.c b/drivers/acpi/power.c index fcd4ce6f78d5..1c2b846c5776 100644 --- a/drivers/acpi/power.c +++ b/drivers/acpi/power.c @@ -200,6 +200,7 @@ static int acpi_power_get_list_state(struct list_head *list, int *state) return -EINVAL; /* The state of the list is 'on' IFF all resources are 'on'. */ + cur_state = 0; list_for_each_entry(entry, list, node) { struct acpi_power_resource *resource = entry->resource; acpi_handle handle = resource->device.handle; From b74ba9dd91e53a3f182e7a2a9cefe709743c7a5f Mon Sep 17 00:00:00 2001 From: Haibo Chen Date: Wed, 19 Apr 2017 10:53:51 +0800 Subject: [PATCH 422/521] mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card commit 9f327845358d3dd0d8a5a7a5436b0aa5c432e757 upstream. Currently for DDR50 card, it need tuning in default. We meet tuning fail issue for DDR50 card and some data CRC error when DDR50 sd card works. This is because the default pad I/O drive strength can't make sure DDR50 card work stable. So increase the pad I/O drive strength for DDR50 card, and use pins_100mhz. This fixes DDR50 card support for IMX since DDR50 tuning was enabled from commit 9faac7b95ea4 ("mmc: sdhci: enable tuning for DDR50") Tested-and-reported-by: Tim Harvey Signed-off-by: Haibo Chen Acked-by: Dong Aisheng Acked-by: Adrian Hunter Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci-esdhc-imx.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c index 1f1582f6cccb..8d838779fd1b 100644 --- a/drivers/mmc/host/sdhci-esdhc-imx.c +++ b/drivers/mmc/host/sdhci-esdhc-imx.c @@ -804,6 +804,7 @@ static int esdhc_change_pinstate(struct sdhci_host *host, switch (uhs) { case MMC_TIMING_UHS_SDR50: + case MMC_TIMING_UHS_DDR50: pinctrl = imx_data->pins_100mhz; break; case MMC_TIMING_UHS_SDR104: From b812c69019e421cf9b4fd7e57747e73e2b83c741 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 20 Apr 2017 21:32:16 +0200 Subject: [PATCH 423/521] mac80211: reject ToDS broadcast data frames commit 3018e947d7fd536d57e2b550c33e456d921fff8c upstream. AP/AP_VLAN modes don't accept any real 802.11 multicast data frames, but since they do need to accept broadcast management frames the same is currently permitted for data frames. This opens a security problem because such frames would be decrypted with the GTK, and could even contain unicast L3 frames. Since the spec says that ToDS frames must always have the BSSID as the RA (addr1), reject any other data frames. The problem was originally reported in "Predicting, Decrypting, and Abusing WPA2/802.11 Group Keys" at usenix https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/vanhoef and brought to my attention by Jouni. Reported-by: Jouni Malinen Signed-off-by: Johannes Berg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman -- --- net/mac80211/rx.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 2b528389409f..9f0915f72702 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -3396,6 +3396,27 @@ static bool ieee80211_accept_frame(struct ieee80211_rx_data *rx) !ether_addr_equal(bssid, hdr->addr1)) return false; } + + /* + * 802.11-2016 Table 9-26 says that for data frames, A1 must be + * the BSSID - we've checked that already but may have accepted + * the wildcard (ff:ff:ff:ff:ff:ff). + * + * It also says: + * The BSSID of the Data frame is determined as follows: + * a) If the STA is contained within an AP or is associated + * with an AP, the BSSID is the address currently in use + * by the STA contained in the AP. + * + * So we should not accept data frames with an address that's + * multicast. + * + * Accepting it also opens a security problem because stations + * could encrypt it with the GTK and inject traffic that way. + */ + if (ieee80211_is_data(hdr->frame_control) && multicast) + return false; + return true; case NL80211_IFTYPE_WDS: if (bssid || !ieee80211_is_data(hdr->frame_control)) From 38be91ce7ea86386242a295230a517772f359df1 Mon Sep 17 00:00:00 2001 From: Sebastian Siewior Date: Wed, 22 Feb 2017 17:15:21 +0100 Subject: [PATCH 424/521] ubi/upd: Always flush after prepared for an update commit 9cd9a21ce070be8a918ffd3381468315a7a76ba6 upstream. In commit 6afaf8a484cb ("UBI: flush wl before clearing update marker") I managed to trigger and fix a similar bug. Now here is another version of which I assumed it wouldn't matter back then but it turns out UBI has a check for it and will error out like this: |ubi0 warning: validate_vid_hdr: inconsistent used_ebs |ubi0 error: validate_vid_hdr: inconsistent VID header at PEB 592 All you need to trigger this is? "ubiupdatevol /dev/ubi0_0 file" + a powercut in the middle of the operation. ubi_start_update() sets the update-marker and puts all EBs on the erase list. After that userland can proceed to write new data while the old EB aren't erased completely. A powercut at this point is usually not that much of a tragedy. UBI won't give read access to the static volume because it has the update marker. It will most likely set the corrupted flag because it misses some EBs. So we are all good. Unless the size of the image that has been written differs from the old image in the magnitude of at least one EB. In that case UBI will find two different values for `used_ebs' and refuse to attach the image with the error message mentioned above. So in order not to get in the situation, the patch will ensure that we wait until everything is removed before it tries to write any data. The alternative would be to detect such a case and remove all EBs at the attached time after we processed the volume-table and see the update-marker set. The patch looks bigger and I doubt it is worth it since usually the write() will wait from time to time for a new EB since usually there not that many spare EB that can be used. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Richard Weinberger Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/ubi/upd.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/mtd/ubi/upd.c b/drivers/mtd/ubi/upd.c index 0134ba32a057..39712560b4c1 100644 --- a/drivers/mtd/ubi/upd.c +++ b/drivers/mtd/ubi/upd.c @@ -148,11 +148,11 @@ int ubi_start_update(struct ubi_device *ubi, struct ubi_volume *vol, return err; } - if (bytes == 0) { - err = ubi_wl_flush(ubi, UBI_ALL, UBI_ALL); - if (err) - return err; + err = ubi_wl_flush(ubi, UBI_ALL, UBI_ALL); + if (err) + return err; + if (bytes == 0) { err = clear_update_marker(ubi, vol, 0); if (err) return err; From 6c107bba66dcc696ec56482e7c07fccfb00d75c4 Mon Sep 17 00:00:00 2001 From: Ravi Bangoria Date: Tue, 11 Apr 2017 10:38:13 +0530 Subject: [PATCH 425/521] powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction commit 9e1ba4f27f018742a1aa95d11e35106feba08ec1 upstream. If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel OOPS: Bad kernel stack pointer cd93c840 at c000000000009868 Oops: Bad kernel stack pointer, sig: 6 [#1] ... GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840 ... NIP [c000000000009868] resume_kernel+0x2c/0x58 LR [c000000000006208] program_check_common+0x108/0x180 On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does not emulate actual store in emulate_step() because it may corrupt the exception frame. So the kernel does the actual store operation in exception return code i.e. resume_kernel(). resume_kernel() loads the saved stack pointer from memory using lwz, which only loads the low 32-bits of the address, causing the kernel crash. Fix this by loading the 64-bit value instead. Fixes: be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") Signed-off-by: Ravi Bangoria Reviewed-by: Naveen N. Rao Reviewed-by: Ananth N Mavinakayanahalli [mpe: Change log massage, add stable tag] Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/entry_64.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index edba294620db..f6fd0332c3a2 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -716,7 +716,7 @@ resume_kernel: addi r8,r1,INT_FRAME_SIZE /* Get the kprobed function entry */ - lwz r3,GPR1(r1) + ld r3,GPR1(r1) subi r3,r3,INT_FRAME_SIZE /* dst: Allocate a trampoline exception frame */ mr r4,r1 /* src: current exception frame */ mr r1,r3 /* Reroute the trampoline frame to r1 */ @@ -730,8 +730,8 @@ resume_kernel: addi r6,r6,8 bdnz 2b - /* Do real store operation to complete stwu */ - lwz r5,GPR1(r1) + /* Do real store operation to complete stdu */ + ld r5,GPR1(r1) std r8,0(r5) /* Clear _TIF_EMULATE_STACK_STORE flag */ From e2587fba99118f2f4506b37b4766d7a4cca1465e Mon Sep 17 00:00:00 2001 From: Yazen Ghannam Date: Thu, 30 Mar 2017 13:17:14 +0200 Subject: [PATCH 426/521] x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs commit 29f72ce3e4d18066ec75c79c857bee0618a3504b upstream. MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name. However, MCA bank 3 is defined on Fam17h systems and can be accessed using legacy MSRs. Without a name we get a stack trace on Fam17h systems when trying to register sysfs files for bank 3 on kernels that don't recognize Scalable MCA. Call MCA bank 3 "decode_unit" since this is what it represents on Fam17h. This will allow kernels without SMCA support to see this bank on Fam17h+ and prevent the stack trace. This will not affect older systems since this bank is reserved on them, i.e. it'll be ignored. Tested on AMD Fam15h and Fam17h systems. WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal kobject: (ffff88085bb256c0): attempted to be registered with empty name! ... Call Trace: kobject_add_internal kobject_add kobject_create_and_add threshold_create_device threshold_init_device Signed-off-by: Yazen Ghannam Signed-off-by: Borislav Petkov Link: http://lkml.kernel.org/r/1490102285-3659-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index e99b15077e94..62aca448726a 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -53,7 +53,7 @@ static const char * const th_names[] = { "load_store", "insn_fetch", "combined_unit", - "", + "decode_unit", "northbridge", "execution_unit", }; From 2a60bb635236ead6437b37a5b4085da1e33962b1 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Mon, 3 Apr 2017 15:12:43 +0100 Subject: [PATCH 427/521] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd commit 8b3405e345b5a098101b0c31b264c812bba045d9 upstream. In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling unmap_stage2_range() on the entire memory range for the guest. This could cause problems with other callers (e.g, munmap on a memslot) trying to unmap a range. And since we have to unmap the entire Guest memory range holding a spinlock, make sure we yield the lock if necessary, after we unmap each PUD range. Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup") Cc: Paolo Bonzini Cc: Marc Zyngier Cc: Christoffer Dall Cc: Mark Rutland Signed-off-by: Suzuki K Poulose [ Avoid vCPU starvation and lockup detector warnings ] Signed-off-by: Marc Zyngier Signed-off-by: Suzuki K Poulose Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- arch/arm/kvm/mmu.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index f91ee2f27b41..01cf10556081 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -300,6 +300,14 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp, next = kvm_pgd_addr_end(addr, end); if (!pgd_none(*pgd)) unmap_puds(kvm, pgd, addr, next); + /* + * If we are dealing with a large range in + * stage2 table, release the kvm->mmu_lock + * to prevent starvation and lockup detector + * warnings. + */ + if (kvm && (next != end)) + cond_resched_lock(&kvm->mmu_lock); } while (pgd++, addr = next, addr != end); } @@ -738,6 +746,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) */ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) { + assert_spin_locked(&kvm->mmu_lock); unmap_range(kvm, kvm->arch.pgd, start, size); } @@ -824,7 +833,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm) if (kvm->arch.pgd == NULL) return; + spin_lock(&kvm->mmu_lock); unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE); + spin_unlock(&kvm->mmu_lock); + kvm_free_hwpgd(kvm_get_hwpgd(kvm)); if (KVM_PREALLOC_LEVEL > 0) kfree(kvm->arch.pgd); From 397488e09bf2670c841b9f9d8652ce5dd1c952f4 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Wed, 6 Jul 2016 18:24:10 -0700 Subject: [PATCH 428/521] Tools: hv: kvp: ensure kvp device fd is closed on exec commit 26840437cbd6d3625ea6ab34e17cd34bb810c861 upstream. KVP daemon does fork()/exec() (with popen()) so we need to close our fds to avoid sharing them with child processes. The immediate implication of not doing so I see is SELinux complaining about 'ip' trying to access '/dev/vmbus/hv_kvp'. Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- tools/hv/hv_kvp_daemon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c index 0d9f48ec42bb..bc7adb84e679 100644 --- a/tools/hv/hv_kvp_daemon.c +++ b/tools/hv/hv_kvp_daemon.c @@ -1433,7 +1433,7 @@ int main(int argc, char *argv[]) openlog("KVP", 0, LOG_USER); syslog(LOG_INFO, "KVP starting; pid is:%d", getpid()); - kvp_fd = open("/dev/vmbus/hv_kvp", O_RDWR); + kvp_fd = open("/dev/vmbus/hv_kvp", O_RDWR | O_CLOEXEC); if (kvp_fd < 0) { syslog(LOG_ERR, "open /dev/vmbus/hv_kvp failed; error: %d %s", From 8e7a6dbc3b71f37fc0167dde0e7676b4cdde1963 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Wed, 24 Aug 2016 16:23:09 -0700 Subject: [PATCH 429/521] Drivers: hv: balloon: keep track of where ha_region starts commit 7cf3b79ec85ee1a5bbaaf936bb1d050dc652983b upstream. Windows 2012 (non-R2) does not specify hot add region in hot add requests and the logic in hot_add_req() is trying to find a 128Mb-aligned region covering the request. It may also happen that host's requests are not 128Mb aligned and the created ha_region will start before the first specified PFN. We can't online these non-present pages but we don't remember the real start of the region. This is a regression introduced by the commit 5abbbb75d733 ("Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural"). While the idea of keeping the 'moving window' was wrong (as there is no guarantee that hot add requests come ordered) we should still keep track of covered_start_pfn. This is not a revert, the logic is different. Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/hv/hv_balloon.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index 43af91362be5..1542d894fd8f 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -430,13 +430,14 @@ struct dm_info_msg { * currently hot added. We hot add in multiples of 128M * chunks; it is possible that we may not be able to bring * online all the pages in the region. The range - * covered_end_pfn defines the pages that can + * covered_start_pfn:covered_end_pfn defines the pages that can * be brough online. */ struct hv_hotadd_state { struct list_head list; unsigned long start_pfn; + unsigned long covered_start_pfn; unsigned long covered_end_pfn; unsigned long ha_end_pfn; unsigned long end_pfn; @@ -682,7 +683,8 @@ static void hv_online_page(struct page *pg) list_for_each(cur, &dm_device.ha_region_list) { has = list_entry(cur, struct hv_hotadd_state, list); - cur_start_pgp = (unsigned long)pfn_to_page(has->start_pfn); + cur_start_pgp = (unsigned long) + pfn_to_page(has->covered_start_pfn); cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn); if (((unsigned long)pg >= cur_start_pgp) && @@ -854,6 +856,7 @@ static unsigned long process_hot_add(unsigned long pg_start, list_add_tail(&ha_region->list, &dm_device.ha_region_list); ha_region->start_pfn = rg_start; ha_region->ha_end_pfn = rg_start; + ha_region->covered_start_pfn = pg_start; ha_region->covered_end_pfn = pg_start; ha_region->end_pfn = rg_start + rg_size; } From 03e2fb9b5ce80aa9ff6384384f5fbde156550971 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Wed, 24 Aug 2016 16:23:10 -0700 Subject: [PATCH 430/521] Drivers: hv: balloon: account for gaps in hot add regions commit cb7a5724c7e1bfb5766ad1c3beba14cc715991cf upstream. I'm observing the following hot add requests from the WS2012 host: hot_add_req: start_pfn = 0x108200 count = 330752 hot_add_req: start_pfn = 0x158e00 count = 193536 hot_add_req: start_pfn = 0x188400 count = 239616 As the host doesn't specify hot add regions we're trying to create 128Mb-aligned region covering the first request, we create the 0x108000 - 0x160000 region and we add 0x108000 - 0x158e00 memory. The second request passes the pfn_covered() check, we enlarge the region to 0x108000 - 0x190000 and add 0x158e00 - 0x188200 memory. The problem emerges with the third request as it starts at 0x188400 so there is a 0x200 gap which is not covered. As the end of our region is 0x190000 now it again passes the pfn_covered() check were we just adjust the covered_end_pfn and make it 0x188400 instead of 0x188200 which means that we'll try to online 0x188200-0x188400 pages but these pages were never assigned to us and we crash. We can't react to such requests by creating new hot add regions as it may happen that the whole suggested range falls into the previously identified 128Mb-aligned area so we'll end up adding nothing or create intersecting regions and our current logic doesn't allow that. Instead, create a list of such 'gaps' and check for them in the page online callback. Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/hv/hv_balloon.c | 131 ++++++++++++++++++++++++++++------------ 1 file changed, 94 insertions(+), 37 deletions(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index 1542d894fd8f..354da7f207b7 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -441,6 +441,16 @@ struct hv_hotadd_state { unsigned long covered_end_pfn; unsigned long ha_end_pfn; unsigned long end_pfn; + /* + * A list of gaps. + */ + struct list_head gap_list; +}; + +struct hv_hotadd_gap { + struct list_head list; + unsigned long start_pfn; + unsigned long end_pfn; }; struct balloon_state { @@ -596,18 +606,46 @@ static struct notifier_block hv_memory_nb = { .priority = 0 }; +/* Check if the particular page is backed and can be onlined and online it. */ +static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg) +{ + unsigned long cur_start_pgp; + unsigned long cur_end_pgp; + struct hv_hotadd_gap *gap; -static void hv_bring_pgs_online(unsigned long start_pfn, unsigned long size) + cur_start_pgp = (unsigned long)pfn_to_page(has->covered_start_pfn); + cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn); + + /* The page is not backed. */ + if (((unsigned long)pg < cur_start_pgp) || + ((unsigned long)pg >= cur_end_pgp)) + return; + + /* Check for gaps. */ + list_for_each_entry(gap, &has->gap_list, list) { + cur_start_pgp = (unsigned long) + pfn_to_page(gap->start_pfn); + cur_end_pgp = (unsigned long) + pfn_to_page(gap->end_pfn); + if (((unsigned long)pg >= cur_start_pgp) && + ((unsigned long)pg < cur_end_pgp)) { + return; + } + } + + /* This frame is currently backed; online the page. */ + __online_page_set_limits(pg); + __online_page_increment_counters(pg); + __online_page_free(pg); +} + +static void hv_bring_pgs_online(struct hv_hotadd_state *has, + unsigned long start_pfn, unsigned long size) { int i; - for (i = 0; i < size; i++) { - struct page *pg; - pg = pfn_to_page(start_pfn + i); - __online_page_set_limits(pg); - __online_page_increment_counters(pg); - __online_page_free(pg); - } + for (i = 0; i < size; i++) + hv_page_online_one(has, pfn_to_page(start_pfn + i)); } static void hv_mem_hot_add(unsigned long start, unsigned long size, @@ -684,26 +722,24 @@ static void hv_online_page(struct page *pg) list_for_each(cur, &dm_device.ha_region_list) { has = list_entry(cur, struct hv_hotadd_state, list); cur_start_pgp = (unsigned long) - pfn_to_page(has->covered_start_pfn); - cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn); + pfn_to_page(has->start_pfn); + cur_end_pgp = (unsigned long)pfn_to_page(has->end_pfn); - if (((unsigned long)pg >= cur_start_pgp) && - ((unsigned long)pg < cur_end_pgp)) { - /* - * This frame is currently backed; online the - * page. - */ - __online_page_set_limits(pg); - __online_page_increment_counters(pg); - __online_page_free(pg); - } + /* The page belongs to a different HAS. */ + if (((unsigned long)pg < cur_start_pgp) || + ((unsigned long)pg >= cur_end_pgp)) + continue; + + hv_page_online_one(has, pg); + break; } } -static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt) +static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt) { struct list_head *cur; struct hv_hotadd_state *has; + struct hv_hotadd_gap *gap; unsigned long residual, new_inc; if (list_empty(&dm_device.ha_region_list)) @@ -718,6 +754,24 @@ static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt) */ if (start_pfn < has->start_pfn || start_pfn >= has->end_pfn) continue; + + /* + * If the current start pfn is not where the covered_end + * is, create a gap and update covered_end_pfn. + */ + if (has->covered_end_pfn != start_pfn) { + gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC); + if (!gap) + return -ENOMEM; + + INIT_LIST_HEAD(&gap->list); + gap->start_pfn = has->covered_end_pfn; + gap->end_pfn = start_pfn; + list_add_tail(&gap->list, &has->gap_list); + + has->covered_end_pfn = start_pfn; + } + /* * If the current hot add-request extends beyond * our current limit; extend it. @@ -734,19 +788,10 @@ static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt) has->end_pfn += new_inc; } - /* - * If the current start pfn is not where the covered_end - * is, update it. - */ - - if (has->covered_end_pfn != start_pfn) - has->covered_end_pfn = start_pfn; - - return true; - + return 1; } - return false; + return 0; } static unsigned long handle_pg_range(unsigned long pg_start, @@ -785,6 +830,8 @@ static unsigned long handle_pg_range(unsigned long pg_start, if (pgs_ol > pfn_cnt) pgs_ol = pfn_cnt; + has->covered_end_pfn += pgs_ol; + pfn_cnt -= pgs_ol; /* * Check if the corresponding memory block is already * online by checking its last previously backed page. @@ -793,10 +840,8 @@ static unsigned long handle_pg_range(unsigned long pg_start, */ if (start_pfn > has->start_pfn && !PageReserved(pfn_to_page(start_pfn - 1))) - hv_bring_pgs_online(start_pfn, pgs_ol); + hv_bring_pgs_online(has, start_pfn, pgs_ol); - has->covered_end_pfn += pgs_ol; - pfn_cnt -= pgs_ol; } if ((has->ha_end_pfn < has->end_pfn) && (pfn_cnt > 0)) { @@ -834,13 +879,19 @@ static unsigned long process_hot_add(unsigned long pg_start, unsigned long rg_size) { struct hv_hotadd_state *ha_region = NULL; + int covered; if (pfn_cnt == 0) return 0; - if (!dm_device.host_specified_ha_region) - if (pfn_covered(pg_start, pfn_cnt)) + if (!dm_device.host_specified_ha_region) { + covered = pfn_covered(pg_start, pfn_cnt); + if (covered < 0) + return 0; + + if (covered) goto do_pg_range; + } /* * If the host has specified a hot-add range; deal with it first. @@ -852,6 +903,7 @@ static unsigned long process_hot_add(unsigned long pg_start, return 0; INIT_LIST_HEAD(&ha_region->list); + INIT_LIST_HEAD(&ha_region->gap_list); list_add_tail(&ha_region->list, &dm_device.ha_region_list); ha_region->start_pfn = rg_start; @@ -1584,6 +1636,7 @@ static int balloon_remove(struct hv_device *dev) struct hv_dynmem_device *dm = hv_get_drvdata(dev); struct list_head *cur, *tmp; struct hv_hotadd_state *has; + struct hv_hotadd_gap *gap, *tmp_gap; if (dm->num_pages_ballooned != 0) pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned); @@ -1600,6 +1653,10 @@ static int balloon_remove(struct hv_device *dev) #endif list_for_each_safe(cur, tmp, &dm->ha_region_list) { has = list_entry(cur, struct hv_hotadd_state, list); + list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) { + list_del(&gap->list); + kfree(gap); + } list_del(&has->list); kfree(has); } From 5693f3fb5a662ab0ab1f8ad3a0e13c820c4c47dc Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Wed, 7 Dec 2016 01:16:27 -0800 Subject: [PATCH 431/521] hv: don't reset hv_context.tsc_page on crash commit 56ef6718a1d8d77745033c5291e025ce18504159 upstream. It may happen that secondary CPUs are still alive and resetting hv_context.tsc_page will cause a consequent crash in read_hv_clock_tsc() as we don't check for it being not NULL there. It is safe as we're not freeing this page anyways. Signed-off-by: Vitaly Kuznetsov Signed-off-by: K. Y. Srinivasan Signed-off-by: Sumit Semwal Signed-off-by: Greg Kroah-Hartman --- drivers/hv/hv.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c index ddbf7e7e0d98..8ce1f2e22912 100644 --- a/drivers/hv/hv.c +++ b/drivers/hv/hv.c @@ -305,9 +305,10 @@ void hv_cleanup(bool crash) hypercall_msr.as_uint64 = 0; wrmsrl(HV_X64_MSR_REFERENCE_TSC, hypercall_msr.as_uint64); - if (!crash) + if (!crash) { vfree(hv_context.tsc_page); - hv_context.tsc_page = NULL; + hv_context.tsc_page = NULL; + } } #endif } From d1cc3cdd39e90e70ee5fc9a766ad3a6c61426fa8 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 6 Apr 2017 09:04:31 -0700 Subject: [PATCH 432/521] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions commit 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd upstream. Before we rework the "pmem api" to stop abusing __copy_user_nocache() for memcpy_to_pmem() we need to fix cases where we may strand dirty data in the cpu cache. The problem occurs when copy_from_iter_pmem() is used for arbitrary data transfers from userspace. There is no guarantee that these transfers, performed by dax_iomap_actor(), will have aligned destinations or aligned transfer lengths. Backstop the usage __copy_user_nocache() with explicit cache management in these unaligned cases. Yes, copy_from_iter_pmem() is now too big for an inline, but addressing that is saved for a later patch that moves the entirety of the "pmem api" into the pmem driver directly. Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()") Cc: Cc: Jan Kara Cc: Jeff Moyer Cc: Ingo Molnar Cc: Christoph Hellwig Cc: "H. Peter Anvin" Cc: Al Viro Cc: Thomas Gleixner Cc: Matthew Wilcox Reviewed-by: Ross Zwisler Signed-off-by: Toshi Kani Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pmem.h | 45 ++++++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h index d8ce3ec816ab..bd8ce6bcdfc9 100644 --- a/arch/x86/include/asm/pmem.h +++ b/arch/x86/include/asm/pmem.h @@ -72,8 +72,8 @@ static inline void arch_wmb_pmem(void) * @size: number of bytes to write back * * Write back a cache range using the CLWB (cache line write back) - * instruction. This function requires explicit ordering with an - * arch_wmb_pmem() call. This API is internal to the x86 PMEM implementation. + * instruction. Note that @size is internally rounded up to be cache + * line size aligned. */ static inline void __arch_wb_cache_pmem(void *vaddr, size_t size) { @@ -87,15 +87,6 @@ static inline void __arch_wb_cache_pmem(void *vaddr, size_t size) clwb(p); } -/* - * copy_from_iter_nocache() on x86 only uses non-temporal stores for iovec - * iterators, so for other types (bvec & kvec) we must do a cache write-back. - */ -static inline bool __iter_needs_pmem_wb(struct iov_iter *i) -{ - return iter_is_iovec(i) == false; -} - /** * arch_copy_from_iter_pmem - copy data from an iterator to PMEM * @addr: PMEM destination address @@ -114,8 +105,36 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes, /* TODO: skip the write-back by always using non-temporal stores */ len = copy_from_iter_nocache(vaddr, bytes, i); - if (__iter_needs_pmem_wb(i)) - __arch_wb_cache_pmem(vaddr, bytes); + /* + * In the iovec case on x86_64 copy_from_iter_nocache() uses + * non-temporal stores for the bulk of the transfer, but we need + * to manually flush if the transfer is unaligned. A cached + * memory copy is used when destination or size is not naturally + * aligned. That is: + * - Require 8-byte alignment when size is 8 bytes or larger. + * - Require 4-byte alignment when size is 4 bytes. + * + * In the non-iovec case the entire destination needs to be + * flushed. + */ + if (iter_is_iovec(i)) { + unsigned long flushed, dest = (unsigned long) addr; + + if (bytes < 8) { + if (!IS_ALIGNED(dest, 4) || (bytes != 4)) + __arch_wb_cache_pmem(addr, 1); + } else { + if (!IS_ALIGNED(dest, 8)) { + dest = ALIGN(dest, boot_cpu_data.x86_clflush_size); + __arch_wb_cache_pmem(addr, 1); + } + + flushed = dest - (unsigned long) addr; + if (bytes > flushed && !IS_ALIGNED(bytes - flushed, 8)) + __arch_wb_cache_pmem(addr + bytes - 1, 1); + } + } else + __arch_wb_cache_pmem(addr, bytes); return len; } From 6ddbac9aa800b99761a05ba520d62a0d8063a5b8 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 29 Dec 2015 14:02:29 -0800 Subject: [PATCH 433/521] block: fix del_gendisk() vs blkdev_ioctl crash commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream. When tearing down a block device early in its lifetime, userspace may still be performing discovery actions like blkdev_ioctl() to re-read partitions. The nvdimm_revalidate_disk() implementation depends on disk->driverfs_dev to be valid at entry. However, it is set to NULL in del_gendisk() and fatally this is happening *before* the disk device is deleted from userspace view. There's no reason for del_gendisk() to clear ->driverfs_dev. That device is the parent of the disk. It is guaranteed to not be freed until the disk, as a child, drops its ->parent reference. We could also fix this issue locally in nvdimm_revalidate_disk() by using disk_to_dev(disk)->parent, but lets fix it globally since ->driverfs_dev follows the lifetime of the parent. Longer term we should probably just add a @parent parameter to add_disk(), and stop carrying this pointer in the gendisk. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm] CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G O 4.4.0-rc5 #2257 [..] Call Trace: [] rescan_partitions+0x87/0x2c0 [] ? __lock_is_held+0x49/0x70 [] __blkdev_reread_part+0x72/0xb0 [] blkdev_reread_part+0x25/0x40 [] blkdev_ioctl+0x4fd/0x9c0 [] ? current_kernel_time64+0x69/0xd0 [] block_ioctl+0x3d/0x50 [] do_vfs_ioctl+0x308/0x560 [] ? __audit_syscall_entry+0xb1/0x100 [] ? do_audit_syscall_entry+0x66/0x70 [] SyS_ioctl+0x79/0x90 [] entry_SYSCALL_64_fastpath+0x12/0x76 Cc: Jan Kara Cc: Jens Axboe Reported-by: Robert Hu Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- block/genhd.c | 1 - 1 file changed, 1 deletion(-) diff --git a/block/genhd.c b/block/genhd.c index a5bed6bc869d..3032453a89e6 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -664,7 +664,6 @@ void del_gendisk(struct gendisk *disk) kobject_put(disk->part0.holder_dir); kobject_put(disk->slave_dir); - disk->driverfs_dev = NULL; if (!sysfs_deprecated) sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk))); pm_runtime_set_memalloc_noio(disk_to_dev(disk), false); From 6862fa9077dea0ec2ba5e6ea7c7f90b786288596 Mon Sep 17 00:00:00 2001 From: Jon Paul Maloy Date: Wed, 24 Feb 2016 11:10:48 -0500 Subject: [PATCH 434/521] tipc: fix crash during node removal commit d25a01257e422a4bdeb426f69529d57c73b235fe upstream. When the TIPC module is unloaded, we have identified a race condition that allows a node reference counter to go to zero and the node instance being freed before the node timer is finished with accessing it. This leads to occasional crashes, especially in multi-namespace environments. The scenario goes as follows: CPU0:(node_stop) CPU1:(node_timeout) // ref == 2 1: if(!mod_timer()) 2: if (del_timer()) 3: tipc_node_put() // ref -> 1 4: tipc_node_put() // ref -> 0 5: kfree_rcu(node); 6: tipc_node_get(node) 7: // BOOM! We now clean up this functionality as follows: 1) We remove the node pointer from the node lookup table before we attempt deactivating the timer. This way, we reduce the risk that tipc_node_find() may obtain a valid pointer to an instance marked for deletion; a harmless but undesirable situation. 2) We use del_timer_sync() instead of del_timer() to safely deactivate the node timer without any risk that it might be reactivated by the timeout handler. There is no risk of deadlock here, since the two functions never touch the same spinlocks. 3: We remove a pointless tipc_node_get() + tipc_node_put() from the timeout handler. Reported-by: Zhijiang Hu Acked-by: Ying Xue Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/node.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/net/tipc/node.c b/net/tipc/node.c index 3926b561f873..d468aad6163e 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -102,9 +102,10 @@ static unsigned int tipc_hashfn(u32 addr) static void tipc_node_kref_release(struct kref *kref) { - struct tipc_node *node = container_of(kref, struct tipc_node, kref); + struct tipc_node *n = container_of(kref, struct tipc_node, kref); - tipc_node_delete(node); + kfree(n->bc_entry.link); + kfree_rcu(n, rcu); } void tipc_node_put(struct tipc_node *node) @@ -216,21 +217,20 @@ static void tipc_node_delete(struct tipc_node *node) { list_del_rcu(&node->list); hlist_del_rcu(&node->hash); - kfree(node->bc_entry.link); - kfree_rcu(node, rcu); + tipc_node_put(node); + + del_timer_sync(&node->timer); + tipc_node_put(node); } void tipc_node_stop(struct net *net) { - struct tipc_net *tn = net_generic(net, tipc_net_id); + struct tipc_net *tn = tipc_net(net); struct tipc_node *node, *t_node; spin_lock_bh(&tn->node_list_lock); - list_for_each_entry_safe(node, t_node, &tn->node_list, list) { - if (del_timer(&node->timer)) - tipc_node_put(node); - tipc_node_put(node); - } + list_for_each_entry_safe(node, t_node, &tn->node_list, list) + tipc_node_delete(node); spin_unlock_bh(&tn->node_list_lock); } @@ -313,9 +313,7 @@ static void tipc_node_timeout(unsigned long data) if (rc & TIPC_LINK_DOWN_EVT) tipc_node_link_down(n, bearer_id, false); } - if (!mod_timer(&n->timer, jiffies + n->keepalive_intv)) - tipc_node_get(n); - tipc_node_put(n); + mod_timer(&n->timer, jiffies + n->keepalive_intv); } /** From 12f4e1f54a1334bcd5f9586fc9eb7baefb14b826 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 27 Apr 2017 09:09:53 +0200 Subject: [PATCH 435/521] Linux 4.4.64 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ec52973043f6..17708f5dc169 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 63 +SUBLEVEL = 64 EXTRAVERSION = NAME = Blurry Fish Butt From 44b3b7e068874040ca511fcd2a812b5fbcf44616 Mon Sep 17 00:00:00 2001 From: Richard Alpe Date: Mon, 14 Mar 2016 09:43:52 +0100 Subject: [PATCH 436/521] tipc: make sure IPv6 header fits in skb headroom commit 9bd160bfa27fa41927dbbce7ee0ea779700e09ef upstream. Expand headroom further in order to be able to fit the larger IPv6 header. Prior to this patch this caused a skb under panic for certain tipc packets when using IPv6 UDP bearer(s). Signed-off-by: Richard Alpe Acked-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Jon Maloy Signed-off-by: Greg Kroah-Hartman --- net/tipc/udp_media.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index 6af78c6276b4..4056798c54a5 100644 --- a/net/tipc/udp_media.c +++ b/net/tipc/udp_media.c @@ -52,7 +52,7 @@ /* IANA assigned UDP port */ #define UDP_PORT_DEFAULT 6118 -#define UDP_MIN_HEADROOM 28 +#define UDP_MIN_HEADROOM 48 static const struct nla_policy tipc_nl_udp_policy[TIPC_NLA_UDP_MAX + 1] = { [TIPC_NLA_UDP_UNSPEC] = {.type = NLA_UNSPEC}, From 3f31559043087b9cd45582c2eb12d7900cedc4ed Mon Sep 17 00:00:00 2001 From: Erik Hugne Date: Thu, 7 Apr 2016 10:40:43 -0400 Subject: [PATCH 437/521] tipc: make dist queue pernet commit 541726abe7daca64390c2ec34e6a203145f1686d upstream. Nametable updates received from the network that cannot be applied immediately are placed on a defer queue. This queue is global to the TIPC module, which might cause problems when using TIPC in containers. To prevent nametable updates from escaping into the wrong namespace, we make the queue pernet instead. Signed-off-by: Erik Hugne Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/core.c | 1 + net/tipc/core.h | 3 +++ net/tipc/name_distr.c | 16 +++++++--------- 3 files changed, 11 insertions(+), 9 deletions(-) diff --git a/net/tipc/core.c b/net/tipc/core.c index 03a842870c52..e2bdb07a49a2 100644 --- a/net/tipc/core.c +++ b/net/tipc/core.c @@ -69,6 +69,7 @@ static int __net_init tipc_init_net(struct net *net) if (err) goto out_nametbl; + INIT_LIST_HEAD(&tn->dist_queue); err = tipc_topsrv_start(net); if (err) goto out_subscr; diff --git a/net/tipc/core.h b/net/tipc/core.h index 18e95a8020cd..fe3b89e9cde4 100644 --- a/net/tipc/core.h +++ b/net/tipc/core.h @@ -103,6 +103,9 @@ struct tipc_net { spinlock_t nametbl_lock; struct name_table *nametbl; + /* Name dist queue */ + struct list_head dist_queue; + /* Topology subscription server */ struct tipc_server *topsrv; atomic_t subscription_count; diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c index f51c8bdbea1c..18f8152888f4 100644 --- a/net/tipc/name_distr.c +++ b/net/tipc/name_distr.c @@ -40,11 +40,6 @@ int sysctl_tipc_named_timeout __read_mostly = 2000; -/** - * struct tipc_dist_queue - queue holding deferred name table updates - */ -static struct list_head tipc_dist_queue = LIST_HEAD_INIT(tipc_dist_queue); - struct distr_queue_item { struct distr_item i; u32 dtype; @@ -340,9 +335,11 @@ static bool tipc_update_nametbl(struct net *net, struct distr_item *i, * tipc_named_add_backlog - add a failed name table update to the backlog * */ -static void tipc_named_add_backlog(struct distr_item *i, u32 type, u32 node) +static void tipc_named_add_backlog(struct net *net, struct distr_item *i, + u32 type, u32 node) { struct distr_queue_item *e; + struct tipc_net *tn = net_generic(net, tipc_net_id); unsigned long now = get_jiffies_64(); e = kzalloc(sizeof(*e), GFP_ATOMIC); @@ -352,7 +349,7 @@ static void tipc_named_add_backlog(struct distr_item *i, u32 type, u32 node) e->node = node; e->expires = now + msecs_to_jiffies(sysctl_tipc_named_timeout); memcpy(e, i, sizeof(*i)); - list_add_tail(&e->next, &tipc_dist_queue); + list_add_tail(&e->next, &tn->dist_queue); } /** @@ -362,10 +359,11 @@ static void tipc_named_add_backlog(struct distr_item *i, u32 type, u32 node) void tipc_named_process_backlog(struct net *net) { struct distr_queue_item *e, *tmp; + struct tipc_net *tn = net_generic(net, tipc_net_id); char addr[16]; unsigned long now = get_jiffies_64(); - list_for_each_entry_safe(e, tmp, &tipc_dist_queue, next) { + list_for_each_entry_safe(e, tmp, &tn->dist_queue, next) { if (time_after(e->expires, now)) { if (!tipc_update_nametbl(net, &e->i, e->node, e->dtype)) continue; @@ -405,7 +403,7 @@ void tipc_named_rcv(struct net *net, struct sk_buff_head *inputq) node = msg_orignode(msg); while (count--) { if (!tipc_update_nametbl(net, item, node, mtype)) - tipc_named_add_backlog(item, mtype, node); + tipc_named_add_backlog(net, item, mtype, node); item++; } kfree_skb(skb); From 76ca3053f32c997472c325176c235a25170fc98b Mon Sep 17 00:00:00 2001 From: Jon Paul Maloy Date: Mon, 2 May 2016 11:58:45 -0400 Subject: [PATCH 438/521] tipc: re-enable compensation for socket receive buffer double counting commit 7c8bcfb1255fe9d929c227d67bdcd84430fd200b upstream. In the refactoring commit d570d86497ee ("tipc: enqueue arrived buffers in socket in separate function") we did by accident replace the test if (sk->sk_backlog.len == 0) atomic_set(&tsk->dupl_rcvcnt, 0); with if (sk->sk_backlog.len) atomic_set(&tsk->dupl_rcvcnt, 0); This effectively disables the compensation we have for the double receive buffer accounting that occurs temporarily when buffers are moved from the backlog to the socket receive queue. Until now, this has gone unnoticed because of the large receive buffer limits we are applying, but becomes indispensable when we reduce this buffer limit later in this series. We now fix this by inverting the mentioned condition. Acked-by: Ying Xue Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index b26b7a127773..d119291db852 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1755,7 +1755,7 @@ static void tipc_sk_enqueue(struct sk_buff_head *inputq, struct sock *sk, /* Try backlog, compensating for double-counted bytes */ dcnt = &tipc_sk(sk)->dupl_rcvcnt; - if (sk->sk_backlog.len) + if (!sk->sk_backlog.len) atomic_set(dcnt, 0); lim = rcvbuf_limit(sk, skb) + atomic_read(dcnt); if (likely(!sk_add_backlog(sk, skb, lim))) From 2847736f563d0ac1f84ddad1e4877c0856bc1adb Mon Sep 17 00:00:00 2001 From: Jon Paul Maloy Date: Wed, 8 Jun 2016 12:00:04 -0400 Subject: [PATCH 439/521] tipc: correct error in node fsm commit c4282ca76c5b81ed73ef4c5eb5c07ee397e51642 upstream. commit 88e8ac7000dc ("tipc: reduce transmission rate of reset messages when link is down") revealed a flaw in the node FSM, as defined in the log of commit 66996b6c47ed ("tipc: extend node FSM"). We see the following scenario: 1: Node B receives a RESET message from node A before its link endpoint is fully up, i.e., the node FSM is in state SELF_UP_PEER_COMING. This event will not change the node FSM state, but the (distinct) link FSM will move to state RESETTING. 2: As an effect of the previous event, the local endpoint on B will declare node A lost, and post the event SELF_DOWN to the its node FSM. This moves the FSM state to SELF_DOWN_PEER_LEAVING, meaning that no messages will be accepted from A until it receives another RESET message that confirms that A's endpoint has been reset. This is wasteful, since we know this as a fact already from the first received RESET, but worse is that the link instance's FSM has not wasted this information, but instead moved on to state ESTABLISHING, meaning that it repeatedly sends out ACTIVATE messages to the reset peer A. 3: Node A will receive one of the ACTIVATE messages, move its link FSM to state ESTABLISHED, and start repeatedly sending out STATE messages to node B. 4: Node B will consistently drop these messages, since it can only accept accept a RESET according to its node FSM. 5: After four lost STATE messages node A will reset its link and start repeatedly sending out RESET messages to B. 6: Because of the reduced send rate for RESET messages, it is very likely that A will receive an ACTIVATE (which is sent out at a much higher frequency) before it gets the chance to send a RESET, and A may hence quickly move back to state ESTABLISHED and continue sending out STATE messages, which will again be dropped by B. 7: GOTO 5. 8: After having repeated the cycle 5-7 a number of times, node A will by chance get in between with sending a RESET, and the situation is resolved. Unfortunately, we have seen that it may take a substantial amount of time before this vicious loop is broken, sometimes in the order of minutes. We correct this by making a small correction to the node FSM: When a node in state SELF_UP_PEER_COMING receives a SELF_DOWN event, it now moves directly back to state SELF_DOWN_PEER_DOWN, instead of as now SELF_DOWN_PEER_LEAVING. This is logically consistent, since we don't need to wait for RESET confirmation from of an endpoint that we alread know has been reset. It also means that node B in the scenario above will not be dropping incoming STATE messages, and the link can come up immediately. Finally, a symmetry comparison reveals that the FSM has a similar error when receiving the event PEER_DOWN in state PEER_UP_SELF_COMING. Instead of moving to PERR_DOWN_SELF_LEAVING, it should move directly to SELF_DOWN_PEER_DOWN. Although we have never seen any negative effect of this logical error, we choose fix this one, too. The node FSM looks as follows after those changes: +----------------------------------------+ | PEER_DOWN_EVT| | | +------------------------+----------------+ | |SELF_DOWN_EVT | | | | | | | | +-----------+ +-----------+ | | |NODE_ | |NODE_ | | | +----------|FAILINGOVER|<---------|SYNCHING |-----------+ | | |SELF_ +-----------+ FAILOVER_+-----------+ PEER_ | | | |DOWN_EVT | A BEGIN_EVT A | DOWN_EVT| | | | | | | | | | | | | | | | | | | | |FAILOVER_ |FAILOVER_ |SYNCH_ |SYNCH_ | | | | |END_EVT |BEGIN_EVT |BEGIN_EVT|END_EVT | | | | | | | | | | | | | | | | | | | | | +--------------+ | | | | | +-------->| SELF_UP_ |<-------+ | | | | +-----------------| PEER_UP |----------------+ | | | | |SELF_DOWN_EVT +--------------+ PEER_DOWN_EVT| | | | | | A A | | | | | | | | | | | | | | PEER_UP_EVT| |SELF_UP_EVT | | | | | | | | | | | V V V | | V V V +------------+ +-----------+ +-----------+ +------------+ |SELF_DOWN_ | |SELF_UP_ | |PEER_UP_ | |PEER_DOWN | |PEER_LEAVING| |PEER_COMING| |SELF_COMING| |SELF_LEAVING| +------------+ +-----------+ +-----------+ +------------+ | | A A | | | | | | | | | SELF_ | |SELF_ |PEER_ |PEER_ | | DOWN_EVT| |UP_EVT |UP_EVT |DOWN_EVT | | | | | | | | | | | | | | | +--------------+ | | |PEER_DOWN_EVT +--->| SELF_DOWN_ |<---+ SELF_DOWN_EVT| +------------------->| PEER_DOWN |<--------------------+ +--------------+ Acked-by: Ying Xue Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/node.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/tipc/node.c b/net/tipc/node.c index d468aad6163e..2df0b98d4a32 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -728,7 +728,7 @@ static void tipc_node_fsm_evt(struct tipc_node *n, int evt) state = SELF_UP_PEER_UP; break; case SELF_LOST_CONTACT_EVT: - state = SELF_DOWN_PEER_LEAVING; + state = SELF_DOWN_PEER_DOWN; break; case SELF_ESTABL_CONTACT_EVT: case PEER_LOST_CONTACT_EVT: @@ -747,7 +747,7 @@ static void tipc_node_fsm_evt(struct tipc_node *n, int evt) state = SELF_UP_PEER_UP; break; case PEER_LOST_CONTACT_EVT: - state = SELF_LEAVING_PEER_DOWN; + state = SELF_DOWN_PEER_DOWN; break; case SELF_LOST_CONTACT_EVT: case PEER_ESTABL_CONTACT_EVT: From 58f80ccf09c4fb8ae2819cd2c0583b885b6b5454 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 25 Jan 2016 22:54:56 +0100 Subject: [PATCH 440/521] tty: nozomi: avoid a harmless gcc warning commit a4f642a8a3c2838ad09fe8313d45db46600e1478 upstream. The nozomi wireless data driver has its own helper function to transfer data from a FIFO, doing an extra byte swap on big-endian architectures, presumably to bring the data back into byte-serial order after readw() or readl() perform their implicit byteswap. This helper function is used in the receive_data() function to first read the length into a 32-bit variable, which causes a compile-time warning: drivers/tty/nozomi.c: In function 'receive_data': drivers/tty/nozomi.c:857:9: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized] The problem is that gcc is unsure whether the data was actually read or not. We know that it is at this point, so we can replace it with a single readl() to shut up that warning. I am leaving the byteswap in there, to preserve the existing behavior, even though this seems fishy: Reading the length of the data into a cpu-endian variable should normally not use a second byteswap on big-endian systems, unless the hardware is aware of the CPU endianess. There appears to be a lot more confusion about endianess in this driver, so it probably has not worked on big-endian systems in a long time, if ever, and I have no way to test it. It's well possible that this driver has not been used by anyone in a while, the last patch that looks like it was tested on the hardware is from 2008. Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- drivers/tty/nozomi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/nozomi.c b/drivers/tty/nozomi.c index 80f9de907563..5cc80b80c82b 100644 --- a/drivers/tty/nozomi.c +++ b/drivers/tty/nozomi.c @@ -823,7 +823,7 @@ static int receive_data(enum port_type index, struct nozomi *dc) struct tty_struct *tty = tty_port_tty_get(&port->port); int i, ret; - read_mem32((u32 *) &size, addr, 4); + size = __le32_to_cpu(readl(addr)); /* DBG1( "%d bytes port: %d", size, index); */ if (tty && test_bit(TTY_THROTTLED, &tty->flags)) { From 9a35bc2ae545b352966a107bf81d8fdcafe4d7bf Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 28 Jan 2016 22:58:28 +0100 Subject: [PATCH 441/521] hostap: avoid uninitialized variable use in hfa384x_get_rid commit 48dc5fb3ba53b20418de8514700f63d88c5de3a3 upstream. The driver reads a value from hfa384x_from_bap(), which may fail, and then assigns the value to a local variable. gcc detects that in in the failure case, the 'rlen' variable now contains uninitialized data: In file included from ../drivers/net/wireless/intersil/hostap/hostap_pci.c:220:0: drivers/net/wireless/intersil/hostap/hostap_hw.c: In function 'hfa384x_get_rid': drivers/net/wireless/intersil/hostap/hostap_hw.c:842:5: warning: 'rec' may be used uninitialized in this function [-Wmaybe-uninitialized] if (le16_to_cpu(rec.len) == 0) { This restructures the function as suggested by Russell King, to make it more readable and get more reliable error handling, by handling each failure mode using a goto. Signed-off-by: Arnd Bergmann Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/hostap/hostap_hw.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/net/wireless/hostap/hostap_hw.c b/drivers/net/wireless/hostap/hostap_hw.c index 6df3ee561d52..515aa3f993f3 100644 --- a/drivers/net/wireless/hostap/hostap_hw.c +++ b/drivers/net/wireless/hostap/hostap_hw.c @@ -836,25 +836,30 @@ static int hfa384x_get_rid(struct net_device *dev, u16 rid, void *buf, int len, spin_lock_bh(&local->baplock); res = hfa384x_setup_bap(dev, BAP0, rid, 0); - if (!res) - res = hfa384x_from_bap(dev, BAP0, &rec, sizeof(rec)); + if (res) + goto unlock; + + res = hfa384x_from_bap(dev, BAP0, &rec, sizeof(rec)); + if (res) + goto unlock; if (le16_to_cpu(rec.len) == 0) { /* RID not available */ res = -ENODATA; + goto unlock; } rlen = (le16_to_cpu(rec.len) - 1) * 2; - if (!res && exact_len && rlen != len) { + if (exact_len && rlen != len) { printk(KERN_DEBUG "%s: hfa384x_get_rid - RID len mismatch: " "rid=0x%04x, len=%d (expected %d)\n", dev->name, rid, rlen, len); res = -ENODATA; } - if (!res) - res = hfa384x_from_bap(dev, BAP0, buf, len); + res = hfa384x_from_bap(dev, BAP0, buf, len); +unlock: spin_unlock_bh(&local->baplock); mutex_unlock(&local->rid_bap_mtx); From d39cb4a597295c6fd5e01795a134f1e3c0914049 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 26 Jan 2016 13:08:10 -0500 Subject: [PATCH 442/521] gfs2: avoid uninitialized variable warning commit 67893f12e5374bbcaaffbc6e570acbc2714ea884 upstream. We get a bogus warning about a potential uninitialized variable use in gfs2, because the compiler does not figure out that we never use the leaf number if get_leaf_nr() returns an error: fs/gfs2/dir.c: In function 'get_first_leaf': fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized] fs/gfs2/dir.c: In function 'dir_split_leaf': fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized] Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is sufficient to let gcc understand that this is exactly the same condition as in IS_ERR() so it can optimize the code path enough to understand it. Signed-off-by: Arnd Bergmann Signed-off-by: Bob Peterson Signed-off-by: Greg Kroah-Hartman --- fs/gfs2/dir.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index ad8a5b757cc7..a443c6e54412 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -760,7 +760,7 @@ static int get_first_leaf(struct gfs2_inode *dip, u32 index, int error; error = get_leaf_nr(dip, index, &leaf_no); - if (!error) + if (!IS_ERR_VALUE(error)) error = get_leaf(dip, leaf_no, bh_out); return error; @@ -976,7 +976,7 @@ static int dir_split_leaf(struct inode *inode, const struct qstr *name) index = name->hash >> (32 - dip->i_depth); error = get_leaf_nr(dip, index, &leaf_no); - if (error) + if (IS_ERR_VALUE(error)) return error; /* Get the old leaf block */ From abc025d1e88a47c24a0f4411d851c1e9c3e0e87d Mon Sep 17 00:00:00 2001 From: Parthasarathy Bhuvaragan Date: Thu, 1 Sep 2016 16:22:16 +0200 Subject: [PATCH 443/521] tipc: fix random link resets while adding a second bearer commit d2f394dc4816b7bd1b44981d83509f18f19c53f0 upstream. In a dual bearer configuration, if the second tipc link becomes active while the first link still has pending nametable "bulk" updates, it randomly leads to reset of the second link. When a link is established, the function named_distribute(), fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL) with NAME_DISTRIBUTOR message for each PUBLICATION. However, the function named_distribute() allocates the buffer by increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR). This consumes the space allocated for TUNNEL_PROTOCOL. When establishing the second link, the link shall tunnel all the messages in the first link queue including the "bulk" update. As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds the link mtu the transmission fails (-EMSGSIZE). Thus, the synch point based on the message count of the tunnel packets is never reached leading to link timeout. In this commit, we adjust the size of name distributor message so that they can be tunnelled. Reviewed-by: Jon Maloy Signed-off-by: Parthasarathy Bhuvaragan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/name_distr.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c index 18f8152888f4..c4c151bc000c 100644 --- a/net/tipc/name_distr.c +++ b/net/tipc/name_distr.c @@ -62,6 +62,8 @@ static void publ_to_item(struct distr_item *i, struct publication *p) /** * named_prepare_buf - allocate & initialize a publication message + * + * The buffer returned is of size INT_H_SIZE + payload size */ static struct sk_buff *named_prepare_buf(struct net *net, u32 type, u32 size, u32 dest) @@ -166,9 +168,9 @@ static void named_distribute(struct net *net, struct sk_buff_head *list, struct publication *publ; struct sk_buff *skb = NULL; struct distr_item *item = NULL; - uint msg_dsz = (tipc_node_get_mtu(net, dnode, 0) / ITEM_SIZE) * - ITEM_SIZE; - uint msg_rem = msg_dsz; + u32 msg_dsz = ((tipc_node_get_mtu(net, dnode, 0) - INT_H_SIZE) / + ITEM_SIZE) * ITEM_SIZE; + u32 msg_rem = msg_dsz; list_for_each_entry(publ, pls, local_list) { /* Prepare next buffer: */ From 59e0cd110fb9fb9aa97bb59c57789adb0e82da8d Mon Sep 17 00:00:00 2001 From: Jon Paul Maloy Date: Fri, 17 Jun 2016 06:35:57 -0400 Subject: [PATCH 444/521] tipc: fix socket timer deadlock commit f1d048f24e66ba85d3dabf3d076cefa5f2b546b0 upstream. We sometimes observe a 'deadly embrace' type deadlock occurring between mutually connected sockets on the same node. This happens when the one-hour peer supervision timers happen to expire simultaneously in both sockets. The scenario is as follows: CPU 1: CPU 2: -------- -------- tipc_sk_timeout(sk1) tipc_sk_timeout(sk2) lock(sk1.slock) lock(sk2.slock) msg_create(probe) msg_create(probe) unlock(sk1.slock) unlock(sk2.slock) tipc_node_xmit_skb() tipc_node_xmit_skb() tipc_node_xmit() tipc_node_xmit() tipc_sk_rcv(sk2) tipc_sk_rcv(sk1) lock(sk2.slock) lock((sk1.slock) filter_rcv() filter_rcv() tipc_sk_proto_rcv() tipc_sk_proto_rcv() msg_create(probe_rsp) msg_create(probe_rsp) tipc_sk_respond() tipc_sk_respond() tipc_node_xmit_skb() tipc_node_xmit_skb() tipc_node_xmit() tipc_node_xmit() tipc_sk_rcv(sk1) tipc_sk_rcv(sk2) lock((sk1.slock) lock((sk2.slock) ===> DEADLOCK ===> DEADLOCK Further analysis reveals that there are three different locations in the socket code where tipc_sk_respond() is called within the context of the socket lock, with ensuing risk of similar deadlocks. We now solve this by passing a buffer queue along with all upcalls where sk_lock.slock may potentially be held. Response or rejected message buffers are accumulated into this queue instead of being sent out directly, and only sent once we know we are safely outside the slock context. Reported-by: GUNA Acked-by: Ying Xue Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/socket.c | 54 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 42 insertions(+), 12 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index d119291db852..65171f8e8c45 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -777,9 +777,11 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq, * @tsk: receiving socket * @skb: pointer to message buffer. */ -static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb) +static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb, + struct sk_buff_head *xmitq) { struct sock *sk = &tsk->sk; + u32 onode = tsk_own_node(tsk); struct tipc_msg *hdr = buf_msg(skb); int mtyp = msg_type(hdr); int conn_cong; @@ -792,7 +794,8 @@ static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb) if (mtyp == CONN_PROBE) { msg_set_type(hdr, CONN_PROBE_REPLY); - tipc_sk_respond(sk, skb, TIPC_OK); + if (tipc_msg_reverse(onode, &skb, TIPC_OK)) + __skb_queue_tail(xmitq, skb); return; } else if (mtyp == CONN_ACK) { conn_cong = tsk_conn_cong(tsk); @@ -1647,7 +1650,8 @@ static unsigned int rcvbuf_limit(struct sock *sk, struct sk_buff *buf) * * Returns true if message was added to socket receive queue, otherwise false */ -static bool filter_rcv(struct sock *sk, struct sk_buff *skb) +static bool filter_rcv(struct sock *sk, struct sk_buff *skb, + struct sk_buff_head *xmitq) { struct socket *sock = sk->sk_socket; struct tipc_sock *tsk = tipc_sk(sk); @@ -1657,7 +1661,7 @@ static bool filter_rcv(struct sock *sk, struct sk_buff *skb) int usr = msg_user(hdr); if (unlikely(msg_user(hdr) == CONN_MANAGER)) { - tipc_sk_proto_rcv(tsk, skb); + tipc_sk_proto_rcv(tsk, skb, xmitq); return false; } @@ -1700,7 +1704,8 @@ static bool filter_rcv(struct sock *sk, struct sk_buff *skb) return true; reject: - tipc_sk_respond(sk, skb, err); + if (tipc_msg_reverse(tsk_own_node(tsk), &skb, err)) + __skb_queue_tail(xmitq, skb); return false; } @@ -1716,9 +1721,24 @@ static bool filter_rcv(struct sock *sk, struct sk_buff *skb) static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb) { unsigned int truesize = skb->truesize; + struct sk_buff_head xmitq; + u32 dnode, selector; - if (likely(filter_rcv(sk, skb))) + __skb_queue_head_init(&xmitq); + + if (likely(filter_rcv(sk, skb, &xmitq))) { atomic_add(truesize, &tipc_sk(sk)->dupl_rcvcnt); + return 0; + } + + if (skb_queue_empty(&xmitq)) + return 0; + + /* Send response/rejected message */ + skb = __skb_dequeue(&xmitq); + dnode = msg_destnode(buf_msg(skb)); + selector = msg_origport(buf_msg(skb)); + tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector); return 0; } @@ -1732,12 +1752,13 @@ static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb) * Caller must hold socket lock */ static void tipc_sk_enqueue(struct sk_buff_head *inputq, struct sock *sk, - u32 dport) + u32 dport, struct sk_buff_head *xmitq) { + unsigned long time_limit = jiffies + 2; + struct sk_buff *skb; unsigned int lim; atomic_t *dcnt; - struct sk_buff *skb; - unsigned long time_limit = jiffies + 2; + u32 onode; while (skb_queue_len(inputq)) { if (unlikely(time_after_eq(jiffies, time_limit))) @@ -1749,7 +1770,7 @@ static void tipc_sk_enqueue(struct sk_buff_head *inputq, struct sock *sk, /* Add message directly to receive queue if possible */ if (!sock_owned_by_user(sk)) { - filter_rcv(sk, skb); + filter_rcv(sk, skb, xmitq); continue; } @@ -1762,7 +1783,9 @@ static void tipc_sk_enqueue(struct sk_buff_head *inputq, struct sock *sk, continue; /* Overload => reject message back to sender */ - tipc_sk_respond(sk, skb, TIPC_ERR_OVERLOAD); + onode = tipc_own_addr(sock_net(sk)); + if (tipc_msg_reverse(onode, &skb, TIPC_ERR_OVERLOAD)) + __skb_queue_tail(xmitq, skb); break; } } @@ -1775,12 +1798,14 @@ static void tipc_sk_enqueue(struct sk_buff_head *inputq, struct sock *sk, */ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq) { + struct sk_buff_head xmitq; u32 dnode, dport = 0; int err; struct tipc_sock *tsk; struct sock *sk; struct sk_buff *skb; + __skb_queue_head_init(&xmitq); while (skb_queue_len(inputq)) { dport = tipc_skb_peek_port(inputq, dport); tsk = tipc_sk_lookup(net, dport); @@ -1788,9 +1813,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq) if (likely(tsk)) { sk = &tsk->sk; if (likely(spin_trylock_bh(&sk->sk_lock.slock))) { - tipc_sk_enqueue(inputq, sk, dport); + tipc_sk_enqueue(inputq, sk, dport, &xmitq); spin_unlock_bh(&sk->sk_lock.slock); } + /* Send pending response/rejected messages, if any */ + while ((skb = __skb_dequeue(&xmitq))) { + dnode = msg_destnode(buf_msg(skb)); + tipc_node_xmit_skb(net, skb, dnode, dport); + } sock_put(sk); continue; } From c50fd34e10897114a7be2120133bd7e0b4184024 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 28 Sep 2016 00:27:17 -0500 Subject: [PATCH 445/521] mnt: Add a per mount namespace limit on the number of mounts commit d29216842a85c7970c536108e093963f02714498 upstream. CAI Qian pointed out that the semantics of shared subtrees make it possible to create an exponentially increasing number of mounts in a mount namespace. mkdir /tmp/1 /tmp/2 mount --make-rshared / for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done Will create create 2^20 or 1048576 mounts, which is a practical problem as some people have managed to hit this by accident. As such CVE-2016-6213 was assigned. Ian Kent described the situation for autofs users as follows: > The number of mounts for direct mount maps is usually not very large because of > the way they are implemented, large direct mount maps can have performance > problems. There can be anywhere from a few (likely case a few hundred) to less > than 10000, plus mounts that have been triggered and not yet expired. > > Indirect mounts have one autofs mount at the root plus the number of mounts that > have been triggered and not yet expired. > > The number of autofs indirect map entries can range from a few to the common > case of several thousand and in rare cases up to between 30000 and 50000. I've > not heard of people with maps larger than 50000 entries. > > The larger the number of map entries the greater the possibility for a large > number of active mounts so it's not hard to expect cases of a 1000 or somewhat > more active mounts. So I am setting the default number of mounts allowed per mount namespace at 100,000. This is more than enough for any use case I know of, but small enough to quickly stop an exponential increase in mounts. Which should be perfect to catch misconfigurations and malfunctioning programs. For anyone who needs a higher limit this can be changed by writing to the new /proc/sys/fs/mount-max sysctl. Tested-by: CAI Qian Signed-off-by: "Eric W. Biederman" [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- Documentation/sysctl/fs.txt | 7 ++++++ fs/mount.h | 2 ++ fs/namespace.c | 50 ++++++++++++++++++++++++++++++++++++- fs/pnode.c | 2 +- fs/pnode.h | 1 + include/linux/mount.h | 2 ++ kernel/sysctl.c | 9 +++++++ 7 files changed, 71 insertions(+), 2 deletions(-) diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 302b5ed616a6..35e17f748ca7 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -265,6 +265,13 @@ aio-nr can grow to. ============================================================== +mount-max: + +This denotes the maximum number of mounts that may exist +in a mount namespace. + +============================================================== + 2. /proc/sys/fs/binfmt_misc ---------------------------------------------------------- diff --git a/fs/mount.h b/fs/mount.h index 3dc7dea5a357..13a4ebbbaa74 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -13,6 +13,8 @@ struct mnt_namespace { u64 seq; /* Sequence number to prevent loops */ wait_queue_head_t poll; u64 event; + unsigned int mounts; /* # of mounts in the namespace */ + unsigned int pending_mounts; }; struct mnt_pcp { diff --git a/fs/namespace.c b/fs/namespace.c index 7df3d406d3e0..f26d18d69712 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -27,6 +27,9 @@ #include "pnode.h" #include "internal.h" +/* Maximum number of mounts in a mount namespace */ +unsigned int sysctl_mount_max __read_mostly = 100000; + static unsigned int m_hash_mask __read_mostly; static unsigned int m_hash_shift __read_mostly; static unsigned int mp_hash_mask __read_mostly; @@ -925,6 +928,9 @@ static void commit_tree(struct mount *mnt) list_splice(&head, n->list.prev); + n->mounts += n->pending_mounts; + n->pending_mounts = 0; + __attach_mnt(mnt, parent); touch_mnt_namespace(n); } @@ -1445,11 +1451,16 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how) propagate_umount(&tmp_list); while (!list_empty(&tmp_list)) { + struct mnt_namespace *ns; bool disconnect; p = list_first_entry(&tmp_list, struct mount, mnt_list); list_del_init(&p->mnt_expire); list_del_init(&p->mnt_list); - __touch_mnt_namespace(p->mnt_ns); + ns = p->mnt_ns; + if (ns) { + ns->mounts--; + __touch_mnt_namespace(ns); + } p->mnt_ns = NULL; if (how & UMOUNT_SYNC) p->mnt.mnt_flags |= MNT_SYNC_UMOUNT; @@ -1850,6 +1861,28 @@ static int invent_group_ids(struct mount *mnt, bool recurse) return 0; } +int count_mounts(struct mnt_namespace *ns, struct mount *mnt) +{ + unsigned int max = READ_ONCE(sysctl_mount_max); + unsigned int mounts = 0, old, pending, sum; + struct mount *p; + + for (p = mnt; p; p = next_mnt(p, mnt)) + mounts++; + + old = ns->mounts; + pending = ns->pending_mounts; + sum = old + pending; + if ((old > sum) || + (pending > sum) || + (max < sum) || + (mounts > (max - sum))) + return -ENOSPC; + + ns->pending_mounts = pending + mounts; + return 0; +} + /* * @source_mnt : mount tree to be attached * @nd : place the mount tree @source_mnt is attached @@ -1919,6 +1952,7 @@ static int attach_recursive_mnt(struct mount *source_mnt, struct path *parent_path) { HLIST_HEAD(tree_list); + struct mnt_namespace *ns = dest_mnt->mnt_ns; struct mountpoint *smp; struct mount *child, *p; struct hlist_node *n; @@ -1931,6 +1965,13 @@ static int attach_recursive_mnt(struct mount *source_mnt, if (IS_ERR(smp)) return PTR_ERR(smp); + /* Is there space to add these mounts to the mount namespace? */ + if (!parent_path) { + err = count_mounts(ns, source_mnt); + if (err) + goto out; + } + if (IS_MNT_SHARED(dest_mnt)) { err = invent_group_ids(source_mnt, true); if (err) @@ -1970,11 +2011,14 @@ static int attach_recursive_mnt(struct mount *source_mnt, out_cleanup_ids: while (!hlist_empty(&tree_list)) { child = hlist_entry(tree_list.first, struct mount, mnt_hash); + child->mnt_parent->mnt_ns->pending_mounts = 0; umount_tree(child, UMOUNT_SYNC); } unlock_mount_hash(); cleanup_group_ids(source_mnt, NULL); out: + ns->pending_mounts = 0; + read_seqlock_excl(&mount_lock); put_mountpoint(smp); read_sequnlock_excl(&mount_lock); @@ -2804,6 +2848,8 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns) init_waitqueue_head(&new_ns->poll); new_ns->event = 0; new_ns->user_ns = get_user_ns(user_ns); + new_ns->mounts = 0; + new_ns->pending_mounts = 0; return new_ns; } @@ -2853,6 +2899,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns, q = new; while (p) { q->mnt_ns = new_ns; + new_ns->mounts++; if (new_fs) { if (&p->mnt == new_fs->root.mnt) { new_fs->root.mnt = mntget(&q->mnt); @@ -2891,6 +2938,7 @@ static struct mnt_namespace *create_mnt_ns(struct vfsmount *m) struct mount *mnt = real_mount(m); mnt->mnt_ns = new_ns; new_ns->root = mnt; + new_ns->mounts++; list_add(&mnt->mnt_list, &new_ns->list); } else { mntput(m); diff --git a/fs/pnode.c b/fs/pnode.c index b9f2af59b9a6..b394ca5307ec 100644 --- a/fs/pnode.c +++ b/fs/pnode.c @@ -259,7 +259,7 @@ static int propagate_one(struct mount *m) read_sequnlock_excl(&mount_lock); } hlist_add_head(&child->mnt_hash, list); - return 0; + return count_mounts(m->mnt_ns, child); } /* diff --git a/fs/pnode.h b/fs/pnode.h index 623f01772bec..dc87e65becd2 100644 --- a/fs/pnode.h +++ b/fs/pnode.h @@ -54,4 +54,5 @@ void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct mount *copy_tree(struct mount *, struct dentry *, int); bool is_path_reachable(struct mount *, struct dentry *, const struct path *root); +int count_mounts(struct mnt_namespace *ns, struct mount *mnt); #endif /* _LINUX_PNODE_H */ diff --git a/include/linux/mount.h b/include/linux/mount.h index f822c3c11377..dc6cd800cd5d 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -95,4 +95,6 @@ extern void mark_mounts_for_expiry(struct list_head *mounts); extern dev_t name_to_dev_t(const char *name); +extern unsigned int sysctl_mount_max; + #endif /* _LINUX_MOUNT_H */ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 2f0d157258a2..300d64162aff 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -65,6 +65,7 @@ #include #include #include +#include #include #include @@ -1749,6 +1750,14 @@ static struct ctl_table fs_table[] = { .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, + { + .procname = "mount-max", + .data = &sysctl_mount_max, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &one, + }, { } }; From 0d9dac5d7cc31df50757f26bcbdfbcf47277a1b2 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 28 Jan 2016 09:22:44 -0200 Subject: [PATCH 446/521] xc2028: avoid use after free commit 8dfbcc4351a0b6d2f2d77f367552f48ffefafe18 upstream. If struct xc2028_config is passed without a firmware name, the following trouble may happen: [11009.907205] xc2028 5-0061: type set to XCeive xc2028/xc3028 tuner [11009.907491] ================================================================== [11009.907750] BUG: KASAN: use-after-free in strcmp+0x96/0xb0 at addr ffff8803bd78ab40 [11009.907992] Read of size 1 by task modprobe/28992 [11009.907994] ============================================================================= [11009.907997] BUG kmalloc-16 (Tainted: G W ): kasan: bad access detected [11009.907999] ----------------------------------------------------------------------------- [11009.908008] INFO: Allocated in xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd] age=0 cpu=3 pid=28992 [11009.908012] ___slab_alloc+0x581/0x5b0 [11009.908014] __slab_alloc+0x51/0x90 [11009.908017] __kmalloc+0x27b/0x350 [11009.908022] xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd] [11009.908026] usb_hcd_submit_urb+0x1e8/0x1c60 [11009.908029] usb_submit_urb+0xb0e/0x1200 [11009.908032] usb_serial_generic_write_start+0xb6/0x4c0 [11009.908035] usb_serial_generic_write+0x92/0xc0 [11009.908039] usb_console_write+0x38a/0x560 [11009.908045] call_console_drivers.constprop.14+0x1ee/0x2c0 [11009.908051] console_unlock+0x40d/0x900 [11009.908056] vprintk_emit+0x4b4/0x830 [11009.908061] vprintk_default+0x1f/0x30 [11009.908064] printk+0x99/0xb5 [11009.908067] kasan_report_error+0x10a/0x550 [11009.908070] __asan_report_load1_noabort+0x43/0x50 [11009.908074] INFO: Freed in xc2028_set_config+0x90/0x630 [tuner_xc2028] age=1 cpu=3 pid=28992 [11009.908077] __slab_free+0x2ec/0x460 [11009.908080] kfree+0x266/0x280 [11009.908083] xc2028_set_config+0x90/0x630 [tuner_xc2028] [11009.908086] xc2028_attach+0x310/0x8a0 [tuner_xc2028] [11009.908090] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb] [11009.908094] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb] [11009.908098] em28xx_dvb_init+0x81/0x8a [em28xx_dvb] [11009.908101] em28xx_register_extension+0xd9/0x190 [em28xx] [11009.908105] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb] [11009.908108] do_one_initcall+0x141/0x300 [11009.908111] do_init_module+0x1d0/0x5ad [11009.908114] load_module+0x6666/0x9ba0 [11009.908117] SyS_finit_module+0x108/0x130 [11009.908120] entry_SYSCALL_64_fastpath+0x16/0x76 [11009.908123] INFO: Slab 0xffffea000ef5e280 objects=25 used=25 fp=0x (null) flags=0x2ffff8000004080 [11009.908126] INFO: Object 0xffff8803bd78ab40 @offset=2880 fp=0x0000000000000001 [11009.908130] Bytes b4 ffff8803bd78ab30: 01 00 00 00 2a 07 00 00 9d 28 00 00 01 00 00 00 ....*....(...... [11009.908133] Object ffff8803bd78ab40: 01 00 00 00 00 00 00 00 b0 1d c3 6a 00 88 ff ff ...........j.... [11009.908137] CPU: 3 PID: 28992 Comm: modprobe Tainted: G B W 4.5.0-rc1+ #43 [11009.908140] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015 [11009.908142] ffff8803bd78a000 ffff8802c273f1b8 ffffffff81932007 ffff8803c6407a80 [11009.908148] ffff8802c273f1e8 ffffffff81556759 ffff8803c6407a80 ffffea000ef5e280 [11009.908153] ffff8803bd78ab40 dffffc0000000000 ffff8802c273f210 ffffffff8155ccb4 [11009.908158] Call Trace: [11009.908162] [] dump_stack+0x4b/0x64 [11009.908165] [] print_trailer+0xf9/0x150 [11009.908168] [] object_err+0x34/0x40 [11009.908171] [] kasan_report_error+0x230/0x550 [11009.908175] [] ? trace_hardirqs_off_caller+0x21/0x290 [11009.908179] [] ? kasan_unpoison_shadow+0x36/0x50 [11009.908182] [] __asan_report_load1_noabort+0x43/0x50 [11009.908185] [] ? __asan_register_globals+0x50/0xa0 [11009.908189] [] ? strcmp+0x96/0xb0 [11009.908192] [] strcmp+0x96/0xb0 [11009.908196] [] xc2028_set_config+0x15c/0x630 [tuner_xc2028] [11009.908200] [] xc2028_attach+0x310/0x8a0 [tuner_xc2028] [11009.908203] [] ? memset+0x28/0x30 [11009.908206] [] ? xc2028_set_config+0x630/0x630 [tuner_xc2028] [11009.908211] [] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb] [11009.908215] [] ? em28xx_dvb_init.part.3+0x37c/0x5cf4 [em28xx_dvb] [11009.908219] [] ? hauppauge_hvr930c_init+0x487/0x487 [em28xx_dvb] [11009.908222] [] ? lgdt330x_attach+0x1cc/0x370 [lgdt330x] [11009.908226] [] ? i2c_read_demod_bytes.isra.2+0x210/0x210 [lgdt330x] [11009.908230] [] ? ref_module.part.15+0x10/0x10 [11009.908233] [] ? module_assert_mutex_or_preempt+0x80/0x80 [11009.908238] [] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb] [11009.908242] [] ? em28xx_attach_xc3028.constprop.7+0x30d/0x30d [em28xx_dvb] [11009.908245] [] ? string+0x14d/0x1f0 [11009.908249] [] ? symbol_string+0xff/0x1a0 [11009.908253] [] ? uuid_string+0x6f0/0x6f0 [11009.908257] [] ? __kernel_text_address+0x7e/0xa0 [11009.908260] [] ? print_context_stack+0x7f/0xf0 [11009.908264] [] ? __module_address+0xb6/0x360 [11009.908268] [] ? is_ftrace_trampoline+0x99/0xe0 [11009.908271] [] ? __kernel_text_address+0x7e/0xa0 [11009.908275] [] ? debug_check_no_locks_freed+0x290/0x290 [11009.908278] [] ? dump_trace+0x11b/0x300 [11009.908282] [] ? em28xx_register_extension+0x23/0x190 [em28xx] [11009.908285] [] ? trace_hardirqs_off_caller+0x21/0x290 [11009.908289] [] ? trace_hardirqs_on_caller+0x16/0x590 [11009.908292] [] ? trace_hardirqs_on+0xd/0x10 [11009.908296] [] ? em28xx_register_extension+0x23/0x190 [em28xx] [11009.908299] [] ? mutex_trylock+0x400/0x400 [11009.908302] [] ? do_one_initcall+0x131/0x300 [11009.908306] [] ? call_rcu_sched+0x17/0x20 [11009.908309] [] ? put_object+0x48/0x70 [11009.908314] [] em28xx_dvb_init+0x81/0x8a [em28xx_dvb] [11009.908317] [] em28xx_register_extension+0xd9/0x190 [em28xx] [11009.908320] [] ? 0xffffffffa0150000 [11009.908324] [] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb] [11009.908327] [] do_one_initcall+0x141/0x300 [11009.908330] [] ? try_to_run_init_process+0x40/0x40 [11009.908333] [] ? trace_hardirqs_on_caller+0x16/0x590 [11009.908337] [] ? kasan_unpoison_shadow+0x36/0x50 [11009.908340] [] ? kasan_unpoison_shadow+0x36/0x50 [11009.908343] [] ? kasan_unpoison_shadow+0x36/0x50 [11009.908346] [] ? __asan_register_globals+0x87/0xa0 [11009.908350] [] do_init_module+0x1d0/0x5ad [11009.908353] [] load_module+0x6666/0x9ba0 [11009.908356] [] ? symbol_put_addr+0x50/0x50 [11009.908361] [] ? em28xx_dvb_init.part.3+0x5989/0x5cf4 [em28xx_dvb] [11009.908366] [] ? module_frob_arch_sections+0x20/0x20 [11009.908369] [] ? open_exec+0x50/0x50 [11009.908374] [] ? ns_capable+0x5b/0xd0 [11009.908377] [] SyS_finit_module+0x108/0x130 [11009.908379] [] ? SyS_init_module+0x1f0/0x1f0 [11009.908383] [] ? lockdep_sys_exit_thunk+0x12/0x14 [11009.908394] [] entry_SYSCALL_64_fastpath+0x16/0x76 [11009.908396] Memory state around the buggy address: [11009.908398] ffff8803bd78aa00: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc [11009.908401] ffff8803bd78aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [11009.908403] >ffff8803bd78ab00: fc fc fc fc fc fc fc fc 00 00 fc fc fc fc fc fc [11009.908405] ^ [11009.908407] ffff8803bd78ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [11009.908409] ffff8803bd78ac00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [11009.908411] ================================================================== In order to avoid it, let's set the cached value of the firmware name to NULL after freeing it. While here, return an error if the memory allocation fails. Signed-off-by: Mauro Carvalho Chehab Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/media/tuners/tuner-xc2028.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/media/tuners/tuner-xc2028.c b/drivers/media/tuners/tuner-xc2028.c index 4e941f00b600..082ff5608455 100644 --- a/drivers/media/tuners/tuner-xc2028.c +++ b/drivers/media/tuners/tuner-xc2028.c @@ -1403,11 +1403,12 @@ static int xc2028_set_config(struct dvb_frontend *fe, void *priv_cfg) * in order to avoid troubles during device release. */ kfree(priv->ctrl.fname); + priv->ctrl.fname = NULL; memcpy(&priv->ctrl, p, sizeof(priv->ctrl)); if (p->fname) { priv->ctrl.fname = kstrdup(p->fname, GFP_KERNEL); if (priv->ctrl.fname == NULL) - rc = -ENOMEM; + return -ENOMEM; } /* From 9540baadb61ba5ed08832bb2a4cbfd876db37ff4 Mon Sep 17 00:00:00 2001 From: Phil Turnbull Date: Tue, 2 Feb 2016 13:36:45 -0500 Subject: [PATCH 447/521] netfilter: nfnetlink: correctly validate length of batch messages commit c58d6c93680f28ac58984af61d0a7ebf4319c241 upstream. If nlh->nlmsg_len is zero then an infinite loop is triggered because 'skb_pull(skb, msglen);' pulls zero bytes. The calculation in nlmsg_len() underflows if 'nlh->nlmsg_len < NLMSG_HDRLEN' which bypasses the length validation and will later trigger an out-of-bound read. If the length validation does fail then the malformed batch message is copied back to userspace. However, we cannot do this because the nlh->nlmsg_len can be invalid. This leads to an out-of-bounds read in netlink_ack: [ 41.455421] ================================================================== [ 41.456431] BUG: KASAN: slab-out-of-bounds in memcpy+0x1d/0x40 at addr ffff880119e79340 [ 41.456431] Read of size 4294967280 by task a.out/987 [ 41.456431] ============================================================================= [ 41.456431] BUG kmalloc-512 (Not tainted): kasan: bad access detected [ 41.456431] ----------------------------------------------------------------------------- ... [ 41.456431] Bytes b4 ffff880119e79310: 00 00 00 00 d5 03 00 00 b0 fb fe ff 00 00 00 00 ................ [ 41.456431] Object ffff880119e79320: 20 00 00 00 10 00 05 00 00 00 00 00 00 00 00 00 ............... [ 41.456431] Object ffff880119e79330: 14 00 0a 00 01 03 fc 40 45 56 11 22 33 10 00 05 .......@EV."3... [ 41.456431] Object ffff880119e79340: f0 ff ff ff 88 99 aa bb 00 14 00 0a 00 06 fe fb ................ ^^ start of batch nlmsg with nlmsg_len=4294967280 ... [ 41.456431] Memory state around the buggy address: [ 41.456431] ffff880119e79400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 41.456431] ffff880119e79480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 41.456431] >ffff880119e79500: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc [ 41.456431] ^ [ 41.456431] ffff880119e79580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 41.456431] ffff880119e79600: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb [ 41.456431] ================================================================== Fix this with better validation of nlh->nlmsg_len and by setting NFNL_BATCH_FAILURE if any batch message fails length validation. CAP_NET_ADMIN is required to trigger the bugs. Fixes: 9ea2aa8b7dba ("netfilter: nfnetlink: validate nfnetlink header from batch") Signed-off-by: Phil Turnbull Signed-off-by: Pablo Neira Ayuso Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nfnetlink.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 77afe913d03d..9adedba78eea 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -326,10 +326,12 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh, nlh = nlmsg_hdr(skb); err = 0; - if (nlmsg_len(nlh) < sizeof(struct nfgenmsg) || - skb->len < nlh->nlmsg_len) { - err = -EINVAL; - goto ack; + if (nlh->nlmsg_len < NLMSG_HDRLEN || + skb->len < nlh->nlmsg_len || + nlmsg_len(nlh) < sizeof(struct nfgenmsg)) { + nfnl_err_reset(&err_list); + status |= NFNL_BATCH_FAILURE; + goto done; } /* Only requests are handled by the kernel */ From 65d30f7545ffdddcf10a59f3e54b032c5ade2e9d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Kube=C4=8Dek?= Date: Fri, 2 Dec 2016 09:33:41 +0100 Subject: [PATCH 448/521] tipc: check minimum bearer MTU MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 3de81b758853f0b29c61e246679d20b513c4cfec upstream. Qian Zhang (张谦) reported a potential socket buffer overflow in tipc_msg_build() which is also known as CVE-2016-8632: due to insufficient checks, a buffer overflow can occur if MTU is too short for even tipc headers. As anyone can set device MTU in a user/net namespace, this issue can be abused by a regular user. As agreed in the discussion on Ben Hutchings' original patch, we should check the MTU at the moment a bearer is attached rather than for each processed packet. We also need to repeat the check when bearer MTU is adjusted to new device MTU. UDP case also needs a check to avoid overflow when calculating bearer MTU. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Michal Kubecek Reported-by: Qian Zhang (张谦) Acked-by: Ying Xue Signed-off-by: David S. Miller [bwh: Backported to 4.4: - Adjust context - NETDEV_GOING_DOWN and NETDEV_CHANGEMTU cases in net notifier were combined] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- net/tipc/bearer.c | 13 +++++++++++-- net/tipc/bearer.h | 13 +++++++++++++ net/tipc/udp_media.c | 5 +++++ 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 648f2a67f314..cb1381513c82 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -381,6 +381,10 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b, dev = dev_get_by_name(net, driver_name); if (!dev) return -ENODEV; + if (tipc_mtu_bad(dev, 0)) { + dev_put(dev); + return -EINVAL; + } /* Associate TIPC bearer with L2 bearer */ rcu_assign_pointer(b->media_ptr, dev); @@ -570,14 +574,19 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt, if (!b_ptr) return NOTIFY_DONE; - b_ptr->mtu = dev->mtu; - switch (evt) { case NETDEV_CHANGE: if (netif_carrier_ok(dev)) break; case NETDEV_GOING_DOWN: + tipc_reset_bearer(net, b_ptr); + break; case NETDEV_CHANGEMTU: + if (tipc_mtu_bad(dev, 0)) { + bearer_disable(net, b_ptr); + break; + } + b_ptr->mtu = dev->mtu; tipc_reset_bearer(net, b_ptr); break; case NETDEV_CHANGEADDR: diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index 552185bc4773..5f11e18b1fa1 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -39,6 +39,7 @@ #include "netlink.h" #include "core.h" +#include "msg.h" #include #define MAX_MEDIA 3 @@ -61,6 +62,9 @@ #define TIPC_MEDIA_TYPE_IB 2 #define TIPC_MEDIA_TYPE_UDP 3 +/* minimum bearer MTU */ +#define TIPC_MIN_BEARER_MTU (MAX_H_SIZE + INT_H_SIZE) + /** * struct tipc_node_map - set of node identifiers * @count: # of nodes in set @@ -226,4 +230,13 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id, void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id, struct sk_buff_head *xmitq); +/* check if device MTU is too low for tipc headers */ +static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int reserve) +{ + if (dev->mtu >= TIPC_MIN_BEARER_MTU + reserve) + return false; + netdev_warn(dev, "MTU too low for tipc bearer\n"); + return true; +} + #endif /* _TIPC_BEARER_H */ diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index 4056798c54a5..78d6b78de29d 100644 --- a/net/tipc/udp_media.c +++ b/net/tipc/udp_media.c @@ -376,6 +376,11 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b, udp_conf.local_ip.s_addr = htonl(INADDR_ANY); udp_conf.use_udp_checksums = false; ub->ifindex = dev->ifindex; + if (tipc_mtu_bad(dev, sizeof(struct iphdr) + + sizeof(struct udphdr))) { + err = -EINVAL; + goto err; + } b->mtu = dev->mtu - sizeof(struct iphdr) - sizeof(struct udphdr); #if IS_ENABLED(CONFIG_IPV6) From d23ef85b123d3dbd3ba8a3c5f0ef5e556feb635e Mon Sep 17 00:00:00 2001 From: Vlad Tsyrklevich Date: Wed, 12 Oct 2016 18:51:24 +0200 Subject: [PATCH 449/521] vfio/pci: Fix integer overflows, bitmask check commit 05692d7005a364add85c6e25a6c4447ce08f913a upstream. The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize user-supplied integers, potentially allowing memory corruption. This patch adds appropriate integer overflow checks, checks the range bounds for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set. VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in vfio_pci_set_irqs_ioctl(). Furthermore, a kzalloc is changed to a kcalloc because the use of a kzalloc with an integer multiplication allowed an integer overflow condition to be reached without this patch. kcalloc checks for overflow and should prevent a similar occurrence. Signed-off-by: Vlad Tsyrklevich Signed-off-by: Alex Williamson Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/vfio/pci/vfio_pci.c | 33 ++++++++++++++++++++----------- drivers/vfio/pci/vfio_pci_intrs.c | 2 +- 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 9982cb176ce8..830e2fd47642 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -562,8 +562,9 @@ static long vfio_pci_ioctl(void *device_data, } else if (cmd == VFIO_DEVICE_SET_IRQS) { struct vfio_irq_set hdr; + size_t size; u8 *data = NULL; - int ret = 0; + int max, ret = 0; minsz = offsetofend(struct vfio_irq_set, count); @@ -571,23 +572,31 @@ static long vfio_pci_ioctl(void *device_data, return -EFAULT; if (hdr.argsz < minsz || hdr.index >= VFIO_PCI_NUM_IRQS || + hdr.count >= (U32_MAX - hdr.start) || hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK | VFIO_IRQ_SET_ACTION_TYPE_MASK)) return -EINVAL; - if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) { - size_t size; - int max = vfio_pci_get_irq_count(vdev, hdr.index); + max = vfio_pci_get_irq_count(vdev, hdr.index); + if (hdr.start >= max || hdr.start + hdr.count > max) + return -EINVAL; - if (hdr.flags & VFIO_IRQ_SET_DATA_BOOL) - size = sizeof(uint8_t); - else if (hdr.flags & VFIO_IRQ_SET_DATA_EVENTFD) - size = sizeof(int32_t); - else - return -EINVAL; + switch (hdr.flags & VFIO_IRQ_SET_DATA_TYPE_MASK) { + case VFIO_IRQ_SET_DATA_NONE: + size = 0; + break; + case VFIO_IRQ_SET_DATA_BOOL: + size = sizeof(uint8_t); + break; + case VFIO_IRQ_SET_DATA_EVENTFD: + size = sizeof(int32_t); + break; + default: + return -EINVAL; + } - if (hdr.argsz - minsz < hdr.count * size || - hdr.start >= max || hdr.start + hdr.count > max) + if (size) { + if (hdr.argsz - minsz < hdr.count * size) return -EINVAL; data = memdup_user((void __user *)(arg + minsz), diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 20e9a86d2dcf..5c8f767b6368 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -255,7 +255,7 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix) if (!is_irq_none(vdev)) return -EINVAL; - vdev->ctx = kzalloc(nvec * sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL); + vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL); if (!vdev->ctx) return -ENOMEM; From a7544fdd1626b65db635022c9d36007bb32dd6d8 Mon Sep 17 00:00:00 2001 From: EunTaik Lee Date: Wed, 24 Feb 2016 04:38:06 +0000 Subject: [PATCH 450/521] staging/android/ion : fix a race condition in the ion driver commit 9590232bb4f4cc824f3425a6e1349afbe6d6d2b7 upstream. There is a use-after-free problem in the ion driver. This is caused by a race condition in the ion_ioctl() function. A handle has ref count of 1 and two tasks on different cpus calls ION_IOC_FREE simultaneously. cpu 0 cpu 1 ------------------------------------------------------- ion_handle_get_by_id() (ref == 2) ion_handle_get_by_id() (ref == 3) ion_free() (ref == 2) ion_handle_put() (ref == 1) ion_free() (ref == 0 so ion_handle_destroy() is called and the handle is freed.) ion_handle_put() is called and it decreases the slub's next free pointer The problem is detected as an unaligned access in the spin lock functions since it uses load exclusive instruction. In some cases it corrupts the slub's free pointer which causes a mis-aligned access to the next free pointer.(kmalloc returns a pointer like ffffc0745b4580aa). And it causes lots of other hard-to-debug problems. This symptom is caused since the first member in the ion_handle structure is the reference count and the ion driver decrements the reference after it has been freed. To fix this problem client->lock mutex is extended to protect all the codes that uses the handle. Signed-off-by: Eun Taik Lee Reviewed-by: Laura Abbott Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman index 7ff2a7ec871f..33b390e7ea31 --- drivers/staging/android/ion/ion.c | 59 +++++++++++++++++++++++-------- 1 file changed, 44 insertions(+), 15 deletions(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index df560216d702..374f840f31a4 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -387,13 +387,22 @@ static void ion_handle_get(struct ion_handle *handle) kref_get(&handle->ref); } -static int ion_handle_put(struct ion_handle *handle) +static int ion_handle_put_nolock(struct ion_handle *handle) +{ + int ret; + + ret = kref_put(&handle->ref, ion_handle_destroy); + + return ret; +} + +int ion_handle_put(struct ion_handle *handle) { struct ion_client *client = handle->client; int ret; mutex_lock(&client->lock); - ret = kref_put(&handle->ref, ion_handle_destroy); + ret = ion_handle_put_nolock(handle); mutex_unlock(&client->lock); return ret; @@ -417,18 +426,28 @@ static struct ion_handle *ion_handle_lookup(struct ion_client *client, return ERR_PTR(-EINVAL); } -static struct ion_handle *ion_handle_get_by_id(struct ion_client *client, +static struct ion_handle *ion_handle_get_by_id_nolock(struct ion_client *client, + int id) +{ + struct ion_handle *handle; + + handle = idr_find(&client->idr, id); + if (handle) + ion_handle_get(handle); + + return handle ? handle : ERR_PTR(-EINVAL); +} + +struct ion_handle *ion_handle_get_by_id(struct ion_client *client, int id) { struct ion_handle *handle; mutex_lock(&client->lock); - handle = idr_find(&client->idr, id); - if (handle) - ion_handle_get(handle); + handle = ion_handle_get_by_id_nolock(client, id); mutex_unlock(&client->lock); - return handle ? handle : ERR_PTR(-EINVAL); + return handle; } static bool ion_handle_validate(struct ion_client *client, @@ -532,22 +551,28 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len, } EXPORT_SYMBOL(ion_alloc); -void ion_free(struct ion_client *client, struct ion_handle *handle) +static void ion_free_nolock(struct ion_client *client, struct ion_handle *handle) { bool valid_handle; BUG_ON(client != handle->client); - mutex_lock(&client->lock); valid_handle = ion_handle_validate(client, handle); if (!valid_handle) { WARN(1, "%s: invalid handle passed to free.\n", __func__); - mutex_unlock(&client->lock); return; } + ion_handle_put_nolock(handle); +} + +void ion_free(struct ion_client *client, struct ion_handle *handle) +{ + BUG_ON(client != handle->client); + + mutex_lock(&client->lock); + ion_free_nolock(client, handle); mutex_unlock(&client->lock); - ion_handle_put(handle); } EXPORT_SYMBOL(ion_free); @@ -1283,11 +1308,15 @@ static long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct ion_handle *handle; - handle = ion_handle_get_by_id(client, data.handle.handle); - if (IS_ERR(handle)) + mutex_lock(&client->lock); + handle = ion_handle_get_by_id_nolock(client, data.handle.handle); + if (IS_ERR(handle)) { + mutex_unlock(&client->lock); return PTR_ERR(handle); - ion_free(client, handle); - ion_handle_put(handle); + } + ion_free_nolock(client, handle); + ion_handle_put_nolock(handle); + mutex_unlock(&client->lock); break; } case ION_IOC_SHARE: From b7f47c794bc45eae975bf2a52a4463333111bb2a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 24 Mar 2017 19:36:13 -0700 Subject: [PATCH 451/521] ping: implement proper locking commit 43a6684519ab0a6c52024b5e25322476cabad893 upstream. We got a report of yet another bug in ping http://www.openwall.com/lists/oss-security/2017/03/24/6 ->disconnect() is not called with socket lock held. Fix this by acquiring ping rwlock earlier. Thanks to Daniel, Alexander and Andrey for letting us know this problem. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet Reported-by: Daniel Jiang Reported-by: Solar Designer Reported-by: Andrey Konovalov Signed-off-by: David S. Miller Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- net/ipv4/ping.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 3a00512addbc..37a3b05d175c 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -154,17 +154,18 @@ void ping_hash(struct sock *sk) void ping_unhash(struct sock *sk) { struct inet_sock *isk = inet_sk(sk); + pr_debug("ping_unhash(isk=%p,isk->num=%u)\n", isk, isk->inet_num); + write_lock_bh(&ping_table.lock); if (sk_hashed(sk)) { - write_lock_bh(&ping_table.lock); hlist_nulls_del(&sk->sk_nulls_node); sk_nulls_node_init(&sk->sk_nulls_node); sock_put(sk); isk->inet_num = 0; isk->inet_sport = 0; sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); - write_unlock_bh(&ping_table.lock); } + write_unlock_bh(&ping_table.lock); } EXPORT_SYMBOL_GPL(ping_unhash); From 416bd4a366f3b4cd3f6a3246f91bd9f425891547 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 11 Jan 2017 21:09:50 +0100 Subject: [PATCH 452/521] perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race commit 321027c1fe77f892f4ea07846aeae08cefbbb290 upstream. Di Shen reported a race between two concurrent sys_perf_event_open() calls where both try and move the same pre-existing software group into a hardware context. The problem is exactly that described in commit: f63a8daa5812 ("perf: Fix event->ctx locking") ... where, while we wait for a ctx->mutex acquisition, the event->ctx relation can have changed under us. That very same commit failed to recognise sys_perf_event_context() as an external access vector to the events and thereby didn't apply the established locking rules correctly. So while one sys_perf_event_open() call is stuck waiting on mutex_lock_double(), the other (which owns said locks) moves the group about. So by the time the former sys_perf_event_open() acquires the locks, the context we've acquired is stale (and possibly dead). Apply the established locking rules as per perf_event_ctx_lock_nested() to the mutex_lock_double() for the 'move_group' case. This obviously means we need to validate state after we acquire the locks. Reported-by: Di Shen (Keen Lab) Tested-by: John Dias Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Kees Cook Cc: Linus Torvalds Cc: Min Chong Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar [bwh: Backported to 4.4: - Test perf_event::group_flags instead of group_caps - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- kernel/events/core.c | 57 ++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 53 insertions(+), 4 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index e4b5494f05f8..784ab8fe8714 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -8250,6 +8250,37 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id) return 0; } +/* + * Variation on perf_event_ctx_lock_nested(), except we take two context + * mutexes. + */ +static struct perf_event_context * +__perf_event_ctx_lock_double(struct perf_event *group_leader, + struct perf_event_context *ctx) +{ + struct perf_event_context *gctx; + +again: + rcu_read_lock(); + gctx = READ_ONCE(group_leader->ctx); + if (!atomic_inc_not_zero(&gctx->refcount)) { + rcu_read_unlock(); + goto again; + } + rcu_read_unlock(); + + mutex_lock_double(&gctx->mutex, &ctx->mutex); + + if (group_leader->ctx != gctx) { + mutex_unlock(&ctx->mutex); + mutex_unlock(&gctx->mutex); + put_ctx(gctx); + goto again; + } + + return gctx; +} + /** * sys_perf_event_open - open a performance event, associate it to a task/cpu * @@ -8486,8 +8517,26 @@ SYSCALL_DEFINE5(perf_event_open, } if (move_group) { - gctx = group_leader->ctx; - mutex_lock_double(&gctx->mutex, &ctx->mutex); + gctx = __perf_event_ctx_lock_double(group_leader, ctx); + + /* + * Check if we raced against another sys_perf_event_open() call + * moving the software group underneath us. + */ + if (!(group_leader->group_flags & PERF_GROUP_SOFTWARE)) { + /* + * If someone moved the group out from under us, check + * if this new event wound up on the same ctx, if so + * its the regular !move_group case, otherwise fail. + */ + if (gctx != ctx) { + err = -EINVAL; + goto err_locked; + } else { + perf_event_ctx_unlock(group_leader, gctx); + move_group = 0; + } + } } else { mutex_lock(&ctx->mutex); } @@ -8582,7 +8631,7 @@ SYSCALL_DEFINE5(perf_event_open, perf_unpin_context(ctx); if (move_group) - mutex_unlock(&gctx->mutex); + perf_event_ctx_unlock(group_leader, gctx); mutex_unlock(&ctx->mutex); if (task) { @@ -8610,7 +8659,7 @@ SYSCALL_DEFINE5(perf_event_open, err_locked: if (move_group) - mutex_unlock(&gctx->mutex); + perf_event_ctx_unlock(group_leader, gctx); mutex_unlock(&ctx->mutex); /* err_file: */ fput(event_file); From 418b99042b87b2b08a5d4f7f19e775f10211d431 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 30 Apr 2017 05:50:11 +0200 Subject: [PATCH 453/521] Linux 4.4.65 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 17708f5dc169..ddaef04f528a 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 64 +SUBLEVEL = 65 EXTRAVERSION = NAME = Blurry Fish Butt From 33a79a60f5a5d924e3b31f496ecda1c250a7f03b Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 16 Nov 2016 17:31:31 +0000 Subject: [PATCH 454/521] BACKPORT: arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions The core and the cluster sleep state entry latencies can't be same as cluster sleep involves more work compared to core level e.g. shared cache maintenance. Experiments have shown on an average about 100us more latency for the cluster sleep state compared to the core level sleep. This patch fixes the entry latency for the cluster sleep state. Fixes: 28e10a8f3a03 ("arm64: dts: juno: Add idle-states to device tree") Cc: Lorenzo Pieralisi Cc: "Jon Medhurst (Tixy)" Reviewed-by: Liviu Dudau Signed-off-by: Sudeep Holla Signed-off-by: Arnd Bergmann Fixes: Change-Id: I7b2d81fa66f8ce8b229457cfefff06e9edd545c7 (arm64: dts: juno: Add idle-states to device tree) (cherry picked from commit 909e481e2467f202b97d42beef246e8829416a85) Signed-off-by: Amit Pundir --- arch/arm64/boot/dts/arm/juno-r1.dts | 2 +- arch/arm64/boot/dts/arm/juno.dts | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/boot/dts/arm/juno-r1.dts b/arch/arm64/boot/dts/arm/juno-r1.dts index 8826f834f54f..29315af22147 100644 --- a/arch/arm64/boot/dts/arm/juno-r1.dts +++ b/arch/arm64/boot/dts/arm/juno-r1.dts @@ -76,7 +76,7 @@ CLUSTER_SLEEP_0: cluster-sleep-0 { compatible = "arm,idle-state"; arm,psci-suspend-param = <0x1010000>; local-timer-stop; - entry-latency-us = <300>; + entry-latency-us = <400>; exit-latency-us = <1200>; min-residency-us = <2500>; }; diff --git a/arch/arm64/boot/dts/arm/juno.dts b/arch/arm64/boot/dts/arm/juno.dts index 8830f12a8289..68816f71fa51 100644 --- a/arch/arm64/boot/dts/arm/juno.dts +++ b/arch/arm64/boot/dts/arm/juno.dts @@ -76,7 +76,7 @@ CLUSTER_SLEEP_0: cluster-sleep-0 { compatible = "arm,idle-state"; arm,psci-suspend-param = <0x1010000>; local-timer-stop; - entry-latency-us = <300>; + entry-latency-us = <400>; exit-latency-us = <1200>; min-residency-us = <2500>; }; From 37645a5f31c9cef322fcbd0a45dcb86ca46a2cda Mon Sep 17 00:00:00 2001 From: Lianwei Wang Date: Sun, 19 Jun 2016 23:52:27 -0700 Subject: [PATCH 455/521] UPSTREAM: PM / sleep: make PM notifiers called symmetrically (cherry picked from commit ea00f4f4f00cc2bc3b63ad512a4e6df3b20832b9) This makes pm notifier PREPARE/POST symmetrical: if PREPARE fails, we will only undo what ever happened on PREPARE. It fixes the unbalanced CPU hotplug enable in CPU PM notifier. Change-Id: I01dce3cc95c5d6b8913b7b6be301f2909258c745 Signed-off-by: Lianwei Wang Signed-off-by: Rafael J. Wysocki --- kernel/power/hibernate.c | 20 ++++++++++++-------- kernel/power/main.c | 11 +++++++++-- kernel/power/power.h | 2 ++ kernel/power/suspend.c | 10 ++++++---- kernel/power/user.c | 14 ++++++++------ 5 files changed, 37 insertions(+), 20 deletions(-) diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index 3124cebaec31..797f19e2aaa9 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -647,7 +647,7 @@ static void power_down(void) */ int hibernate(void) { - int error; + int error, nr_calls = 0; if (!hibernation_available()) { pr_debug("PM: Hibernation not available.\n"); @@ -662,9 +662,11 @@ int hibernate(void) } pm_prepare_console(); - error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE); - if (error) + error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls); + if (error) { + nr_calls--; goto Exit; + } printk(KERN_INFO "PM: Syncing filesystems ... "); sys_sync(); @@ -714,7 +716,7 @@ int hibernate(void) /* Don't bother checking whether freezer_test_done is true */ freezer_test_done = false; Exit: - pm_notifier_call_chain(PM_POST_HIBERNATION); + __pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL); pm_restore_console(); atomic_inc(&snapshot_device_available); Unlock: @@ -740,7 +742,7 @@ int hibernate(void) */ static int software_resume(void) { - int error; + int error, nr_calls = 0; unsigned int flags; /* @@ -827,9 +829,11 @@ static int software_resume(void) } pm_prepare_console(); - error = pm_notifier_call_chain(PM_RESTORE_PREPARE); - if (error) + error = __pm_notifier_call_chain(PM_RESTORE_PREPARE, -1, &nr_calls); + if (error) { + nr_calls--; goto Close_Finish; + } pr_debug("PM: Preparing processes for restore.\n"); error = freeze_processes(); @@ -855,7 +859,7 @@ static int software_resume(void) unlock_device_hotplug(); thaw_processes(); Finish: - pm_notifier_call_chain(PM_POST_RESTORE); + __pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL); pm_restore_console(); atomic_inc(&snapshot_device_available); /* For success case, the suspend path will release the lock */ diff --git a/kernel/power/main.c b/kernel/power/main.c index 27946975eff0..5ea50b1b7595 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -38,12 +38,19 @@ int unregister_pm_notifier(struct notifier_block *nb) } EXPORT_SYMBOL_GPL(unregister_pm_notifier); -int pm_notifier_call_chain(unsigned long val) +int __pm_notifier_call_chain(unsigned long val, int nr_to_call, int *nr_calls) { - int ret = blocking_notifier_call_chain(&pm_chain_head, val, NULL); + int ret; + + ret = __blocking_notifier_call_chain(&pm_chain_head, val, NULL, + nr_to_call, nr_calls); return notifier_to_errno(ret); } +int pm_notifier_call_chain(unsigned long val) +{ + return __pm_notifier_call_chain(val, -1, NULL); +} /* If set, devices may be suspended and resumed asynchronously. */ int pm_async_enabled = 1; diff --git a/kernel/power/power.h b/kernel/power/power.h index efe1b3b17c88..51f02ecaf125 100644 --- a/kernel/power/power.h +++ b/kernel/power/power.h @@ -200,6 +200,8 @@ static inline void suspend_test_finish(const char *label) {} #ifdef CONFIG_PM_SLEEP /* kernel/power/main.c */ +extern int __pm_notifier_call_chain(unsigned long val, int nr_to_call, + int *nr_calls); extern int pm_notifier_call_chain(unsigned long val); #endif diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 024411816ccf..58209d8bfc56 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -268,16 +268,18 @@ static int suspend_test(int level) */ static int suspend_prepare(suspend_state_t state) { - int error; + int error, nr_calls = 0; if (!sleep_state_supported(state)) return -EPERM; pm_prepare_console(); - error = pm_notifier_call_chain(PM_SUSPEND_PREPARE); - if (error) + error = __pm_notifier_call_chain(PM_SUSPEND_PREPARE, -1, &nr_calls); + if (error) { + nr_calls--; goto Finish; + } trace_suspend_resume(TPS("freeze_processes"), 0, true); error = suspend_freeze_processes(); @@ -288,7 +290,7 @@ static int suspend_prepare(suspend_state_t state) suspend_stats.failed_freeze++; dpm_save_failed_step(SUSPEND_FREEZE); Finish: - pm_notifier_call_chain(PM_POST_SUSPEND); + __pm_notifier_call_chain(PM_POST_SUSPEND, nr_calls, NULL); pm_restore_console(); return error; } diff --git a/kernel/power/user.c b/kernel/power/user.c index 526e8911460a..35310b627388 100644 --- a/kernel/power/user.c +++ b/kernel/power/user.c @@ -47,7 +47,7 @@ atomic_t snapshot_device_available = ATOMIC_INIT(1); static int snapshot_open(struct inode *inode, struct file *filp) { struct snapshot_data *data; - int error; + int error, nr_calls = 0; if (!hibernation_available()) return -EPERM; @@ -74,9 +74,9 @@ static int snapshot_open(struct inode *inode, struct file *filp) swap_type_of(swsusp_resume_device, 0, NULL) : -1; data->mode = O_RDONLY; data->free_bitmaps = false; - error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE); + error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls); if (error) - pm_notifier_call_chain(PM_POST_HIBERNATION); + __pm_notifier_call_chain(PM_POST_HIBERNATION, --nr_calls, NULL); } else { /* * Resuming. We may need to wait for the image device to @@ -86,13 +86,15 @@ static int snapshot_open(struct inode *inode, struct file *filp) data->swap = -1; data->mode = O_WRONLY; - error = pm_notifier_call_chain(PM_RESTORE_PREPARE); + error = __pm_notifier_call_chain(PM_RESTORE_PREPARE, -1, &nr_calls); if (!error) { error = create_basic_memory_bitmaps(); data->free_bitmaps = !error; - } + } else + nr_calls--; + if (error) - pm_notifier_call_chain(PM_POST_RESTORE); + __pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL); } if (error) atomic_inc(&snapshot_device_available); From 8bbc8d5787cabafc0b782304882f5a7f2c80fbbb Mon Sep 17 00:00:00 2001 From: Wei Wang Date: Fri, 7 Apr 2017 15:22:19 -0700 Subject: [PATCH 456/521] UPSTREAM: checkpatch: special audit for revert commit line Currently checkpatch.pl does not recognize git's default commit revert message and will complain about the hash format. Add special audit for revert commit message line to fix it. Signed-off-by: Wei Wang Acked-by: Joe Perches Bug: 37158168 Test: checkpatch.pl --patch [diff] and no longer see failure Change-Id: I65cf9a46874621dd6d5c349d2d3ca3b862d61ba3 --- scripts/checkpatch.pl | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 2b3c22808c3b..e70147742cce 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2348,6 +2348,7 @@ sub process { # Check for git id commit length and improperly formed commit descriptions if ($in_commit_log && !$commit_log_possible_stack_dump && + $line !~ /^This reverts commit [0-9a-f]{7,40}/ && ($line =~ /\bcommit\s+[0-9a-f]{5,}\b/i || ($line =~ /\b[0-9a-f]{12,40}\b/i && $line !~ /[\<\[][0-9a-f]{12,40}[\>\]]/i && From ed966e3649596cbb488a5dbf178bacf965a26e4c Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Mon, 10 Apr 2017 20:54:30 -0700 Subject: [PATCH 457/521] ANDROID: sdcardfs: Directly pass lower file for mmap Instead of relying on a copy hack, pass the lower file as private data. This lets the kernel find the vma mapping for pages used by the file, allowing pages used by mapping to be reclaimed. This is adapted from following esdfs patches commit 0647e638d: ("esdfs: store lower file in vm_file for mmap") commit 064850866: ("esdfs: keep a counter for mmaped file") Change-Id: I75b74d1e5061db1b8c13be38d184e118c0851a1a Signed-off-by: Daniel Rosenberg --- fs/sdcardfs/file.c | 3 +++ fs/sdcardfs/mmap.c | 57 ++++++++++++++++++---------------------------- 2 files changed, 25 insertions(+), 35 deletions(-) diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index c0146e03fa2e..1f6921e2ffbf 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -192,6 +192,9 @@ static int sdcardfs_mmap(struct file *file, struct vm_area_struct *vma) file->f_mapping->a_ops = &sdcardfs_aops; /* set our aops */ if (!SDCARDFS_F(file)->lower_vm_ops) /* save for our ->fault */ SDCARDFS_F(file)->lower_vm_ops = saved_vm_ops; + vma->vm_private_data = file; + get_file(lower_file); + vma->vm_file = lower_file; out: return err; diff --git a/fs/sdcardfs/mmap.c b/fs/sdcardfs/mmap.c index 0d4089c62c3a..b61f82275e7d 100644 --- a/fs/sdcardfs/mmap.c +++ b/fs/sdcardfs/mmap.c @@ -23,60 +23,45 @@ static int sdcardfs_fault(struct vm_area_struct *vma, struct vm_fault *vmf) { int err; - struct file *file, *lower_file; + struct file *file; const struct vm_operations_struct *lower_vm_ops; - struct vm_area_struct lower_vma; - memcpy(&lower_vma, vma, sizeof(struct vm_area_struct)); - file = lower_vma.vm_file; + file = (struct file *)vma->vm_private_data; lower_vm_ops = SDCARDFS_F(file)->lower_vm_ops; BUG_ON(!lower_vm_ops); - lower_file = sdcardfs_lower_file(file); - /* - * XXX: vm_ops->fault may be called in parallel. Because we have to - * resort to temporarily changing the vma->vm_file to point to the - * lower file, a concurrent invocation of sdcardfs_fault could see a - * different value. In this workaround, we keep a different copy of - * the vma structure in our stack, so we never expose a different - * value of the vma->vm_file called to us, even temporarily. A - * better fix would be to change the calling semantics of ->fault to - * take an explicit file pointer. - */ - lower_vma.vm_file = lower_file; - err = lower_vm_ops->fault(&lower_vma, vmf); + err = lower_vm_ops->fault(vma, vmf); return err; } +static void sdcardfs_vm_open(struct vm_area_struct *vma) +{ + struct file *file = (struct file *)vma->vm_private_data; + + get_file(file); +} + +static void sdcardfs_vm_close(struct vm_area_struct *vma) +{ + struct file *file = (struct file *)vma->vm_private_data; + + fput(file); +} + static int sdcardfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) { int err = 0; - struct file *file, *lower_file; + struct file *file; const struct vm_operations_struct *lower_vm_ops; - struct vm_area_struct lower_vma; - memcpy(&lower_vma, vma, sizeof(struct vm_area_struct)); - file = lower_vma.vm_file; + file = (struct file *)vma->vm_private_data; lower_vm_ops = SDCARDFS_F(file)->lower_vm_ops; BUG_ON(!lower_vm_ops); if (!lower_vm_ops->page_mkwrite) goto out; - lower_file = sdcardfs_lower_file(file); - /* - * XXX: vm_ops->page_mkwrite may be called in parallel. - * Because we have to resort to temporarily changing the - * vma->vm_file to point to the lower file, a concurrent - * invocation of sdcardfs_page_mkwrite could see a different - * value. In this workaround, we keep a different copy of the - * vma structure in our stack, so we never expose a different - * value of the vma->vm_file called to us, even temporarily. - * A better fix would be to change the calling semantics of - * ->page_mkwrite to take an explicit file pointer. - */ - lower_vma.vm_file = lower_file; - err = lower_vm_ops->page_mkwrite(&lower_vma, vmf); + err = lower_vm_ops->page_mkwrite(vma, vmf); out: return err; } @@ -99,4 +84,6 @@ const struct address_space_operations sdcardfs_aops = { const struct vm_operations_struct sdcardfs_vm_ops = { .fault = sdcardfs_fault, .page_mkwrite = sdcardfs_page_mkwrite, + .open = sdcardfs_vm_open, + .close = sdcardfs_vm_close, }; From c14efef048c60c9f1612393a470a8e62e4d94b93 Mon Sep 17 00:00:00 2001 From: Lorenzo Colitti Date: Fri, 23 Dec 2016 00:33:57 +0900 Subject: [PATCH 458/521] UPSTREAM: net: ipv4: Don't crash if passing a null sk to ip_do_redirect. Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.") made ip_do_redirect call sock_net(sk) to determine the network namespace of the passed-in socket. This crashes if sk is NULL. Fix this by getting the network namespace from the skb instead. Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.") Signed-off-by: Lorenzo Colitti Signed-off-by: David S. Miller Change-Id: I16a3c343cb142c482ca6dd363c28b3a12d73a46d Fixes: Change-Id: I910504b508948057912bc188fd1e8aca28294de3 ("net: inet: Support UID-based routing in IP protocols.") (cherry picked from commit 7d99569460eae28b187d574aec930a4cf8b90441) Signed-off-by: Amit Pundir --- net/ipv4/route.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 9af0f461b00d..d162ce41f761 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -792,6 +792,7 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf struct rtable *rt; struct flowi4 fl4; const struct iphdr *iph = (const struct iphdr *) skb->data; + struct net *net = dev_net(skb->dev); int oif = skb->dev->ifindex; u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; @@ -799,7 +800,7 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf rt = (struct rtable *) dst; - __build_flow_key(sock_net(sk), &fl4, sk, iph, oif, tos, prot, mark, 0); + __build_flow_key(net, &fl4, sk, iph, oif, tos, prot, mark, 0); __ip_do_redirect(rt, skb, &fl4, true); } From 87a98325b3d4b248d4058774a39eca2515a76a9b Mon Sep 17 00:00:00 2001 From: Tobias Klauser Date: Tue, 10 Jan 2017 09:30:51 +0100 Subject: [PATCH 459/521] UPSTREAM: net: socket: Make unnecessarily global sockfs_setattr() static MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make sockfs_setattr() static as it is not used outside of net/socket.c This fixes the following GCC warning: net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes] Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Cc: Lorenzo Colitti Signed-off-by: Tobias Klauser Acked-by: Lorenzo Colitti Signed-off-by: David S. Miller Change-Id: Ie613c441b3fe081bdaec8c480d3aade482873bf8 Fixes: Change-Id: Idbc3e9a0cec91c4c6e01916b967b6237645ebe59 ("net: core: Add a UID field to struct sock.") (cherry picked from commit dc647ec88e029307e60e6bf9988056605f11051a) Signed-off-by: Amit Pundir --- net/socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/socket.c b/net/socket.c index 1489761b371e..18aff3d804ec 100644 --- a/net/socket.c +++ b/net/socket.c @@ -520,7 +520,7 @@ static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer, return used; } -int sockfs_setattr(struct dentry *dentry, struct iattr *iattr) +static int sockfs_setattr(struct dentry *dentry, struct iattr *iattr) { int err = simple_setattr(dentry, iattr); From 2ffd5f0b9d77573380c7cf5fc79cc7d5ea03a1d9 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 22 Feb 2016 11:49:09 -0500 Subject: [PATCH 460/521] BACKPORT: [UPSTREAM] mbcache2: reimplement mbcache (Cherry-pick from commit f9a61eb4e2471c56a63cd804c7474128138c38ac) Original mbcache was designed to have more features than what ext? filesystems ended up using. It supported entry being in more hashes, it had a home-grown rwlocking of each entry, and one cache could cache entries from multiple filesystems. This genericity also resulted in more complex locking, larger cache entries, and generally more code complexity. This is reimplementation of the mbcache functionality to exactly fit the purpose ext? filesystems use it for. Cache entries are now considerably smaller (7 instead of 13 longs), the code is considerably smaller as well (414 vs 913 lines of code), and IMO also simpler. The new code is also much more lightweight. I have measured the speed using artificial xattr-bench benchmark, which spawns P processes, each process sets xattr for F different files, and the value of xattr is randomly chosen from a pool of V values. Averages of runtimes for 5 runs for various combinations of parameters are below. The first value in each cell is old mbache, the second value is the new mbcache. V=10 F\P 1 2 4 8 16 32 64 10 0.158,0.157 0.208,0.196 0.500,0.277 0.798,0.400 3.258,0.584 13.807,1.047 61.339,2.803 100 0.172,0.167 0.279,0.222 0.520,0.275 0.825,0.341 2.981,0.505 12.022,1.202 44.641,2.943 1000 0.185,0.174 0.297,0.239 0.445,0.283 0.767,0.340 2.329,0.480 6.342,1.198 16.440,3.888 V=100 F\P 1 2 4 8 16 32 64 10 0.162,0.153 0.200,0.186 0.362,0.257 0.671,0.496 1.433,0.943 3.801,1.345 7.938,2.501 100 0.153,0.160 0.221,0.199 0.404,0.264 0.945,0.379 1.556,0.485 3.761,1.156 7.901,2.484 1000 0.215,0.191 0.303,0.246 0.471,0.288 0.960,0.347 1.647,0.479 3.916,1.176 8.058,3.160 V=1000 F\P 1 2 4 8 16 32 64 10 0.151,0.129 0.210,0.163 0.326,0.245 0.685,0.521 1.284,0.859 3.087,2.251 6.451,4.801 100 0.154,0.153 0.211,0.191 0.276,0.282 0.687,0.506 1.202,0.877 3.259,1.954 8.738,2.887 1000 0.145,0.179 0.202,0.222 0.449,0.319 0.899,0.333 1.577,0.524 4.221,1.240 9.782,3.579 V=10000 F\P 1 2 4 8 16 32 64 10 0.161,0.154 0.198,0.190 0.296,0.256 0.662,0.480 1.192,0.818 2.989,2.200 6.362,4.746 100 0.176,0.174 0.236,0.203 0.326,0.255 0.696,0.511 1.183,0.855 4.205,3.444 19.510,17.760 1000 0.199,0.183 0.240,0.227 1.159,1.014 2.286,2.154 6.023,6.039 ---,10.933 ---,36.620 V=100000 F\P 1 2 4 8 16 32 64 10 0.171,0.162 0.204,0.198 0.285,0.230 0.692,0.500 1.225,0.881 2.990,2.243 6.379,4.771 100 0.151,0.171 0.220,0.210 0.295,0.255 0.720,0.518 1.226,0.844 3.423,2.831 19.234,17.544 1000 0.192,0.189 0.249,0.225 1.162,1.043 2.257,2.093 5.853,4.997 ---,10.399 ---,32.198 We see that the new code is faster in pretty much all the cases and starting from 4 processes there are significant gains with the new code resulting in upto 20-times shorter runtimes. Also for large numbers of cached entries all values for the old code could not be measured as the kernel started hitting softlockups and died before the test completed. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Bug: 32461228 --- fs/Makefile | 2 +- fs/mbcache2.c | 359 +++++++++++++++++++++++++++++++++++++++ include/linux/mbcache2.h | 50 ++++++ 3 files changed, 410 insertions(+), 1 deletion(-) create mode 100644 fs/mbcache2.c create mode 100644 include/linux/mbcache2.h diff --git a/fs/Makefile b/fs/Makefile index 3b54070cd629..dee237540bc0 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -41,7 +41,7 @@ obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o -obj-$(CONFIG_FS_MBCACHE) += mbcache.o +obj-$(CONFIG_FS_MBCACHE) += mbcache.o mbcache2.o obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o obj-$(CONFIG_NFS_COMMON) += nfs_common/ obj-$(CONFIG_COREDUMP) += coredump.o diff --git a/fs/mbcache2.c b/fs/mbcache2.c new file mode 100644 index 000000000000..5c3e1a8c38f6 --- /dev/null +++ b/fs/mbcache2.c @@ -0,0 +1,359 @@ +#include +#include +#include +#include +#include +#include +#include + +/* + * Mbcache is a simple key-value store. Keys need not be unique, however + * key-value pairs are expected to be unique (we use this fact in + * mb2_cache_entry_delete_block()). + * + * Ext2 and ext4 use this cache for deduplication of extended attribute blocks. + * They use hash of a block contents as a key and block number as a value. + * That's why keys need not be unique (different xattr blocks may end up having + * the same hash). However block number always uniquely identifies a cache + * entry. + * + * We provide functions for creation and removal of entries, search by key, + * and a special "delete entry with given key-value pair" operation. Fixed + * size hash table is used for fast key lookups. + */ + +struct mb2_cache { + /* Hash table of entries */ + struct hlist_bl_head *c_hash; + /* log2 of hash table size */ + int c_bucket_bits; + /* Protects c_lru_list, c_entry_count */ + spinlock_t c_lru_list_lock; + struct list_head c_lru_list; + /* Number of entries in cache */ + unsigned long c_entry_count; + struct shrinker c_shrink; +}; + +static struct kmem_cache *mb2_entry_cache; + +/* + * mb2_cache_entry_create - create entry in cache + * @cache - cache where the entry should be created + * @mask - gfp mask with which the entry should be allocated + * @key - key of the entry + * @block - block that contains data + * + * Creates entry in @cache with key @key and records that data is stored in + * block @block. The function returns -EBUSY if entry with the same key + * and for the same block already exists in cache. Otherwise 0 is returned. + */ +int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, + sector_t block) +{ + struct mb2_cache_entry *entry, *dup; + struct hlist_bl_node *dup_node; + struct hlist_bl_head *head; + + entry = kmem_cache_alloc(mb2_entry_cache, mask); + if (!entry) + return -ENOMEM; + + INIT_LIST_HEAD(&entry->e_lru_list); + /* One ref for hash, one ref returned */ + atomic_set(&entry->e_refcnt, 1); + entry->e_key = key; + entry->e_block = block; + head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; + entry->e_hash_list_head = head; + hlist_bl_lock(head); + hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) { + if (dup->e_key == key && dup->e_block == block) { + hlist_bl_unlock(head); + kmem_cache_free(mb2_entry_cache, entry); + return -EBUSY; + } + } + hlist_bl_add_head(&entry->e_hash_list, head); + hlist_bl_unlock(head); + + spin_lock(&cache->c_lru_list_lock); + list_add_tail(&entry->e_lru_list, &cache->c_lru_list); + /* Grab ref for LRU list */ + atomic_inc(&entry->e_refcnt); + cache->c_entry_count++; + spin_unlock(&cache->c_lru_list_lock); + + return 0; +} +EXPORT_SYMBOL(mb2_cache_entry_create); + +void __mb2_cache_entry_free(struct mb2_cache_entry *entry) +{ + kmem_cache_free(mb2_entry_cache, entry); +} +EXPORT_SYMBOL(__mb2_cache_entry_free); + +static struct mb2_cache_entry *__entry_find(struct mb2_cache *cache, + struct mb2_cache_entry *entry, + u32 key) +{ + struct mb2_cache_entry *old_entry = entry; + struct hlist_bl_node *node; + struct hlist_bl_head *head; + + if (entry) + head = entry->e_hash_list_head; + else + head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; + hlist_bl_lock(head); + if (entry && !hlist_bl_unhashed(&entry->e_hash_list)) + node = entry->e_hash_list.next; + else + node = hlist_bl_first(head); + while (node) { + entry = hlist_bl_entry(node, struct mb2_cache_entry, + e_hash_list); + if (entry->e_key == key) { + atomic_inc(&entry->e_refcnt); + goto out; + } + node = node->next; + } + entry = NULL; +out: + hlist_bl_unlock(head); + if (old_entry) + mb2_cache_entry_put(cache, old_entry); + + return entry; +} + +/* + * mb2_cache_entry_find_first - find the first entry in cache with given key + * @cache: cache where we should search + * @key: key to look for + * + * Search in @cache for entry with key @key. Grabs reference to the first + * entry found and returns the entry. + */ +struct mb2_cache_entry *mb2_cache_entry_find_first(struct mb2_cache *cache, + u32 key) +{ + return __entry_find(cache, NULL, key); +} +EXPORT_SYMBOL(mb2_cache_entry_find_first); + +/* + * mb2_cache_entry_find_next - find next entry in cache with the same + * @cache: cache where we should search + * @entry: entry to start search from + * + * Finds next entry in the hash chain which has the same key as @entry. + * If @entry is unhashed (which can happen when deletion of entry races + * with the search), finds the first entry in the hash chain. The function + * drops reference to @entry and returns with a reference to the found entry. + */ +struct mb2_cache_entry *mb2_cache_entry_find_next(struct mb2_cache *cache, + struct mb2_cache_entry *entry) +{ + return __entry_find(cache, entry, entry->e_key); +} +EXPORT_SYMBOL(mb2_cache_entry_find_next); + +/* mb2_cache_entry_delete_block - remove information about block from cache + * @cache - cache we work with + * @key - key of the entry to remove + * @block - block containing data for @key + * + * Remove entry from cache @cache with key @key with data stored in @block. + */ +void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key, + sector_t block) +{ + struct hlist_bl_node *node; + struct hlist_bl_head *head; + struct mb2_cache_entry *entry; + + head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; + hlist_bl_lock(head); + hlist_bl_for_each_entry(entry, node, head, e_hash_list) { + if (entry->e_key == key && entry->e_block == block) { + /* We keep hash list reference to keep entry alive */ + hlist_bl_del_init(&entry->e_hash_list); + hlist_bl_unlock(head); + spin_lock(&cache->c_lru_list_lock); + if (!list_empty(&entry->e_lru_list)) { + list_del_init(&entry->e_lru_list); + cache->c_entry_count--; + atomic_dec(&entry->e_refcnt); + } + spin_unlock(&cache->c_lru_list_lock); + mb2_cache_entry_put(cache, entry); + return; + } + } + hlist_bl_unlock(head); +} +EXPORT_SYMBOL(mb2_cache_entry_delete_block); + +/* mb2_cache_entry_touch - cache entry got used + * @cache - cache the entry belongs to + * @entry - entry that got used + * + * Move entry in lru list to reflect the fact that it was used. + */ +void mb2_cache_entry_touch(struct mb2_cache *cache, + struct mb2_cache_entry *entry) +{ + spin_lock(&cache->c_lru_list_lock); + if (!list_empty(&entry->e_lru_list)) + list_move_tail(&cache->c_lru_list, &entry->e_lru_list); + spin_unlock(&cache->c_lru_list_lock); +} +EXPORT_SYMBOL(mb2_cache_entry_touch); + +static unsigned long mb2_cache_count(struct shrinker *shrink, + struct shrink_control *sc) +{ + struct mb2_cache *cache = container_of(shrink, struct mb2_cache, + c_shrink); + + return cache->c_entry_count; +} + +/* Shrink number of entries in cache */ +static unsigned long mb2_cache_scan(struct shrinker *shrink, + struct shrink_control *sc) +{ + int nr_to_scan = sc->nr_to_scan; + struct mb2_cache *cache = container_of(shrink, struct mb2_cache, + c_shrink); + struct mb2_cache_entry *entry; + struct hlist_bl_head *head; + unsigned int shrunk = 0; + + spin_lock(&cache->c_lru_list_lock); + while (nr_to_scan-- && !list_empty(&cache->c_lru_list)) { + entry = list_first_entry(&cache->c_lru_list, + struct mb2_cache_entry, e_lru_list); + list_del_init(&entry->e_lru_list); + cache->c_entry_count--; + /* + * We keep LRU list reference so that entry doesn't go away + * from under us. + */ + spin_unlock(&cache->c_lru_list_lock); + head = entry->e_hash_list_head; + hlist_bl_lock(head); + if (!hlist_bl_unhashed(&entry->e_hash_list)) { + hlist_bl_del_init(&entry->e_hash_list); + atomic_dec(&entry->e_refcnt); + } + hlist_bl_unlock(head); + if (mb2_cache_entry_put(cache, entry)) + shrunk++; + cond_resched(); + spin_lock(&cache->c_lru_list_lock); + } + spin_unlock(&cache->c_lru_list_lock); + + return shrunk; +} + +/* + * mb2_cache_create - create cache + * @bucket_bits: log2 of the hash table size + * + * Create cache for keys with 2^bucket_bits hash entries. + */ +struct mb2_cache *mb2_cache_create(int bucket_bits) +{ + struct mb2_cache *cache; + int bucket_count = 1 << bucket_bits; + int i; + + if (!try_module_get(THIS_MODULE)) + return NULL; + + cache = kzalloc(sizeof(struct mb2_cache), GFP_KERNEL); + if (!cache) + goto err_out; + cache->c_bucket_bits = bucket_bits; + INIT_LIST_HEAD(&cache->c_lru_list); + spin_lock_init(&cache->c_lru_list_lock); + cache->c_hash = kmalloc(bucket_count * sizeof(struct hlist_bl_head), + GFP_KERNEL); + if (!cache->c_hash) { + kfree(cache); + goto err_out; + } + for (i = 0; i < bucket_count; i++) + INIT_HLIST_BL_HEAD(&cache->c_hash[i]); + + cache->c_shrink.count_objects = mb2_cache_count; + cache->c_shrink.scan_objects = mb2_cache_scan; + cache->c_shrink.seeks = DEFAULT_SEEKS; + register_shrinker(&cache->c_shrink); + + return cache; + +err_out: + module_put(THIS_MODULE); + return NULL; +} +EXPORT_SYMBOL(mb2_cache_create); + +/* + * mb2_cache_destroy - destroy cache + * @cache: the cache to destroy + * + * Free all entries in cache and cache itself. Caller must make sure nobody + * (except shrinker) can reach @cache when calling this. + */ +void mb2_cache_destroy(struct mb2_cache *cache) +{ + struct mb2_cache_entry *entry, *next; + + unregister_shrinker(&cache->c_shrink); + + /* + * We don't bother with any locking. Cache must not be used at this + * point. + */ + list_for_each_entry_safe(entry, next, &cache->c_lru_list, e_lru_list) { + if (!hlist_bl_unhashed(&entry->e_hash_list)) { + hlist_bl_del_init(&entry->e_hash_list); + atomic_dec(&entry->e_refcnt); + } else + WARN_ON(1); + list_del(&entry->e_lru_list); + WARN_ON(atomic_read(&entry->e_refcnt) != 1); + mb2_cache_entry_put(cache, entry); + } + kfree(cache->c_hash); + kfree(cache); + module_put(THIS_MODULE); +} +EXPORT_SYMBOL(mb2_cache_destroy); + +static int __init mb2cache_init(void) +{ + mb2_entry_cache = kmem_cache_create("mbcache", + sizeof(struct mb2_cache_entry), 0, + SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL); + BUG_ON(!mb2_entry_cache); + return 0; +} + +static void __exit mb2cache_exit(void) +{ + kmem_cache_destroy(mb2_entry_cache); +} + +module_init(mb2cache_init) +module_exit(mb2cache_exit) + +MODULE_AUTHOR("Jan Kara "); +MODULE_DESCRIPTION("Meta block cache (for extended attributes)"); +MODULE_LICENSE("GPL"); diff --git a/include/linux/mbcache2.h b/include/linux/mbcache2.h new file mode 100644 index 000000000000..b6f160ff2533 --- /dev/null +++ b/include/linux/mbcache2.h @@ -0,0 +1,50 @@ +#ifndef _LINUX_MB2CACHE_H +#define _LINUX_MB2CACHE_H + +#include +#include +#include +#include +#include + +struct mb2_cache; + +struct mb2_cache_entry { + /* LRU list - protected by cache->c_lru_list_lock */ + struct list_head e_lru_list; + /* Hash table list - protected by bitlock in e_hash_list_head */ + struct hlist_bl_node e_hash_list; + atomic_t e_refcnt; + /* Key in hash - stable during lifetime of the entry */ + u32 e_key; + /* Block number of hashed block - stable during lifetime of the entry */ + sector_t e_block; + /* Head of hash list (for list bit lock) - stable */ + struct hlist_bl_head *e_hash_list_head; +}; + +struct mb2_cache *mb2_cache_create(int bucket_bits); +void mb2_cache_destroy(struct mb2_cache *cache); + +int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, + sector_t block); +void __mb2_cache_entry_free(struct mb2_cache_entry *entry); +static inline int mb2_cache_entry_put(struct mb2_cache *cache, + struct mb2_cache_entry *entry) +{ + if (!atomic_dec_and_test(&entry->e_refcnt)) + return 0; + __mb2_cache_entry_free(entry); + return 1; +} + +void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key, + sector_t block); +struct mb2_cache_entry *mb2_cache_entry_find_first(struct mb2_cache *cache, + u32 key); +struct mb2_cache_entry *mb2_cache_entry_find_next(struct mb2_cache *cache, + struct mb2_cache_entry *entry); +void mb2_cache_entry_touch(struct mb2_cache *cache, + struct mb2_cache_entry *entry); + +#endif /* _LINUX_MB2CACHE_H */ From 0588ab2893f1b95395b49406b1b5cfdb43d1da38 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 22 Feb 2016 11:50:13 -0500 Subject: [PATCH 461/521] BACKPORT [UPSTREAM] ext4: convert to mbcache2 (Cherry-pick from commit 82939d7999dfc1f1998c4b1c12e2f19edbdff272) The conversion is generally straightforward. The only tricky part is that xattr block corresponding to found mbcache entry can get freed before we get buffer lock for that block. So we have to check whether the entry is still valid after getting buffer lock. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Bug; 32461228 --- fs/ext4/ext4.h | 2 +- fs/ext4/super.c | 7 ++- fs/ext4/xattr.c | 136 ++++++++++++++++++++++++------------------------ fs/ext4/xattr.h | 5 +- 4 files changed, 75 insertions(+), 75 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index ac720a31e4ff..2b3ac9fa9460 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1442,7 +1442,7 @@ struct ext4_sb_info { struct list_head s_es_list; /* List of inodes with reclaimable extents */ long s_es_nr_inode; struct ext4_es_stats s_es_stats; - struct mb_cache *s_mb_cache; + struct mb2_cache *s_mb_cache; spinlock_t s_es_lock ____cacheline_aligned_in_smp; /* Ratelimit ext4 messages. */ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 68345a9e59b8..bd8831bfbafe 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -816,7 +816,6 @@ static void ext4_put_super(struct super_block *sb) ext4_release_system_zone(sb); ext4_mb_release(sb); ext4_ext_release(sb); - ext4_xattr_put_super(sb); if (!(sb->s_flags & MS_RDONLY) && !aborted) { ext4_clear_feature_journal_needs_recovery(sb); @@ -3833,7 +3832,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) no_journal: if (ext4_mballoc_ready) { - sbi->s_mb_cache = ext4_xattr_create_cache(sb->s_id); + sbi->s_mb_cache = ext4_xattr_create_cache(); if (!sbi->s_mb_cache) { ext4_msg(sb, KERN_ERR, "Failed to create an mb_cache"); goto failed_mount_wq; @@ -4065,6 +4064,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) if (EXT4_SB(sb)->rsv_conversion_wq) destroy_workqueue(EXT4_SB(sb)->rsv_conversion_wq); failed_mount_wq: + if (sbi->s_mb_cache) { + ext4_xattr_destroy_cache(sbi->s_mb_cache); + sbi->s_mb_cache = NULL; + } if (sbi->s_journal) { jbd2_journal_destroy(sbi->s_journal); sbi->s_journal = NULL; diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 263002f0389d..f7455e668474 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -53,7 +53,7 @@ #include #include #include -#include +#include #include #include "ext4_jbd2.h" #include "ext4.h" @@ -80,10 +80,10 @@ # define ea_bdebug(bh, fmt, ...) no_printk(fmt, ##__VA_ARGS__) #endif -static void ext4_xattr_cache_insert(struct mb_cache *, struct buffer_head *); +static void ext4_xattr_cache_insert(struct mb2_cache *, struct buffer_head *); static struct buffer_head *ext4_xattr_cache_find(struct inode *, struct ext4_xattr_header *, - struct mb_cache_entry **); + struct mb2_cache_entry **); static void ext4_xattr_rehash(struct ext4_xattr_header *, struct ext4_xattr_entry *); static int ext4_xattr_list(struct dentry *dentry, char *buffer, @@ -279,7 +279,7 @@ ext4_xattr_block_get(struct inode *inode, int name_index, const char *name, struct ext4_xattr_entry *entry; size_t size; int error; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld", name_index, name, buffer, (long)buffer_size); @@ -426,7 +426,7 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size) struct inode *inode = d_inode(dentry); struct buffer_head *bh = NULL; int error; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); ea_idebug(inode, "buffer=%p, buffer_size=%ld", buffer, (long)buffer_size); @@ -543,11 +543,8 @@ static void ext4_xattr_release_block(handle_t *handle, struct inode *inode, struct buffer_head *bh) { - struct mb_cache_entry *ce = NULL; int error = 0; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); - ce = mb_cache_entry_get(ext4_mb_cache, bh->b_bdev, bh->b_blocknr); BUFFER_TRACE(bh, "get_write_access"); error = ext4_journal_get_write_access(handle, bh); if (error) @@ -555,9 +552,15 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, lock_buffer(bh); if (BHDR(bh)->h_refcount == cpu_to_le32(1)) { + __u32 hash = le32_to_cpu(BHDR(bh)->h_hash); + ea_bdebug(bh, "refcount now=0; freeing"); - if (ce) - mb_cache_entry_free(ce); + /* + * This must happen under buffer lock for + * ext4_xattr_block_set() to reliably detect freed block + */ + mb2_cache_entry_delete_block(EXT4_GET_MB_CACHE(inode), hash, + bh->b_blocknr); get_bh(bh); unlock_buffer(bh); ext4_free_blocks(handle, inode, bh, 0, 1, @@ -565,8 +568,6 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, EXT4_FREE_BLOCKS_FORGET); } else { le32_add_cpu(&BHDR(bh)->h_refcount, -1); - if (ce) - mb_cache_entry_release(ce); /* * Beware of this ugliness: Releasing of xattr block references * from different inodes can race and so we have to protect @@ -779,17 +780,15 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, struct super_block *sb = inode->i_sb; struct buffer_head *new_bh = NULL; struct ext4_xattr_search *s = &bs->s; - struct mb_cache_entry *ce = NULL; + struct mb2_cache_entry *ce = NULL; int error = 0; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); #define header(x) ((struct ext4_xattr_header *)(x)) if (i->value && i->value_len > sb->s_blocksize) return -ENOSPC; if (s->base) { - ce = mb_cache_entry_get(ext4_mb_cache, bs->bh->b_bdev, - bs->bh->b_blocknr); BUFFER_TRACE(bs->bh, "get_write_access"); error = ext4_journal_get_write_access(handle, bs->bh); if (error) @@ -797,10 +796,15 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, lock_buffer(bs->bh); if (header(s->base)->h_refcount == cpu_to_le32(1)) { - if (ce) { - mb_cache_entry_free(ce); - ce = NULL; - } + __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash); + + /* + * This must happen under buffer lock for + * ext4_xattr_block_set() to reliably detect modified + * block + */ + mb2_cache_entry_delete_block(ext4_mb_cache, hash, + bs->bh->b_blocknr); ea_bdebug(bs->bh, "modifying in-place"); error = ext4_xattr_set_entry(i, s); if (!error) { @@ -824,10 +828,6 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, int offset = (char *)s->here - bs->bh->b_data; unlock_buffer(bs->bh); - if (ce) { - mb_cache_entry_release(ce); - ce = NULL; - } ea_bdebug(bs->bh, "cloning"); s->base = kmalloc(bs->bh->b_size, GFP_NOFS); error = -ENOMEM; @@ -882,6 +882,31 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, if (error) goto cleanup_dquot; lock_buffer(new_bh); + /* + * We have to be careful about races with + * freeing or rehashing of xattr block. Once we + * hold buffer lock xattr block's state is + * stable so we can check whether the block got + * freed / rehashed or not. Since we unhash + * mbcache entry under buffer lock when freeing + * / rehashing xattr block, checking whether + * entry is still hashed is reliable. + */ + if (hlist_bl_unhashed(&ce->e_hash_list)) { + /* + * Undo everything and check mbcache + * again. + */ + unlock_buffer(new_bh); + dquot_free_block(inode, + EXT4_C2B(EXT4_SB(sb), + 1)); + brelse(new_bh); + mb2_cache_entry_put(ext4_mb_cache, ce); + ce = NULL; + new_bh = NULL; + goto inserted; + } le32_add_cpu(&BHDR(new_bh)->h_refcount, 1); ea_bdebug(new_bh, "reusing; refcount now=%d", le32_to_cpu(BHDR(new_bh)->h_refcount)); @@ -892,7 +917,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, if (error) goto cleanup_dquot; } - mb_cache_entry_release(ce); + mb2_cache_entry_touch(ext4_mb_cache, ce); + mb2_cache_entry_put(ext4_mb_cache, ce); ce = NULL; } else if (bs->bh && s->base == bs->bh->b_data) { /* We were modifying this block in-place. */ @@ -957,7 +983,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, cleanup: if (ce) - mb_cache_entry_release(ce); + mb2_cache_entry_put(ext4_mb_cache, ce); brelse(new_bh); if (!(bs->bh && s->base == bs->bh->b_data)) kfree(s->base); @@ -1518,17 +1544,6 @@ ext4_xattr_delete_inode(handle_t *handle, struct inode *inode) brelse(bh); } -/* - * ext4_xattr_put_super() - * - * This is called when a file system is unmounted. - */ -void -ext4_xattr_put_super(struct super_block *sb) -{ - mb_cache_shrink(sb->s_bdev); -} - /* * ext4_xattr_cache_insert() * @@ -1538,28 +1553,18 @@ ext4_xattr_put_super(struct super_block *sb) * Returns 0, or a negative error number on failure. */ static void -ext4_xattr_cache_insert(struct mb_cache *ext4_mb_cache, struct buffer_head *bh) +ext4_xattr_cache_insert(struct mb2_cache *ext4_mb_cache, struct buffer_head *bh) { __u32 hash = le32_to_cpu(BHDR(bh)->h_hash); - struct mb_cache_entry *ce; int error; - ce = mb_cache_entry_alloc(ext4_mb_cache, GFP_NOFS); - if (!ce) { - ea_bdebug(bh, "out of memory"); - return; - } - error = mb_cache_entry_insert(ce, bh->b_bdev, bh->b_blocknr, hash); + error = mb2_cache_entry_create(ext4_mb_cache, GFP_NOFS, hash, + bh->b_blocknr); if (error) { - mb_cache_entry_free(ce); - if (error == -EBUSY) { + if (error == -EBUSY) ea_bdebug(bh, "already in cache"); - error = 0; - } - } else { + } else ea_bdebug(bh, "inserting [%x]", (int)hash); - mb_cache_entry_release(ce); - } } /* @@ -1612,26 +1617,19 @@ ext4_xattr_cmp(struct ext4_xattr_header *header1, */ static struct buffer_head * ext4_xattr_cache_find(struct inode *inode, struct ext4_xattr_header *header, - struct mb_cache_entry **pce) + struct mb2_cache_entry **pce) { __u32 hash = le32_to_cpu(header->h_hash); - struct mb_cache_entry *ce; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb2_cache_entry *ce; + struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); if (!header->h_hash) return NULL; /* never share */ ea_idebug(inode, "looking for cached blocks [%x]", (int)hash); -again: - ce = mb_cache_entry_find_first(ext4_mb_cache, inode->i_sb->s_bdev, - hash); + ce = mb2_cache_entry_find_first(ext4_mb_cache, hash); while (ce) { struct buffer_head *bh; - if (IS_ERR(ce)) { - if (PTR_ERR(ce) == -EAGAIN) - goto again; - break; - } bh = sb_bread(inode->i_sb, ce->e_block); if (!bh) { EXT4_ERROR_INODE(inode, "block %lu read error", @@ -1647,7 +1645,7 @@ ext4_xattr_cache_find(struct inode *inode, struct ext4_xattr_header *header, return bh; } brelse(bh); - ce = mb_cache_entry_find_next(ce, inode->i_sb->s_bdev, hash); + ce = mb2_cache_entry_find_next(ext4_mb_cache, ce); } return NULL; } @@ -1722,15 +1720,15 @@ static void ext4_xattr_rehash(struct ext4_xattr_header *header, #define HASH_BUCKET_BITS 10 -struct mb_cache * -ext4_xattr_create_cache(char *name) +struct mb2_cache * +ext4_xattr_create_cache(void) { - return mb_cache_create(name, HASH_BUCKET_BITS); + return mb2_cache_create(HASH_BUCKET_BITS); } -void ext4_xattr_destroy_cache(struct mb_cache *cache) +void ext4_xattr_destroy_cache(struct mb2_cache *cache) { if (cache) - mb_cache_destroy(cache); + mb2_cache_destroy(cache); } diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h index ddc0957760ba..10b0f7323ed6 100644 --- a/fs/ext4/xattr.h +++ b/fs/ext4/xattr.h @@ -108,7 +108,6 @@ extern int ext4_xattr_set(struct inode *, int, const char *, const void *, size_ extern int ext4_xattr_set_handle(handle_t *, struct inode *, int, const char *, const void *, size_t, int); extern void ext4_xattr_delete_inode(handle_t *, struct inode *); -extern void ext4_xattr_put_super(struct super_block *); extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize, struct ext4_inode *raw_inode, handle_t *handle); @@ -124,8 +123,8 @@ extern int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode, struct ext4_xattr_info *i, struct ext4_xattr_ibody_find *is); -extern struct mb_cache *ext4_xattr_create_cache(char *name); -extern void ext4_xattr_destroy_cache(struct mb_cache *); +extern struct mb2_cache *ext4_xattr_create_cache(void); +extern void ext4_xattr_destroy_cache(struct mb2_cache *); #ifdef CONFIG_EXT4_FS_SECURITY extern int ext4_init_security(handle_t *handle, struct inode *inode, From 4629e13aff1d2814eef87e7487d91d9ffb3599b5 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 22 Feb 2016 11:56:38 -0500 Subject: [PATCH 462/521] BACKPORT: [UPSTREAM] ext2: convert to mbcache2 (Cherry-pick from commit be0726d33cb8f411945884664924bed3cb8c70ee) The conversion is generally straightforward. We convert filesystem from a global cache to per-fs one. Similarly to ext4 the tricky part is that xattr block corresponding to found mbcache entry can get freed before we get buffer lock for that block. So we have to check whether the entry is still valid after getting the buffer lock. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Bug: 32461228 --- fs/ext2/ext2.h | 3 + fs/ext2/super.c | 25 ++++++--- fs/ext2/xattr.c | 143 +++++++++++++++++++++++------------------------- fs/ext2/xattr.h | 21 ++----- 4 files changed, 92 insertions(+), 100 deletions(-) diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 4c69c94cafd8..f98ce7e60a0f 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -61,6 +61,8 @@ struct ext2_block_alloc_info { #define rsv_start rsv_window._rsv_start #define rsv_end rsv_window._rsv_end +struct mb2_cache; + /* * second extended-fs super-block data in memory */ @@ -111,6 +113,7 @@ struct ext2_sb_info { * of the mount options. */ spinlock_t s_lock; + struct mb2_cache *s_mb_cache; }; static inline spinlock_t * diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 748d35afc902..111a31761ffa 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -131,7 +131,10 @@ static void ext2_put_super (struct super_block * sb) dquot_disable(sb, -1, DQUOT_USAGE_ENABLED | DQUOT_LIMITS_ENABLED); - ext2_xattr_put_super(sb); + if (sbi->s_mb_cache) { + ext2_xattr_destroy_cache(sbi->s_mb_cache); + sbi->s_mb_cache = NULL; + } if (!(sb->s_flags & MS_RDONLY)) { struct ext2_super_block *es = sbi->s_es; @@ -1104,6 +1107,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent) ext2_msg(sb, KERN_ERR, "error: insufficient memory"); goto failed_mount3; } + +#ifdef CONFIG_EXT2_FS_XATTR + sbi->s_mb_cache = ext2_xattr_create_cache(); + if (!sbi->s_mb_cache) { + ext2_msg(sb, KERN_ERR, "Failed to create an mb_cache"); + goto failed_mount3; + } +#endif /* * set up enough so that it can read an inode */ @@ -1149,6 +1160,8 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent) sb->s_id); goto failed_mount; failed_mount3: + if (sbi->s_mb_cache) + ext2_xattr_destroy_cache(sbi->s_mb_cache); percpu_counter_destroy(&sbi->s_freeblocks_counter); percpu_counter_destroy(&sbi->s_freeinodes_counter); percpu_counter_destroy(&sbi->s_dirs_counter); @@ -1555,20 +1568,17 @@ MODULE_ALIAS_FS("ext2"); static int __init init_ext2_fs(void) { - int err = init_ext2_xattr(); - if (err) - return err; + int err; + err = init_inodecache(); if (err) - goto out1; + return err; err = register_filesystem(&ext2_fs_type); if (err) goto out; return 0; out: destroy_inodecache(); -out1: - exit_ext2_xattr(); return err; } @@ -1576,7 +1586,6 @@ static void __exit exit_ext2_fs(void) { unregister_filesystem(&ext2_fs_type); destroy_inodecache(); - exit_ext2_xattr(); } MODULE_AUTHOR("Remy Card and others"); diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c index fa70848afa8f..24736c8b3d51 100644 --- a/fs/ext2/xattr.c +++ b/fs/ext2/xattr.c @@ -56,7 +56,7 @@ #include #include #include -#include +#include #include #include #include @@ -92,14 +92,12 @@ static int ext2_xattr_set2(struct inode *, struct buffer_head *, struct ext2_xattr_header *); -static int ext2_xattr_cache_insert(struct buffer_head *); +static int ext2_xattr_cache_insert(struct mb2_cache *, struct buffer_head *); static struct buffer_head *ext2_xattr_cache_find(struct inode *, struct ext2_xattr_header *); static void ext2_xattr_rehash(struct ext2_xattr_header *, struct ext2_xattr_entry *); -static struct mb_cache *ext2_xattr_cache; - static const struct xattr_handler *ext2_xattr_handler_map[] = { [EXT2_XATTR_INDEX_USER] = &ext2_xattr_user_handler, #ifdef CONFIG_EXT2_FS_POSIX_ACL @@ -154,6 +152,7 @@ ext2_xattr_get(struct inode *inode, int name_index, const char *name, size_t name_len, size; char *end; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld", name_index, name, buffer, (long)buffer_size); @@ -198,7 +197,7 @@ bad_block: ext2_error(inode->i_sb, "ext2_xattr_get", goto found; entry = next; } - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) ea_idebug(inode, "cache insert failed"); error = -ENODATA; goto cleanup; @@ -211,7 +210,7 @@ bad_block: ext2_error(inode->i_sb, "ext2_xattr_get", le16_to_cpu(entry->e_value_offs) + size > inode->i_sb->s_blocksize) goto bad_block; - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) ea_idebug(inode, "cache insert failed"); if (buffer) { error = -ERANGE; @@ -249,6 +248,7 @@ ext2_xattr_list(struct dentry *dentry, char *buffer, size_t buffer_size) char *end; size_t rest = buffer_size; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; ea_idebug(inode, "buffer=%p, buffer_size=%ld", buffer, (long)buffer_size); @@ -283,7 +283,7 @@ bad_block: ext2_error(inode->i_sb, "ext2_xattr_list", goto bad_block; entry = next; } - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) ea_idebug(inode, "cache insert failed"); /* list the attribute names */ @@ -480,22 +480,23 @@ bad_block: ext2_error(sb, "ext2_xattr_set", /* Here we know that we can set the new attribute. */ if (header) { - struct mb_cache_entry *ce; - /* assert(header == HDR(bh)); */ - ce = mb_cache_entry_get(ext2_xattr_cache, bh->b_bdev, - bh->b_blocknr); lock_buffer(bh); if (header->h_refcount == cpu_to_le32(1)) { + __u32 hash = le32_to_cpu(header->h_hash); + ea_bdebug(bh, "modifying in-place"); - if (ce) - mb_cache_entry_free(ce); + /* + * This must happen under buffer lock for + * ext2_xattr_set2() to reliably detect modified block + */ + mb2_cache_entry_delete_block(EXT2_SB(sb)->s_mb_cache, + hash, bh->b_blocknr); + /* keep the buffer locked while modifying it. */ } else { int offset; - if (ce) - mb_cache_entry_release(ce); unlock_buffer(bh); ea_bdebug(bh, "cloning"); header = kmalloc(bh->b_size, GFP_KERNEL); @@ -623,6 +624,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, struct super_block *sb = inode->i_sb; struct buffer_head *new_bh = NULL; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(sb)->s_mb_cache; if (header) { new_bh = ext2_xattr_cache_find(inode, header); @@ -650,7 +652,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, don't need to change the reference count. */ new_bh = old_bh; get_bh(new_bh); - ext2_xattr_cache_insert(new_bh); + ext2_xattr_cache_insert(ext2_mb_cache, new_bh); } else { /* We need to allocate a new block */ ext2_fsblk_t goal = ext2_group_first_block_no(sb, @@ -671,7 +673,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, memcpy(new_bh->b_data, header, new_bh->b_size); set_buffer_uptodate(new_bh); unlock_buffer(new_bh); - ext2_xattr_cache_insert(new_bh); + ext2_xattr_cache_insert(ext2_mb_cache, new_bh); ext2_xattr_update_super_block(sb); } @@ -704,19 +706,21 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, error = 0; if (old_bh && old_bh != new_bh) { - struct mb_cache_entry *ce; - /* * If there was an old block and we are no longer using it, * release the old block. */ - ce = mb_cache_entry_get(ext2_xattr_cache, old_bh->b_bdev, - old_bh->b_blocknr); lock_buffer(old_bh); if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) { + __u32 hash = le32_to_cpu(HDR(old_bh)->h_hash); + + /* + * This must happen under buffer lock for + * ext2_xattr_set2() to reliably detect freed block + */ + mb2_cache_entry_delete_block(ext2_mb_cache, + hash, old_bh->b_blocknr); /* Free the old block. */ - if (ce) - mb_cache_entry_free(ce); ea_bdebug(old_bh, "freeing"); ext2_free_blocks(inode, old_bh->b_blocknr, 1); mark_inode_dirty(inode); @@ -727,8 +731,6 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, } else { /* Decrement the refcount only. */ le32_add_cpu(&HDR(old_bh)->h_refcount, -1); - if (ce) - mb_cache_entry_release(ce); dquot_free_block_nodirty(inode, 1); mark_inode_dirty(inode); mark_buffer_dirty(old_bh); @@ -754,7 +756,6 @@ void ext2_xattr_delete_inode(struct inode *inode) { struct buffer_head *bh = NULL; - struct mb_cache_entry *ce; down_write(&EXT2_I(inode)->xattr_sem); if (!EXT2_I(inode)->i_file_acl) @@ -774,19 +775,22 @@ ext2_xattr_delete_inode(struct inode *inode) EXT2_I(inode)->i_file_acl); goto cleanup; } - ce = mb_cache_entry_get(ext2_xattr_cache, bh->b_bdev, bh->b_blocknr); lock_buffer(bh); if (HDR(bh)->h_refcount == cpu_to_le32(1)) { - if (ce) - mb_cache_entry_free(ce); + __u32 hash = le32_to_cpu(HDR(bh)->h_hash); + + /* + * This must happen under buffer lock for ext2_xattr_set2() to + * reliably detect freed block + */ + mb2_cache_entry_delete_block(EXT2_SB(inode->i_sb)->s_mb_cache, + hash, bh->b_blocknr); ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1); get_bh(bh); bforget(bh); unlock_buffer(bh); } else { le32_add_cpu(&HDR(bh)->h_refcount, -1); - if (ce) - mb_cache_entry_release(ce); ea_bdebug(bh, "refcount now=%d", le32_to_cpu(HDR(bh)->h_refcount)); unlock_buffer(bh); @@ -802,18 +806,6 @@ ext2_xattr_delete_inode(struct inode *inode) up_write(&EXT2_I(inode)->xattr_sem); } -/* - * ext2_xattr_put_super() - * - * This is called when a file system is unmounted. - */ -void -ext2_xattr_put_super(struct super_block *sb) -{ - mb_cache_shrink(sb->s_bdev); -} - - /* * ext2_xattr_cache_insert() * @@ -823,28 +815,20 @@ ext2_xattr_put_super(struct super_block *sb) * Returns 0, or a negative error number on failure. */ static int -ext2_xattr_cache_insert(struct buffer_head *bh) +ext2_xattr_cache_insert(struct mb2_cache *cache, struct buffer_head *bh) { __u32 hash = le32_to_cpu(HDR(bh)->h_hash); - struct mb_cache_entry *ce; int error; - ce = mb_cache_entry_alloc(ext2_xattr_cache, GFP_NOFS); - if (!ce) - return -ENOMEM; - error = mb_cache_entry_insert(ce, bh->b_bdev, bh->b_blocknr, hash); + error = mb2_cache_entry_create(cache, GFP_NOFS, hash, bh->b_blocknr); if (error) { - mb_cache_entry_free(ce); if (error == -EBUSY) { ea_bdebug(bh, "already in cache (%d cache entries)", atomic_read(&ext2_xattr_cache->c_entry_count)); error = 0; } - } else { - ea_bdebug(bh, "inserting [%x] (%d cache entries)", (int)hash, - atomic_read(&ext2_xattr_cache->c_entry_count)); - mb_cache_entry_release(ce); - } + } else + ea_bdebug(bh, "inserting [%x]", (int)hash); return error; } @@ -900,23 +884,17 @@ static struct buffer_head * ext2_xattr_cache_find(struct inode *inode, struct ext2_xattr_header *header) { __u32 hash = le32_to_cpu(header->h_hash); - struct mb_cache_entry *ce; + struct mb2_cache_entry *ce; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; if (!header->h_hash) return NULL; /* never share */ ea_idebug(inode, "looking for cached blocks [%x]", (int)hash); again: - ce = mb_cache_entry_find_first(ext2_xattr_cache, inode->i_sb->s_bdev, - hash); + ce = mb2_cache_entry_find_first(ext2_mb_cache, hash); while (ce) { struct buffer_head *bh; - if (IS_ERR(ce)) { - if (PTR_ERR(ce) == -EAGAIN) - goto again; - break; - } - bh = sb_bread(inode->i_sb, ce->e_block); if (!bh) { ext2_error(inode->i_sb, "ext2_xattr_cache_find", @@ -924,7 +902,21 @@ ext2_xattr_cache_find(struct inode *inode, struct ext2_xattr_header *header) inode->i_ino, (unsigned long) ce->e_block); } else { lock_buffer(bh); - if (le32_to_cpu(HDR(bh)->h_refcount) > + /* + * We have to be careful about races with freeing or + * rehashing of xattr block. Once we hold buffer lock + * xattr block's state is stable so we can check + * whether the block got freed / rehashed or not. + * Since we unhash mbcache entry under buffer lock when + * freeing / rehashing xattr block, checking whether + * entry is still hashed is reliable. + */ + if (hlist_bl_unhashed(&ce->e_hash_list)) { + mb2_cache_entry_put(ext2_mb_cache, ce); + unlock_buffer(bh); + brelse(bh); + goto again; + } else if (le32_to_cpu(HDR(bh)->h_refcount) > EXT2_XATTR_REFCOUNT_MAX) { ea_idebug(inode, "block %ld refcount %d>%d", (unsigned long) ce->e_block, @@ -933,13 +925,14 @@ ext2_xattr_cache_find(struct inode *inode, struct ext2_xattr_header *header) } else if (!ext2_xattr_cmp(header, HDR(bh))) { ea_bdebug(bh, "b_count=%d", atomic_read(&(bh->b_count))); - mb_cache_entry_release(ce); + mb2_cache_entry_touch(ext2_mb_cache, ce); + mb2_cache_entry_put(ext2_mb_cache, ce); return bh; } unlock_buffer(bh); brelse(bh); } - ce = mb_cache_entry_find_next(ce, inode->i_sb->s_bdev, hash); + ce = mb2_cache_entry_find_next(ext2_mb_cache, ce); } return NULL; } @@ -1012,17 +1005,15 @@ static void ext2_xattr_rehash(struct ext2_xattr_header *header, #undef BLOCK_HASH_SHIFT -int __init -init_ext2_xattr(void) +#define HASH_BUCKET_BITS 10 + +struct mb2_cache *ext2_xattr_create_cache(void) { - ext2_xattr_cache = mb_cache_create("ext2_xattr", 6); - if (!ext2_xattr_cache) - return -ENOMEM; - return 0; + return mb2_cache_create(HASH_BUCKET_BITS); } -void -exit_ext2_xattr(void) +void ext2_xattr_destroy_cache(struct mb2_cache *cache) { - mb_cache_destroy(ext2_xattr_cache); + if (cache) + mb2_cache_destroy(cache); } diff --git a/fs/ext2/xattr.h b/fs/ext2/xattr.h index 60edf298644e..6ea38aa9563a 100644 --- a/fs/ext2/xattr.h +++ b/fs/ext2/xattr.h @@ -53,6 +53,8 @@ struct ext2_xattr_entry { #define EXT2_XATTR_SIZE(size) \ (((size) + EXT2_XATTR_ROUND) & ~EXT2_XATTR_ROUND) +struct mb2_cache; + # ifdef CONFIG_EXT2_FS_XATTR extern const struct xattr_handler ext2_xattr_user_handler; @@ -65,10 +67,9 @@ extern int ext2_xattr_get(struct inode *, int, const char *, void *, size_t); extern int ext2_xattr_set(struct inode *, int, const char *, const void *, size_t, int); extern void ext2_xattr_delete_inode(struct inode *); -extern void ext2_xattr_put_super(struct super_block *); -extern int init_ext2_xattr(void); -extern void exit_ext2_xattr(void); +extern struct mb2_cache *ext2_xattr_create_cache(void); +extern void ext2_xattr_destroy_cache(struct mb2_cache *cache); extern const struct xattr_handler *ext2_xattr_handlers[]; @@ -93,19 +94,7 @@ ext2_xattr_delete_inode(struct inode *inode) { } -static inline void -ext2_xattr_put_super(struct super_block *sb) -{ -} - -static inline int -init_ext2_xattr(void) -{ - return 0; -} - -static inline void -exit_ext2_xattr(void) +static inline void ext2_xattr_destroy_cache(struct mb2_cache *cache) { } From eec5abfcb9fe5ff1410e2f5fe93a968f0126e8be Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Mon, 17 Apr 2017 17:11:38 -0700 Subject: [PATCH 463/521] Android: sdcardfs: Change cache GID value Change-Id: Ieb955dd26493da26a458bc20fbbe75bca32b094f Signed-off-by: Daniel Rosenberg Bug: 37193650 --- fs/sdcardfs/derived_perm.c | 2 +- fs/sdcardfs/multiuser.h | 6 ++++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index f82fedd29007..11cb402d3465 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -222,7 +222,7 @@ void fixup_lower_ownership(struct dentry *dentry, const char *name) break; case PERM_ANDROID_PACKAGE_CACHE: if (info->d_uid != 0) - gid = multiuser_get_cache_gid(info->d_uid); + gid = multiuser_get_ext_cache_gid(info->d_uid); else gid = multiuser_get_uid(info->userid, uid); break; diff --git a/fs/sdcardfs/multiuser.h b/fs/sdcardfs/multiuser.h index 2e89b5872314..d0c925cda299 100644 --- a/fs/sdcardfs/multiuser.h +++ b/fs/sdcardfs/multiuser.h @@ -23,6 +23,8 @@ #define AID_APP_END 19999 /* last app user */ #define AID_CACHE_GID_START 20000 /* start of gids for apps to mark cached data */ #define AID_EXT_GID_START 30000 /* start of gids for apps to mark external data */ +#define AID_EXT_CACHE_GID_START 40000 /* start of gids for apps to mark external cached data */ +#define AID_EXT_CACHE_GID_END 49999 /* end of gids for apps to mark external cached data */ #define AID_SHARED_GID_START 50000 /* start of gids for apps in each user to share */ typedef uid_t userid_t; @@ -33,9 +35,9 @@ static inline uid_t multiuser_get_uid(userid_t user_id, appid_t app_id) return (user_id * AID_USER_OFFSET) + (app_id % AID_USER_OFFSET); } -static inline gid_t multiuser_get_cache_gid(uid_t uid) +static inline gid_t multiuser_get_ext_cache_gid(uid_t uid) { - return uid - AID_APP_START + AID_CACHE_GID_START; + return uid - AID_APP_START + AID_EXT_CACHE_GID_START; } static inline gid_t multiuser_get_ext_gid(uid_t uid) From 8b3e96a6d255da3cb2f2afcdfc6e5f9537998d03 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Tue, 18 Apr 2017 12:45:48 -0700 Subject: [PATCH 464/521] ANDROID: sdcardfs: ->iget fixes Adapted from wrapfs commit 8c49eaa0sb9c ("Wrapfs: ->iget fixes") Change where we igrab/iput to ensure we always hold a valid lower_inode. Return ENOMEM (not EACCES) if iget5_locked returns NULL. Signed-off-by: Erez Zadok Signed-off-by: Daniel Rosenberg Bug: 35766959 Change-Id: Id8d4e0c0cbc685a0a77685ce73c923e9a3ddc094 --- fs/sdcardfs/lookup.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index f10f7752f514..a0f221501b4c 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -91,7 +91,9 @@ struct inode *sdcardfs_iget(struct super_block *sb, struct inode *lower_inode, u struct sdcardfs_inode_info *info; struct inode_data data; struct inode *inode; /* the new inode to return */ - int err; + + if (!igrab(lower_inode)) + return ERR_PTR(-ESTALE); data.id = id; data.lower_inode = lower_inode; @@ -106,22 +108,19 @@ struct inode *sdcardfs_iget(struct super_block *sb, struct inode *lower_inode, u sdcardfs_inode_set, /* inode init function */ &data); /* data passed to test+set fxns */ if (!inode) { - err = -EACCES; iput(lower_inode); - return ERR_PTR(err); + return ERR_PTR(-ENOMEM); } - /* if found a cached inode, then just return it */ - if (!(inode->i_state & I_NEW)) + /* if found a cached inode, then just return it (after iput) */ + if (!(inode->i_state & I_NEW)) { + iput(lower_inode); return inode; + } /* initialize new inode */ info = SDCARDFS_I(inode); inode->i_ino = lower_inode->i_ino; - if (!igrab(lower_inode)) { - err = -ESTALE; - return ERR_PTR(err); - } sdcardfs_set_lower_inode(inode, lower_inode); inode->i_version++; From 53b4f1750fb2ab81bb2631c5f0843b44e429d752 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Tue, 18 Apr 2017 22:49:38 -0700 Subject: [PATCH 465/521] Android: sdcardfs: Don't complain in fixup_lower_ownership Not all filesystems support changing the owner of a file. We shouldn't complain if it doesn't happen. Signed-off-by: Daniel Rosenberg Bug: 37488099 Change-Id: I403e44ab7230f176e6df82f6adb4e5c82ce57f33 --- fs/sdcardfs/derived_perm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index 11cb402d3465..cddfc79ea365 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -252,7 +252,7 @@ void fixup_lower_ownership(struct dentry *dentry, const char *name) goto retry_deleg; } if (error) - pr_err("sdcardfs: Failed to touch up lower fs gid/uid.\n"); + pr_debug("sdcardfs: Failed to touch up lower fs gid/uid for %s\n", name); } sdcardfs_put_lower_path(dentry, &path); } From 09ebc323c3b203f96191a5d928dde07be3fc966a Mon Sep 17 00:00:00 2001 From: Jiebing Li Date: Tue, 10 Mar 2015 11:24:00 +0800 Subject: [PATCH 466/521] ANDROID: usb: gadget: fix MTP enumeration issue under super speed mode MTP function doesn't show as a drive in Windows when the device is connected to PC's USB3 port, because device fails to respond ACK to BULK OUT transfer request. This patch modifies MTP OUT request length as multiple of MaxPacketSize per databook requirement in order to fix this issue. Patchset: mtp Change-Id: I090d7880ff00c499dc5ba7fd644b1fe7cd87fcb5 Signed-off-by: Jiebing Li Signed-off-by: Wang, Yu Signed-off-by: Russ Weight --- drivers/usb/gadget/function/f_mtp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/usb/gadget/function/f_mtp.c b/drivers/usb/gadget/function/f_mtp.c index d8b69af6e335..b37cc8d87ca4 100644 --- a/drivers/usb/gadget/function/f_mtp.c +++ b/drivers/usb/gadget/function/f_mtp.c @@ -540,10 +540,12 @@ static ssize_t mtp_read(struct file *fp, char __user *buf, ssize_t r = count; unsigned xfer; int ret = 0; + size_t len; DBG(cdev, "mtp_read(%zu)\n", count); - if (count > MTP_BULK_BUFFER_SIZE) + len = usb_ep_align_maybe(cdev->gadget, dev->ep_out, count); + if (len > MTP_BULK_BUFFER_SIZE) return -EINVAL; /* we will block until we're online */ @@ -567,7 +569,7 @@ static ssize_t mtp_read(struct file *fp, char __user *buf, requeue_req: /* queue a request */ req = dev->rx_req[0]; - req->length = count; + req->length = len; dev->rx_done = 0; ret = usb_ep_queue(dev->ep_out, req, GFP_KERNEL); if (ret < 0) { From da83daedc80f9923f22ce9a87bea8e5d240d3e15 Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Thu, 13 Apr 2017 17:07:58 -0700 Subject: [PATCH 467/521] ANDROID: uid_sys_stats: reduce update_io_stats overhead Replaced read_lock with rcu_read_lock to reduce time that preemption is disabled. Added a function to update io stats for specific uid and moved hash table lookup, user_namespace out of loops. Bug: 37319300 Change-Id: I2b81b5cd3b6399b40d08c3c14b42cad044556970 Signed-off-by: Jin Qian --- drivers/misc/uid_sys_stats.c | 61 ++++++++++++++++++++++++++++++------ 1 file changed, 51 insertions(+), 10 deletions(-) diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c index 204b23484266..618617f3e867 100644 --- a/drivers/misc/uid_sys_stats.c +++ b/drivers/misc/uid_sys_stats.c @@ -237,28 +237,28 @@ static void clean_uid_io_last_stats(struct uid_entry *uid_entry, io_last->fsync -= task->ioac.syscfs; } -static void update_io_stats_locked(void) +static void update_io_stats_all_locked(void) { struct uid_entry *uid_entry; struct task_struct *task, *temp; struct io_stats *io_bucket, *io_curr, *io_last; + struct user_namespace *user_ns = current_user_ns(); unsigned long bkt; - - BUG_ON(!rt_mutex_is_locked(&uid_lock)); + uid_t uid; hash_for_each(hash_table, bkt, uid_entry, hash) memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0, sizeof(struct io_stats)); - read_lock(&tasklist_lock); + rcu_read_lock(); do_each_thread(temp, task) { - uid_entry = find_or_register_uid(from_kuid_munged( - current_user_ns(), task_uid(task))); + uid = from_kuid_munged(user_ns, task_uid(task)); + uid_entry = find_or_register_uid(uid); if (!uid_entry) continue; add_uid_io_curr_stats(uid_entry, task); } while_each_thread(temp, task); - read_unlock(&tasklist_lock); + rcu_read_unlock(); hash_for_each(hash_table, bkt, uid_entry, hash) { io_bucket = &uid_entry->io[uid_entry->state]; @@ -281,6 +281,47 @@ static void update_io_stats_locked(void) } } +static void update_io_stats_uid_locked(uid_t target_uid) +{ + struct uid_entry *uid_entry; + struct task_struct *task, *temp; + struct io_stats *io_bucket, *io_curr, *io_last; + struct user_namespace *user_ns = current_user_ns(); + + uid_entry = find_or_register_uid(target_uid); + if (!uid_entry) + return; + + memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0, + sizeof(struct io_stats)); + + rcu_read_lock(); + do_each_thread(temp, task) { + if (from_kuid_munged(user_ns, task_uid(task)) != target_uid) + continue; + add_uid_io_curr_stats(uid_entry, task); + } while_each_thread(temp, task); + rcu_read_unlock(); + + io_bucket = &uid_entry->io[uid_entry->state]; + io_curr = &uid_entry->io[UID_STATE_TOTAL_CURR]; + io_last = &uid_entry->io[UID_STATE_TOTAL_LAST]; + + io_bucket->read_bytes += + io_curr->read_bytes - io_last->read_bytes; + io_bucket->write_bytes += + io_curr->write_bytes - io_last->write_bytes; + io_bucket->rchar += io_curr->rchar - io_last->rchar; + io_bucket->wchar += io_curr->wchar - io_last->wchar; + io_bucket->fsync += io_curr->fsync - io_last->fsync; + + io_last->read_bytes = io_curr->read_bytes; + io_last->write_bytes = io_curr->write_bytes; + io_last->rchar = io_curr->rchar; + io_last->wchar = io_curr->wchar; + io_last->fsync = io_curr->fsync; +} + static int uid_io_show(struct seq_file *m, void *v) { struct uid_entry *uid_entry; @@ -288,7 +329,7 @@ static int uid_io_show(struct seq_file *m, void *v) rt_mutex_lock(&uid_lock); - update_io_stats_locked(); + update_io_stats_all_locked(); hash_for_each(hash_table, bkt, uid_entry, hash) { seq_printf(m, "%d %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu\n", @@ -363,7 +404,7 @@ static ssize_t uid_procstat_write(struct file *file, return count; } - update_io_stats_locked(); + update_io_stats_uid_locked(uid); uid_entry->state = state; @@ -401,7 +442,7 @@ static int process_notifier(struct notifier_block *self, uid_entry->utime += utime; uid_entry->stime += stime; - update_io_stats_locked(); + update_io_stats_uid_locked(uid); clean_uid_io_last_stats(uid_entry, task); exit: From 758b6055e2c1ddf0557f84dd2eb07c6bb4c63752 Mon Sep 17 00:00:00 2001 From: Amit Pundir Date: Mon, 24 Apr 2017 15:50:31 +0530 Subject: [PATCH 468/521] Revert "USB: gadget: u_ether: Fix data stall issue in RNDIS tethering mode" This reverts commit 78281f6ed79413e5a16eac6edc4a72c6836ae5a9. This data stall fix is no longer required in AOSP. It is already skipped in android-4.9 patchset. Also core change from this data stall fix is already undone by android-4.4 merge commit 324e88de4aba ("Merge tag 'v4.4.32' into android-4.4.y"). This revert patch just clean up the left overs. It also reverts the compile fix from Change-Id: I38c4f4a850b0329fb4a06b2c7e45558e16d66151 40ceb2c69964 ("usb: gadget: Fix compilation problem with tx_qlen field"). Signed-off-by: Amit Pundir --- drivers/usb/gadget/function/u_ether.c | 27 +++++++++++---------------- 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c index e4920e5e1d64..21bf0a8423d5 100644 --- a/drivers/usb/gadget/function/u_ether.c +++ b/drivers/usb/gadget/function/u_ether.c @@ -66,7 +66,7 @@ struct eth_dev { spinlock_t req_lock; /* guard {rx,tx}_reqs */ struct list_head tx_reqs, rx_reqs; - unsigned tx_qlen; + atomic_t tx_qlen; /* Minimum number of TX USB request queued to UDC */ #define TX_REQ_THRESHOLD 5 int no_tx_req_used; @@ -568,6 +568,7 @@ static void tx_complete(struct usb_ep *ep, struct usb_request *req) dev_kfree_skb_any(skb); } + atomic_dec(&dev->tx_qlen); if (netif_carrier_ok(dev->net)) netif_wake_queue(dev->net); } @@ -741,20 +742,13 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb, req->length = length; - /* throttle highspeed IRQ rate back slightly */ - if (gadget_is_dualspeed(dev->gadget) && - (dev->gadget->speed == USB_SPEED_HIGH) && - !list_empty(&dev->tx_reqs)) { - dev->tx_qlen++; - if (dev->tx_qlen == (dev->qmult/2)) { - req->no_interrupt = 0; - dev->tx_qlen = 0; - } else { - req->no_interrupt = 1; - } - } else { - req->no_interrupt = 0; - } + /* throttle high/super speed IRQ rate back slightly */ + if (gadget_is_dualspeed(dev->gadget)) + req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH || + dev->gadget->speed == USB_SPEED_SUPER)) && + !list_empty(&dev->tx_reqs)) + ? ((atomic_read(&dev->tx_qlen) % dev->qmult) != 0) + : 0; retval = usb_ep_queue(in, req, GFP_ATOMIC); switch (retval) { @@ -763,6 +757,7 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb, break; case 0: net->trans_start = jiffies; + atomic_inc(&dev->tx_qlen); } if (retval) { @@ -791,7 +786,7 @@ static void eth_start(struct eth_dev *dev, gfp_t gfp_flags) rx_fill(dev, gfp_flags); /* and open the tx floodgates */ - dev->tx_qlen = 0; + atomic_set(&dev->tx_qlen, 0); netif_wake_queue(dev->net); } From d26b727c10ee67e9b4a6ffc864f954e95e52d873 Mon Sep 17 00:00:00 2001 From: David Zeuthen Date: Tue, 24 Jan 2017 13:17:01 -0500 Subject: [PATCH 469/521] ANDROID: AVB error handler to invalidate vbmeta partition. If androidboot.vbmeta.device is set and points to a device with vbmeta magic, this header will be overwritten upon an irrecoverable dm-verity error. The side-effect of this is that the slot will fail to verify on next reboot, effectively triggering the boot loader to fallback to another slot. This work both if the vbmeta struct is at the start of a partition or if there's an AVB footer at the end. This code is based on drivers/md/dm-verity-chromeos.c from ChromiumOS. Example: [ 0.000000] Kernel command line: rootfstype=ext4 init=/init console=ttyS0,115200 androidboot.console=ttyS0 androidboot.hardware=uefi_x86_64 enforcing=0 androidboot.selinux=permissive androidboot.debuggable=1 buildvariant=eng dm="1 vroot none ro 1,0 2080496 verity 1 PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b 4096 4096 260062 260062 sha1 4f76354c86e430e27426d584a726f2fbffecae32 7e4085342d634065269631ac9a199e1a43f4632c 1 ignore_zero_blocks" root=0xfd00 androidboot.vbmeta.device=PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 androidboot.slot_suffix=_a androidboot.vbmeta.device_state=unlocked androidboot.vbmeta.hash_alg=sha256 androidboot.vbmeta.size=3200 androidboot.vbmeta.digest=14fe41c2b3696c31b7ad5eae7877d7d188995e1ab122c604aaaf4785850b91f7 skip_initramfs [...] [ 0.612802] device-mapper: verity-avb: AVB error handler initialized with vbmeta device: PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 [...] [ 1.213804] device-mapper: init: attempting early device configuration. [ 1.214752] device-mapper: init: adding target '0 2080496 verity 1 PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b 4096 4096 260062 260062 sha1 4f76354c86e430e27426d584a726f2fbffecae32 7e4085342d634065269631ac9a199e1a43f4632c 1 ignore_zero_blocks' [ 1.217643] device-mapper: init: dm-0 is ready [ 1.226694] device-mapper: verity: 8:6: data block 0 is corrupted [ 1.227666] device-mapper: verity-avb: AVB error handler called for PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 [ 1.234308] device-mapper: verity-avb: invalidate_vbmeta: found vbmeta partition [ 1.235848] device-mapper: verity-avb: invalidate_vbmeta: completed. [...] Bug: 31622239 Test: Manually tested (other arch). Change-Id: Idf6be32d6a3d28e15de9302aa26ad6a516d663aa Signed-off-by: David Zeuthen --- drivers/md/Kconfig | 11 ++ drivers/md/Makefile | 4 + drivers/md/dm-verity-avb.c | 217 ++++++++++++++++++++++++++++++++++ drivers/md/dm-verity-target.c | 6 +- drivers/md/dm-verity.h | 1 + 5 files changed, 238 insertions(+), 1 deletion(-) create mode 100644 drivers/md/dm-verity-avb.c diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index 9eb08a43cd27..6f2fde5d98e7 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -515,6 +515,17 @@ config DM_LOG_WRITES If unsure, say N. +config DM_VERITY_AVB + tristate "Support AVB specific verity error behavior" + depends on DM_VERITY + ---help--- + Enables Android Verified Boot platform-specific error + behavior. In particular, it will modify the vbmeta partition + specified on the kernel command-line when non-transient error + occurs (followed by a panic). + + If unsure, say N. + config DM_ANDROID_VERITY bool "Android verity target support" depends on DM_VERITY=y diff --git a/drivers/md/Makefile b/drivers/md/Makefile index 32b5d0a90d60..c22cc74c9fa8 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -69,3 +69,7 @@ endif ifeq ($(CONFIG_DM_VERITY_FEC),y) dm-verity-objs += dm-verity-fec.o endif + +ifeq ($(CONFIG_DM_VERITY_AVB),y) +dm-verity-objs += dm-verity-avb.o +endif diff --git a/drivers/md/dm-verity-avb.c b/drivers/md/dm-verity-avb.c new file mode 100644 index 000000000000..88487346c4c6 --- /dev/null +++ b/drivers/md/dm-verity-avb.c @@ -0,0 +1,217 @@ +/* + * Copyright (C) 2017 Google. + * + * This file is released under the GPLv2. + * + * Based on drivers/md/dm-verity-chromeos.c + */ + +#include +#include +#include + +#define DM_MSG_PREFIX "verity-avb" + +/* Set via module parameter. */ +static char avb_vbmeta_device[64]; + +static void invalidate_vbmeta_endio(struct bio *bio) +{ + complete(bio->bi_private); +} + +static int invalidate_vbmeta_submit(struct bio *bio, + struct block_device *bdev, + int rw, int access_last_sector, + struct page *page) +{ + DECLARE_COMPLETION_ONSTACK(wait); + + bio->bi_private = &wait; + bio->bi_end_io = invalidate_vbmeta_endio; + bio->bi_bdev = bdev; + + bio->bi_iter.bi_sector = 0; + if (access_last_sector) { + sector_t last_sector = (i_size_read(bdev->bd_inode)>>SECTOR_SHIFT) - 1; + bio->bi_iter.bi_sector = last_sector; + } + bio->bi_vcnt = 1; + bio->bi_iter.bi_idx = 0; + bio->bi_iter.bi_size = 512; + bio->bi_iter.bi_bvec_done = 0; + bio->bi_rw = rw; + bio->bi_io_vec[0].bv_page = page; + bio->bi_io_vec[0].bv_len = 512; + bio->bi_io_vec[0].bv_offset = 0; + + submit_bio(rw, bio); + /* Wait up to 2 seconds for completion or fail. */ + if (!wait_for_completion_timeout(&wait, msecs_to_jiffies(2000))) + return -EIO; + return 0; +} + +static int invalidate_vbmeta(dev_t vbmeta_devt) +{ + int ret = 0; + struct block_device *bdev; + struct bio *bio; + struct page *page; + fmode_t dev_mode; + /* Ensure we do synchronous unblocked I/O. We may also need + * sync_bdev() on completion, but it really shouldn't. + */ + int rw = REQ_SYNC | REQ_SOFTBARRIER | REQ_NOIDLE; + int access_last_sector = 0; + + /* First we open the device for reading. */ + dev_mode = FMODE_READ | FMODE_EXCL; + bdev = blkdev_get_by_dev(vbmeta_devt, dev_mode, + invalidate_vbmeta); + if (IS_ERR(bdev)) { + DMERR("invalidate_kernel: could not open device for reading"); + dev_mode = 0; + ret = -ENOENT; + goto failed_to_read; + } + + bio = bio_alloc(GFP_NOIO, 1); + if (!bio) { + ret = -ENOMEM; + goto failed_bio_alloc; + } + + page = alloc_page(GFP_NOIO); + if (!page) { + ret = -ENOMEM; + goto failed_to_alloc_page; + } + + access_last_sector = 0; + ret = invalidate_vbmeta_submit(bio, bdev, rw, access_last_sector, page); + if (ret) { + DMERR("invalidate_vbmeta: error reading"); + goto failed_to_submit_read; + } + + /* We have a page. Let's make sure it looks right. */ + if (memcmp("AVB0", page_address(page), 4) == 0) { + /* Stamp it. */ + memcpy(page_address(page), "AVE0", 4); + DMINFO("invalidate_vbmeta: found vbmeta partition"); + } else { + /* Could be this is on a AVB footer, check. Also, since the + * AVB footer is in the last 64 bytes, adjust for the fact that + * we're dealing with 512-byte sectors. + */ + size_t offset = (1<bi_remaining. + */ + bio_reset(bio); + + rw |= REQ_WRITE; + ret = invalidate_vbmeta_submit(bio, bdev, rw, access_last_sector, page); + if (ret) { + DMERR("invalidate_vbmeta: error writing"); + goto failed_to_submit_write; + } + + DMERR("invalidate_vbmeta: completed."); + ret = 0; +failed_to_submit_write: +failed_to_write: +invalid_header: + __free_page(page); +failed_to_submit_read: + /* Technically, we'll leak a page with the pending bio, but + * we're about to reboot anyway. + */ +failed_to_alloc_page: + bio_put(bio); +failed_bio_alloc: + if (dev_mode) + blkdev_put(bdev, dev_mode); +failed_to_read: + return ret; +} + +void dm_verity_avb_error_handler(void) +{ + dev_t dev; + + DMINFO("AVB error handler called for %s", avb_vbmeta_device); + + if (avb_vbmeta_device[0] == '\0') { + DMERR("avb_vbmeta_device parameter not set"); + goto fail_no_dev; + } + + dev = name_to_dev_t(avb_vbmeta_device); + if (!dev) { + DMERR("No matching partition for device: %s", + avb_vbmeta_device); + goto fail_no_dev; + } + + invalidate_vbmeta(dev); + +fail_no_dev: + ; +} + +static int __init dm_verity_avb_init(void) +{ + DMINFO("AVB error handler initialized with vbmeta device: %s", + avb_vbmeta_device); + return 0; +} + +static void __exit dm_verity_avb_exit(void) +{ +} + +module_init(dm_verity_avb_init); +module_exit(dm_verity_avb_exit); + +MODULE_AUTHOR("David Zeuthen "); +MODULE_DESCRIPTION("AVB-specific error handler for dm-verity"); +MODULE_LICENSE("GPL"); + +/* Declare parameter with no module prefix */ +#undef MODULE_PARAM_PREFIX +#define MODULE_PARAM_PREFIX "androidboot.vbmeta." +module_param_string(device, avb_vbmeta_device, sizeof(avb_vbmeta_device), 0); diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c index c7e97cf6e7fb..e34cf53bd068 100644 --- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -233,8 +233,12 @@ static int verity_handle_err(struct dm_verity *v, enum verity_block_type type, if (v->mode == DM_VERITY_MODE_LOGGING) return 0; - if (v->mode == DM_VERITY_MODE_RESTART) + if (v->mode == DM_VERITY_MODE_RESTART) { +#ifdef CONFIG_DM_VERITY_AVB + dm_verity_avb_error_handler(); +#endif kernel_restart("dm-verity device corrupted"); + } return 1; } diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h index 75effca400a3..a90d1d416107 100644 --- a/drivers/md/dm-verity.h +++ b/drivers/md/dm-verity.h @@ -136,4 +136,5 @@ extern void verity_io_hints(struct dm_target *ti, struct queue_limits *limits); extern void verity_dtr(struct dm_target *ti); extern int verity_ctr(struct dm_target *ti, unsigned argc, char **argv); extern int verity_map(struct dm_target *ti, struct bio *bio); +extern void dm_verity_avb_error_handler(void); #endif /* DM_VERITY_H */ From 1b8e8e800bd46c829d9a416a9b61630aef8d484f Mon Sep 17 00:00:00 2001 From: David Zeuthen Date: Tue, 24 Jan 2017 13:02:35 -0500 Subject: [PATCH 470/521] ANDROID: Update init/do_mounts_dm.c to the latest ChromiumOS version. This is needed for AVB integration work. Bug: 31796270 Test: Manually tested (other arch). Change-Id: I32fd37c1578c6414e3e6ff277d16ad94df7886b8 Signed-off-by: David Zeuthen --- include/linux/device-mapper.h | 7 + init/do_mounts_dm.c | 562 ++++++++++++++++++---------------- 2 files changed, 310 insertions(+), 259 deletions(-) diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index b874d5b61ffc..f3f87db34429 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -415,6 +415,13 @@ union map_info *dm_get_rq_mapinfo(struct request *rq); struct queue_limits *dm_get_queue_limits(struct mapped_device *md); +void dm_lock_md_type(struct mapped_device *md); +void dm_unlock_md_type(struct mapped_device *md); +void dm_set_md_type(struct mapped_device *md, unsigned type); +unsigned dm_get_md_type(struct mapped_device *md); +int dm_setup_md_queue(struct mapped_device *md); +unsigned dm_table_get_type(struct dm_table *t); + /* * Geometry functions. */ diff --git a/init/do_mounts_dm.c b/init/do_mounts_dm.c index ecda58df9a19..bce1c2fbb915 100644 --- a/init/do_mounts_dm.c +++ b/init/do_mounts_dm.c @@ -5,13 +5,17 @@ * * This file is released under the GPL. */ +#include +#include #include #include #include +#include #include "do_mounts.h" -#include "../drivers/md/dm.h" +#define DM_MAX_DEVICES 256 +#define DM_MAX_TARGETS 256 #define DM_MAX_NAME 32 #define DM_MAX_UUID 129 #define DM_NO_UUID "none" @@ -19,14 +23,47 @@ #define DM_MSG_PREFIX "init" /* Separators used for parsing the dm= argument. */ -#define DM_FIELD_SEP ' ' -#define DM_LINE_SEP ',' +#define DM_FIELD_SEP " " +#define DM_LINE_SEP "," +#define DM_ANY_SEP DM_FIELD_SEP DM_LINE_SEP /* * When the device-mapper and any targets are compiled into the kernel - * (not a module), one target may be created and used as the root device at - * boot time with the parameters given with the boot line dm=... - * The code for that is here. + * (not a module), one or more device-mappers may be created and used + * as the root device at boot time with the parameters given with the + * boot line dm=... + * + * Multiple device-mappers can be stacked specifing the number of + * devices. A device can have multiple targets if the the number of + * targets is specified. + * + * TODO(taysom:defect 32847) + * In the future, the field will be mandatory. + * + * ::= [] + + * ::= "," + + * ::= [] + * ::= "," + * ::= "ro" | "rw" + * ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | "none" + * ::= "verity" | "bootcache" | ... + * + * Example: + * 2 vboot none ro 1, + * 0 1768000 bootcache + * device=aa55b119-2a47-8c45-946a-5ac57765011f+1 + * signature=76e9be054b15884a9fa85973e9cb274c93afadb6 + * cache_start=1768000 max_blocks=100000 size_limit=23 max_trace=20000, + * vroot none ro 1, + * 0 1740800 verity payload=254:0 hashtree=254:0 hashstart=1740800 alg=sha1 + * root_hexdigest=76e9be054b15884a9fa85973e9cb274c93afadb6 + * salt=5b3549d54d6c7a3837b9b81ed72e49463a64c03680c47835bef94d768e5646fe + * + * Notes: + * 1. uuid is a label for the device and we set it to "none". + * 2. The field will be optional initially and assumed to be 1. + * Once all the scripts that set these fields have been set, it will + * be made mandatory. */ struct dm_setup_target { @@ -38,381 +75,388 @@ struct dm_setup_target { struct dm_setup_target *next; }; -static struct { +struct dm_device { int minor; int ro; char name[DM_MAX_NAME]; char uuid[DM_MAX_UUID]; - char *targets; + unsigned long num_targets; struct dm_setup_target *target; int target_count; + struct dm_device *next; +}; + +struct dm_option { + char *start; + char *next; + size_t len; + char delim; +}; + +static struct { + unsigned long num_devices; + char *str; } dm_setup_args __initdata; static __initdata int dm_early_setup; -static size_t __init get_dm_option(char *str, char **next, char sep) +static int __init get_dm_option(struct dm_option *opt, const char *accept) { - size_t len = 0; - char *endp = NULL; + char *str = opt->next; + char *endp; if (!str) return 0; - endp = strchr(str, sep); + str = skip_spaces(str); + opt->start = str; + endp = strpbrk(str, accept); if (!endp) { /* act like strchrnul */ - len = strlen(str); - endp = str + len; + opt->len = strlen(str); + endp = str + opt->len; } else { - len = endp - str; + opt->len = endp - str; } - - if (endp == str) - return 0; - - if (!next) - return len; - + opt->delim = *endp; if (*endp == 0) { /* Don't advance past the nul. */ - *next = endp; + opt->next = endp; } else { - *next = endp + 1; + opt->next = endp + 1; } - return len; + return opt->len != 0; } -static int __init dm_setup_args_init(void) +static int __init dm_setup_cleanup(struct dm_device *devices) { - dm_setup_args.minor = 0; - dm_setup_args.ro = 0; - dm_setup_args.target = NULL; - dm_setup_args.target_count = 0; + struct dm_device *dev = devices; + + while (dev) { + struct dm_device *old_dev = dev; + struct dm_setup_target *target = dev->target; + while (target) { + struct dm_setup_target *old_target = target; + kfree(target->type); + kfree(target->params); + target = target->next; + kfree(old_target); + dev->target_count--; + } + BUG_ON(dev->target_count); + dev = dev->next; + kfree(old_dev); + } return 0; } -static int __init dm_setup_cleanup(void) +static char * __init dm_parse_device(struct dm_device *dev, char *str) { - struct dm_setup_target *target = dm_setup_args.target; - struct dm_setup_target *old_target = NULL; - while (target) { - kfree(target->type); - kfree(target->params); - old_target = target; - target = target->next; - kfree(old_target); - dm_setup_args.target_count--; - } - BUG_ON(dm_setup_args.target_count); - return 0; -} - -static char * __init dm_setup_parse_device_args(char *str) -{ - char *next = NULL; - size_t len = 0; + struct dm_option opt; + size_t len; /* Grab the logical name of the device to be exported to udev */ - len = get_dm_option(str, &next, DM_FIELD_SEP); - if (!len) { + opt.next = str; + if (!get_dm_option(&opt, DM_FIELD_SEP)) { DMERR("failed to parse device name"); goto parse_fail; } - len = min(len + 1, sizeof(dm_setup_args.name)); - strlcpy(dm_setup_args.name, str, len); /* includes nul */ - str = skip_spaces(next); + len = min(opt.len + 1, sizeof(dev->name)); + strlcpy(dev->name, opt.start, len); /* includes nul */ /* Grab the UUID value or "none" */ - len = get_dm_option(str, &next, DM_FIELD_SEP); - if (!len) { + if (!get_dm_option(&opt, DM_FIELD_SEP)) { DMERR("failed to parse device uuid"); goto parse_fail; } - len = min(len + 1, sizeof(dm_setup_args.uuid)); - strlcpy(dm_setup_args.uuid, str, len); - str = skip_spaces(next); + len = min(opt.len + 1, sizeof(dev->uuid)); + strlcpy(dev->uuid, opt.start, len); /* Determine if the table/device will be read only or read-write */ - if (!strncmp("ro,", str, 3)) { - dm_setup_args.ro = 1; - } else if (!strncmp("rw,", str, 3)) { - dm_setup_args.ro = 0; + get_dm_option(&opt, DM_ANY_SEP); + if (!strncmp("ro", opt.start, opt.len)) { + dev->ro = 1; + } else if (!strncmp("rw", opt.start, opt.len)) { + dev->ro = 0; } else { DMERR("failed to parse table mode"); goto parse_fail; } - str = skip_spaces(str + 3); - return str; + /* Optional number field */ + /* XXX: The field will be mandatory in the next round */ + if (opt.delim == DM_FIELD_SEP[0]) { + if (!get_dm_option(&opt, DM_LINE_SEP)) + return NULL; + dev->num_targets = simple_strtoul(opt.start, NULL, 10); + } else { + dev->num_targets = 1; + } + if (dev->num_targets > DM_MAX_TARGETS) { + DMERR("too many targets %lu > %d", + dev->num_targets, DM_MAX_TARGETS); + } + return opt.next; parse_fail: return NULL; } -static void __init dm_substitute_devices(char *str, size_t str_len) +static char * __init dm_parse_targets(struct dm_device *dev, char *str) { - char *candidate = str; - char *candidate_end = str; - char old_char; - size_t len = 0; - dev_t dev; - - if (str_len < 3) - return; - - while (str && *str) { - candidate = strchr(str, '/'); - if (!candidate) - break; - - /* Avoid embedded slashes */ - if (candidate != str && *(candidate - 1) != DM_FIELD_SEP) { - str = strchr(candidate, DM_FIELD_SEP); - continue; - } - - len = get_dm_option(candidate, &candidate_end, DM_FIELD_SEP); - str = skip_spaces(candidate_end); - if (len < 3 || len > 37) /* name_to_dev_t max; maj:mix min */ - continue; - - /* Temporarily terminate with a nul */ - if (*candidate_end) - candidate_end--; - old_char = *candidate_end; - *candidate_end = '\0'; - - DMDEBUG("converting candidate device '%s' to dev_t", candidate); - /* Use the boot-time specific device naming */ - dev = name_to_dev_t(candidate); - *candidate_end = old_char; - - DMDEBUG(" -> %u", dev); - /* No suitable replacement found */ - if (!dev) - continue; - - /* Rewrite the /dev/path as a major:minor */ - len = snprintf(candidate, len, "%u:%u", MAJOR(dev), MINOR(dev)); - if (!len) { - DMERR("error substituting device major/minor."); - break; - } - candidate += len; - /* Pad out with spaces (fixing our nul) */ - while (candidate < candidate_end) - *(candidate++) = DM_FIELD_SEP; - } -} - -static int __init dm_setup_parse_targets(char *str) -{ - char *next = NULL; - size_t len = 0; - struct dm_setup_target **target = NULL; + struct dm_option opt; + struct dm_setup_target **target = &dev->target; + unsigned long num_targets = dev->num_targets; + unsigned long i; /* Targets are defined as per the table format but with a * comma as a newline separator. */ - target = &dm_setup_args.target; - while (str && *str) { + opt.next = str; + for (i = 0; i < num_targets; i++) { *target = kzalloc(sizeof(struct dm_setup_target), GFP_KERNEL); if (!*target) { - DMERR("failed to allocate memory for target %d", - dm_setup_args.target_count); + DMERR("failed to allocate memory for target %s<%ld>", + dev->name, i); goto parse_fail; } - dm_setup_args.target_count++; + dev->target_count++; - (*target)->begin = simple_strtoull(str, &next, 10); - if (!next || *next != DM_FIELD_SEP) { - DMERR("failed to parse starting sector for target %d", - dm_setup_args.target_count - 1); + if (!get_dm_option(&opt, DM_FIELD_SEP)) { + DMERR("failed to parse starting sector" + " for target %s<%ld>", dev->name, i); goto parse_fail; } - str = skip_spaces(next + 1); + (*target)->begin = simple_strtoull(opt.start, NULL, 10); - (*target)->length = simple_strtoull(str, &next, 10); - if (!next || *next != DM_FIELD_SEP) { - DMERR("failed to parse length for target %d", - dm_setup_args.target_count - 1); + if (!get_dm_option(&opt, DM_FIELD_SEP)) { + DMERR("failed to parse length for target %s<%ld>", + dev->name, i); goto parse_fail; } - str = skip_spaces(next + 1); + (*target)->length = simple_strtoull(opt.start, NULL, 10); - len = get_dm_option(str, &next, DM_FIELD_SEP); - if (!len || - !((*target)->type = kstrndup(str, len, GFP_KERNEL))) { - DMERR("failed to parse type for target %d", - dm_setup_args.target_count - 1); + if (get_dm_option(&opt, DM_FIELD_SEP)) + (*target)->type = kstrndup(opt.start, opt.len, + GFP_KERNEL); + if (!((*target)->type)) { + DMERR("failed to parse type for target %s<%ld>", + dev->name, i); goto parse_fail; } - str = skip_spaces(next); - - len = get_dm_option(str, &next, DM_LINE_SEP); - if (!len || - !((*target)->params = kstrndup(str, len, GFP_KERNEL))) { - DMERR("failed to parse params for target %d", - dm_setup_args.target_count - 1); + if (get_dm_option(&opt, DM_LINE_SEP)) + (*target)->params = kstrndup(opt.start, opt.len, + GFP_KERNEL); + if (!((*target)->params)) { + DMERR("failed to parse params for target %s<%ld>", + dev->name, i); goto parse_fail; } - str = skip_spaces(next); - - /* Before moving on, walk through the copied target and - * attempt to replace all /dev/xxx with the major:minor number. - * It may not be possible to resolve them traditionally at - * boot-time. */ - dm_substitute_devices((*target)->params, len); - target = &((*target)->next); } - DMDEBUG("parsed %d targets", dm_setup_args.target_count); + DMDEBUG("parsed %d targets", dev->target_count); - return 0; + return opt.next; parse_fail: - return 1; + return NULL; +} + +static struct dm_device * __init dm_parse_args(void) +{ + struct dm_device *devices = NULL; + struct dm_device **tail = &devices; + struct dm_device *dev; + char *str = dm_setup_args.str; + unsigned long num_devices = dm_setup_args.num_devices; + unsigned long i; + + if (!str) + return NULL; + for (i = 0; i < num_devices; i++) { + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) { + DMERR("failed to allocated memory for dev"); + goto error; + } + *tail = dev; + tail = &dev->next; + /* + * devices are given minor numbers 0 - n-1 + * in the order they are found in the arg + * string. + */ + dev->minor = i; + str = dm_parse_device(dev, str); + if (!str) /* NULL indicates error in parsing, bail */ + goto error; + + str = dm_parse_targets(dev, str); + if (!str) + goto error; + } + return devices; +error: + dm_setup_cleanup(devices); + return NULL; } /* * Parse the command-line parameters given our kernel, but do not * actually try to invoke the DM device now; that is handled by - * dm_setup_drive after the low-level disk drivers have initialised. - * dm format is as follows: - * dm="name uuid fmode,[table line 1],[table line 2],..." - * May be used with root=/dev/dm-0 as it always uses the first dm minor. + * dm_setup_drives after the low-level disk drivers have initialised. + * dm format is described at the top of the file. + * + * Because dm minor numbers are assigned in assending order starting with 0, + * You can assume the first device is /dev/dm-0, the next device is /dev/dm-1, + * and so forth. */ - static int __init dm_setup(char *str) { - dm_setup_args_init(); + struct dm_option opt; + unsigned long num_devices; - str = dm_setup_parse_device_args(str); if (!str) { DMDEBUG("str is NULL"); goto parse_fail; } - - /* Target parsing is delayed until we have dynamic memory */ - dm_setup_args.targets = str; - - printk(KERN_INFO "dm: will configure '%s' on dm-%d\n", - dm_setup_args.name, dm_setup_args.minor); - + opt.next = str; + if (!get_dm_option(&opt, DM_FIELD_SEP)) + goto parse_fail; + if (isdigit(opt.start[0])) { /* XXX: Optional number field */ + num_devices = simple_strtoul(opt.start, NULL, 10); + str = opt.next; + } else { + num_devices = 1; + /* Don't advance str */ + } + if (num_devices > DM_MAX_DEVICES) { + DMDEBUG("too many devices %lu > %d", + num_devices, DM_MAX_DEVICES); + } + dm_setup_args.str = str; + dm_setup_args.num_devices = num_devices; + DMINFO("will configure %lu devices", num_devices); dm_early_setup = 1; return 1; parse_fail: - printk(KERN_WARNING "dm: Invalid arguments supplied to dm=.\n"); + DMWARN("Invalid arguments supplied to dm=."); return 0; } - -static void __init dm_setup_drive(void) +static void __init dm_setup_drives(void) { struct mapped_device *md = NULL; struct dm_table *table = NULL; struct dm_setup_target *target; - char *uuid = dm_setup_args.uuid; + struct dm_device *dev; + char *uuid; fmode_t fmode = FMODE_READ; + struct dm_device *devices; - /* Finish parsing the targets. */ - if (dm_setup_parse_targets(dm_setup_args.targets)) - goto parse_fail; + devices = dm_parse_args(); - if (dm_create(dm_setup_args.minor, &md)) { - DMDEBUG("failed to create the device"); - goto dm_create_fail; - } - DMDEBUG("created device '%s'", dm_device_name(md)); - - /* In addition to flagging the table below, the disk must be - * set explicitly ro/rw. */ - set_disk_ro(dm_disk(md), dm_setup_args.ro); - - if (!dm_setup_args.ro) - fmode |= FMODE_WRITE; - if (dm_table_create(&table, fmode, dm_setup_args.target_count, md)) { - DMDEBUG("failed to create the table"); - goto dm_table_create_fail; - } - - dm_lock_md_type(md); - target = dm_setup_args.target; - while (target) { - DMINFO("adding target '%llu %llu %s %s'", - (unsigned long long) target->begin, - (unsigned long long) target->length, target->type, - target->params); - if (dm_table_add_target(table, target->type, target->begin, - target->length, target->params)) { - DMDEBUG("failed to add the target to the table"); - goto add_target_fail; + for (dev = devices; dev; dev = dev->next) { + if (dm_create(dev->minor, &md)) { + DMDEBUG("failed to create the device"); + goto dm_create_fail; } - target = target->next; - } + DMDEBUG("created device '%s'", dm_device_name(md)); - if (dm_table_complete(table)) { - DMDEBUG("failed to complete the table"); - goto table_complete_fail; - } + /* + * In addition to flagging the table below, the disk must be + * set explicitly ro/rw. + */ + set_disk_ro(dm_disk(md), dev->ro); - if (dm_get_md_type(md) == DM_TYPE_NONE) { + if (!dev->ro) + fmode |= FMODE_WRITE; + if (dm_table_create(&table, fmode, dev->target_count, md)) { + DMDEBUG("failed to create the table"); + goto dm_table_create_fail; + } + + dm_lock_md_type(md); + + for (target = dev->target; target; target = target->next) { + DMINFO("adding target '%llu %llu %s %s'", + (unsigned long long) target->begin, + (unsigned long long) target->length, + target->type, target->params); + if (dm_table_add_target(table, target->type, + target->begin, + target->length, + target->params)) { + DMDEBUG("failed to add the target" + " to the table"); + goto add_target_fail; + } + } + if (dm_table_complete(table)) { + DMDEBUG("failed to complete the table"); + goto table_complete_fail; + } + + /* Suspend the device so that we can bind it to the table. */ + if (dm_suspend(md, 0)) { + DMDEBUG("failed to suspend the device pre-bind"); + goto suspend_fail; + } + + /* Initial table load: acquire type of table. */ dm_set_md_type(md, dm_table_get_type(table)); + + /* Setup md->queue to reflect md's type. */ if (dm_setup_md_queue(md)) { DMWARN("unable to set up device queue for new table."); goto setup_md_queue_fail; } - } else if (dm_get_md_type(md) != dm_table_get_type(table)) { - DMWARN("can't change device type after initial table load."); - goto setup_md_queue_fail; - } - /* Suspend the device so that we can bind it to the table. */ - if (dm_suspend(md, 0)) { - DMDEBUG("failed to suspend the device pre-bind"); - goto suspend_fail; + /* + * Bind the table to the device. This is the only way + * to associate md->map with the table and set the disk + * capacity directly. + */ + if (dm_swap_table(md, table)) { /* should return NULL. */ + DMDEBUG("failed to bind the device to the table"); + goto table_bind_fail; + } + + /* Finally, resume and the device should be ready. */ + if (dm_resume(md)) { + DMDEBUG("failed to resume the device"); + goto resume_fail; + } + + /* Export the dm device via the ioctl interface */ + if (!strcmp(DM_NO_UUID, dev->uuid)) + uuid = NULL; + if (dm_ioctl_export(md, dev->name, uuid)) { + DMDEBUG("failed to export device with given" + " name and uuid"); + goto export_fail; + } + + dm_unlock_md_type(md); + + DMINFO("dm-%d is ready", dev->minor); } - - /* Bind the table to the device. This is the only way to associate - * md->map with the table and set the disk capacity directly. */ - if (dm_swap_table(md, table)) { /* should return NULL. */ - DMDEBUG("failed to bind the device to the table"); - goto table_bind_fail; - } - - /* Finally, resume and the device should be ready. */ - if (dm_resume(md)) { - DMDEBUG("failed to resume the device"); - goto resume_fail; - } - - /* Export the dm device via the ioctl interface */ - if (!strcmp(DM_NO_UUID, dm_setup_args.uuid)) - uuid = NULL; - if (dm_ioctl_export(md, dm_setup_args.name, uuid)) { - DMDEBUG("failed to export device with given name and uuid"); - goto export_fail; - } - printk(KERN_INFO "dm: dm-%d is ready\n", dm_setup_args.minor); - - dm_unlock_md_type(md); - dm_setup_cleanup(); + dm_setup_cleanup(devices); return; export_fail: resume_fail: table_bind_fail: -suspend_fail: setup_md_queue_fail: +suspend_fail: table_complete_fail: add_target_fail: dm_unlock_md_type(md); dm_table_create_fail: dm_put(md); dm_create_fail: - dm_setup_cleanup(); -parse_fail: - printk(KERN_WARNING "dm: starting dm-%d (%s) failed\n", - dm_setup_args.minor, dm_setup_args.name); + DMWARN("starting dm-%d (%s) failed", + dev->minor, dev->name); + dm_setup_cleanup(devices); } __setup("dm=", dm_setup); @@ -421,6 +465,6 @@ void __init dm_run_setup(void) { if (!dm_early_setup) return; - printk(KERN_INFO "dm: attempting early device configuration.\n"); - dm_setup_drive(); + DMINFO("attempting early device configuration."); + dm_setup_drives(); } From 9b2a5cdae059228e8039c0a779c4b77006b79438 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 20 Apr 2017 18:05:02 -0700 Subject: [PATCH 471/521] ANDROID: sdcardfs: Use filesystem specific hash We weren't accounting for FS specific hash functions, causing us to miss negative dentries for any FS that had one. Similar to a patch from esdfs commit 75bd25a9476d ("esdfs: support lower's own hash") Signed-off-by: Daniel Rosenberg Change-Id: I32d1ba304d728e0ca2648cacfb4c2e441ae63608 --- fs/sdcardfs/lookup.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index a0f221501b4c..446ef4027ebc 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -366,8 +366,13 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, /* instatiate a new negative dentry */ dname.name = name->name; dname.len = name->len; - dname.hash = full_name_hash(dname.name, dname.len); - lower_dentry = d_lookup(lower_dir_dentry, &dname); + + /* See if the low-level filesystem might want + * to use its own hash + */ + lower_dentry = d_hash_and_lookup(lower_dir_dentry, &dname); + if (IS_ERR(lower_dentry)) + return lower_dentry; if (lower_dentry) goto setup_lower; From 28002808ae1cb2967187c9e70a318e31dfc7ed72 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 20 Apr 2017 18:21:50 -0700 Subject: [PATCH 472/521] Revert "Revert "Android: sdcardfs: Don't do d_add for lower fs"" This reverts commit ffa75fdb9c408f49b9622b6d55752ed99ff61488. Turns out we just needed the right hash. Signed-off-by: Daniel Rosenberg Bug: 37231161 Change-Id: I6a6de7f7df99ad42b20fa062913b219f64020c31 --- fs/sdcardfs/lookup.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c index 446ef4027ebc..509d5fbcb472 100644 --- a/fs/sdcardfs/lookup.c +++ b/fs/sdcardfs/lookup.c @@ -373,17 +373,15 @@ static struct dentry *__sdcardfs_lookup(struct dentry *dentry, lower_dentry = d_hash_and_lookup(lower_dir_dentry, &dname); if (IS_ERR(lower_dentry)) return lower_dentry; - if (lower_dentry) - goto setup_lower; - - lower_dentry = d_alloc(lower_dir_dentry, &dname); if (!lower_dentry) { - err = -ENOMEM; + /* We called vfs_path_lookup earlier, and did not get a negative + * dentry then. Don't confuse the lower filesystem by forcing + * one on it now... + */ + err = -ENOENT; goto out; } - d_add(lower_dentry, NULL); /* instantiate and hash */ -setup_lower: lower_path.dentry = lower_dentry; lower_path.mnt = mntget(lower_dir_mnt); sdcardfs_set_lower_path(dentry, &lower_path); From e6eb0729b6145fc90613bc56e5903bf4f1d9f43f Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Mon, 24 Apr 2017 16:10:21 -0700 Subject: [PATCH 473/521] ANDROID: sdcardfs: Copy meta-data from lower inode From wrapfs commit 3ee9b365e38c ("Wrapfs: properly copy meta-data after AIO operations from lower inode") Signed-off-by: Daniel Rosenberg Bug: 35766959 Change-Id: I9a789222e27a17b8d85ce61c45397d1839f9a675 --- fs/sdcardfs/file.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c index 1f6921e2ffbf..6076c342dae6 100644 --- a/fs/sdcardfs/file.c +++ b/fs/sdcardfs/file.c @@ -358,9 +358,12 @@ ssize_t sdcardfs_read_iter(struct kiocb *iocb, struct iov_iter *iter) get_file(lower_file); /* prevent lower_file from being released */ iocb->ki_filp = lower_file; err = lower_file->f_op->read_iter(iocb, iter); - /* ? wait IO finish to update atime as ecryptfs ? */ iocb->ki_filp = file; fput(lower_file); + /* update upper inode atime as needed */ + if (err >= 0 || err == -EIOCBQUEUED) + fsstack_copy_attr_atime(file->f_path.dentry->d_inode, + file_inode(lower_file)); out: return err; } @@ -384,6 +387,13 @@ ssize_t sdcardfs_write_iter(struct kiocb *iocb, struct iov_iter *iter) err = lower_file->f_op->write_iter(iocb, iter); iocb->ki_filp = file; fput(lower_file); + /* update upper inode times/sizes as needed */ + if (err >= 0 || err == -EIOCBQUEUED) { + fsstack_copy_inode_size(file->f_path.dentry->d_inode, + file_inode(lower_file)); + fsstack_copy_attr_times(file->f_path.dentry->d_inode, + file_inode(lower_file)); + } out: return err; } From 1e64884a121b9d43a79370ddacb749c3873ac3e5 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Mon, 24 Apr 2017 16:11:03 -0700 Subject: [PATCH 474/521] ANDROID: sdcardfs: Avoid setting GIDs outside of valid ranges When setting up the ownership of files on the lower filesystem, ensure that these values are in reasonable ranges for apps. If they aren't, default to AID_MEDIA_RW Signed-off-by: Daniel Rosenberg Bug: 37516160 Change-Id: I0bec76a61ac72aff0b993ab1ad04be8382178a00 --- fs/sdcardfs/derived_perm.c | 8 ++++---- fs/sdcardfs/multiuser.h | 7 +++++++ 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c index cddfc79ea365..b4595aab5713 100644 --- a/fs/sdcardfs/derived_perm.c +++ b/fs/sdcardfs/derived_perm.c @@ -215,16 +215,16 @@ void fixup_lower_ownership(struct dentry *dentry, const char *name) gid = AID_MEDIA_OBB; break; case PERM_ANDROID_PACKAGE: - if (info->d_uid != 0) + if (uid_is_app(info->d_uid)) gid = multiuser_get_ext_gid(info->d_uid); else - gid = multiuser_get_uid(info->userid, uid); + gid = multiuser_get_uid(info->userid, AID_MEDIA_RW); break; case PERM_ANDROID_PACKAGE_CACHE: - if (info->d_uid != 0) + if (uid_is_app(info->d_uid)) gid = multiuser_get_ext_cache_gid(info->d_uid); else - gid = multiuser_get_uid(info->userid, uid); + gid = multiuser_get_uid(info->userid, AID_MEDIA_RW); break; case PERM_PRE_ROOT: default: diff --git a/fs/sdcardfs/multiuser.h b/fs/sdcardfs/multiuser.h index d0c925cda299..85341e753f8c 100644 --- a/fs/sdcardfs/multiuser.h +++ b/fs/sdcardfs/multiuser.h @@ -35,6 +35,13 @@ static inline uid_t multiuser_get_uid(userid_t user_id, appid_t app_id) return (user_id * AID_USER_OFFSET) + (app_id % AID_USER_OFFSET); } +static inline bool uid_is_app(uid_t uid) +{ + appid_t appid = uid % AID_USER_OFFSET; + + return appid >= AID_APP_START && appid <= AID_APP_END; +} + static inline gid_t multiuser_get_ext_cache_gid(uid_t uid) { return uid - AID_APP_START + AID_EXT_CACHE_GID_START; From 71e5025eb878c1d74ced556ff825b50b9fcd4820 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Mon, 24 Apr 2017 19:49:02 -0700 Subject: [PATCH 475/521] ANDROID: sdcardfs: Call lower fs's revalidate We should be calling the lower filesystem's revalidate inside of sdcardfs's revalidate, as wrapfs does. Signed-off-by: Daniel Rosenberg Bug: 35766959 Change-Id: I939d1c4192fafc1e21678aeab43fe3d588b8e2f4 --- fs/sdcardfs/dentry.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/sdcardfs/dentry.c b/fs/sdcardfs/dentry.c index 8e31d1a80f0c..7a19e77fce99 100644 --- a/fs/sdcardfs/dentry.c +++ b/fs/sdcardfs/dentry.c @@ -60,6 +60,14 @@ static int sdcardfs_d_revalidate(struct dentry *dentry, unsigned int flags) lower_dentry = lower_path.dentry; lower_cur_parent_dentry = dget_parent(lower_dentry); + if ((lower_dentry->d_flags & DCACHE_OP_REVALIDATE)) { + err = lower_dentry->d_op->d_revalidate(lower_dentry, flags); + if (err == 0) { + d_drop(dentry); + goto out; + } + } + spin_lock(&lower_dentry->d_lock); if (d_unhashed(lower_dentry)) { spin_unlock(&lower_dentry->d_lock); From d69a900de346b6da958ceb3711d3becd55a7ae71 Mon Sep 17 00:00:00 2001 From: Jin Qian Date: Mon, 24 Apr 2017 18:20:52 -0700 Subject: [PATCH 476/521] BACKPORT: f2fs: sanity check log_blocks_per_seg f2fs currently only supports 4KB block size and 2MB segment size. Sanity check log_blocks_per_seg == 9, i.e. 2MB/4KB = (1 << 9) Partially (cherry-picked from commit 9a59b62fd88196844cee5fff851bee2cfd7afb6e) f2fs: do more integrity verification for superblock Do more sanity check for superblock during ->mount. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Bug: 36817013 Change-Id: I0be52e54fba82083068337ceb9f7ad985a87319f Signed-off-by: Jin Qian --- fs/f2fs/super.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 3a65e0132352..98a77b0a365d 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -947,6 +947,14 @@ static int sanity_check_raw_super(struct super_block *sb, return 1; } + /* check log blocks per segment */ + if (le32_to_cpu(raw_super->log_blocks_per_seg) != 9) { + f2fs_msg(sb, KERN_INFO, + "Invalid log blocks per segment (%u)\n", + le32_to_cpu(raw_super->log_blocks_per_seg)); + return 1; + } + /* Currently, support 512/1024/2048/4096 bytes sector size */ if (le32_to_cpu(raw_super->log_sectorsize) > F2FS_MAX_LOG_SECTOR_SIZE || From 5fac43b0f151ac03417229eeb1c1ed517cab7569 Mon Sep 17 00:00:00 2001 From: Ganesh Mahendran Date: Tue, 25 Apr 2017 18:07:43 +0800 Subject: [PATCH 477/521] ANDROID: uid_sys_stats: fix access of task_uid(task) struct task_struct *task should be proteced by tasklist_lock. Change-Id: Iefcd13442a9b9d855a2bbcde9fd838a4132fee58 Signed-off-by: Ganesh Mahendran --- drivers/misc/uid_sys_stats.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c index 618617f3e867..ad21276c8d9e 100644 --- a/drivers/misc/uid_sys_stats.c +++ b/drivers/misc/uid_sys_stats.c @@ -95,9 +95,11 @@ static int uid_cputime_show(struct seq_file *m, void *v) { struct uid_entry *uid_entry; struct task_struct *task, *temp; + struct user_namespace *user_ns = current_user_ns(); cputime_t utime; cputime_t stime; unsigned long bkt; + uid_t uid; rt_mutex_lock(&uid_lock); @@ -108,14 +110,13 @@ static int uid_cputime_show(struct seq_file *m, void *v) read_lock(&tasklist_lock); do_each_thread(temp, task) { - uid_entry = find_or_register_uid(from_kuid_munged( - current_user_ns(), task_uid(task))); + uid = from_kuid_munged(user_ns, task_uid(task)); + uid_entry = find_or_register_uid(uid); if (!uid_entry) { read_unlock(&tasklist_lock); rt_mutex_unlock(&uid_lock); pr_err("%s: failed to find the uid_entry for uid %d\n", - __func__, from_kuid_munged(current_user_ns(), - task_uid(task))); + __func__, uid); return -ENOMEM; } task_cputime_adjusted(task, &utime, &stime); From 716bcfeb12b8d55d278af47b927839b382d2837a Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Tue, 15 Dec 2015 09:58:18 +0800 Subject: [PATCH 478/521] f2fs: do more integrity verification for superblock commit 9a59b62fd88196844cee5fff851bee2cfd7afb6e upstream. Do more sanity check for superblock during ->mount. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/super.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 98 insertions(+) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 3a65e0132352..16462e702f96 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -918,6 +918,79 @@ static loff_t max_file_size(unsigned bits) return result; } +static inline bool sanity_check_area_boundary(struct super_block *sb, + struct f2fs_super_block *raw_super) +{ + u32 segment0_blkaddr = le32_to_cpu(raw_super->segment0_blkaddr); + u32 cp_blkaddr = le32_to_cpu(raw_super->cp_blkaddr); + u32 sit_blkaddr = le32_to_cpu(raw_super->sit_blkaddr); + u32 nat_blkaddr = le32_to_cpu(raw_super->nat_blkaddr); + u32 ssa_blkaddr = le32_to_cpu(raw_super->ssa_blkaddr); + u32 main_blkaddr = le32_to_cpu(raw_super->main_blkaddr); + u32 segment_count_ckpt = le32_to_cpu(raw_super->segment_count_ckpt); + u32 segment_count_sit = le32_to_cpu(raw_super->segment_count_sit); + u32 segment_count_nat = le32_to_cpu(raw_super->segment_count_nat); + u32 segment_count_ssa = le32_to_cpu(raw_super->segment_count_ssa); + u32 segment_count_main = le32_to_cpu(raw_super->segment_count_main); + u32 segment_count = le32_to_cpu(raw_super->segment_count); + u32 log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg); + + if (segment0_blkaddr != cp_blkaddr) { + f2fs_msg(sb, KERN_INFO, + "Mismatch start address, segment0(%u) cp_blkaddr(%u)", + segment0_blkaddr, cp_blkaddr); + return true; + } + + if (cp_blkaddr + (segment_count_ckpt << log_blocks_per_seg) != + sit_blkaddr) { + f2fs_msg(sb, KERN_INFO, + "Wrong CP boundary, start(%u) end(%u) blocks(%u)", + cp_blkaddr, sit_blkaddr, + segment_count_ckpt << log_blocks_per_seg); + return true; + } + + if (sit_blkaddr + (segment_count_sit << log_blocks_per_seg) != + nat_blkaddr) { + f2fs_msg(sb, KERN_INFO, + "Wrong SIT boundary, start(%u) end(%u) blocks(%u)", + sit_blkaddr, nat_blkaddr, + segment_count_sit << log_blocks_per_seg); + return true; + } + + if (nat_blkaddr + (segment_count_nat << log_blocks_per_seg) != + ssa_blkaddr) { + f2fs_msg(sb, KERN_INFO, + "Wrong NAT boundary, start(%u) end(%u) blocks(%u)", + nat_blkaddr, ssa_blkaddr, + segment_count_nat << log_blocks_per_seg); + return true; + } + + if (ssa_blkaddr + (segment_count_ssa << log_blocks_per_seg) != + main_blkaddr) { + f2fs_msg(sb, KERN_INFO, + "Wrong SSA boundary, start(%u) end(%u) blocks(%u)", + ssa_blkaddr, main_blkaddr, + segment_count_ssa << log_blocks_per_seg); + return true; + } + + if (main_blkaddr + (segment_count_main << log_blocks_per_seg) != + segment0_blkaddr + (segment_count << log_blocks_per_seg)) { + f2fs_msg(sb, KERN_INFO, + "Wrong MAIN_AREA boundary, start(%u) end(%u) blocks(%u)", + main_blkaddr, + segment0_blkaddr + (segment_count << log_blocks_per_seg), + segment_count_main << log_blocks_per_seg); + return true; + } + + return false; +} + static int sanity_check_raw_super(struct super_block *sb, struct f2fs_super_block *raw_super) { @@ -947,6 +1020,14 @@ static int sanity_check_raw_super(struct super_block *sb, return 1; } + /* check log blocks per segment */ + if (le32_to_cpu(raw_super->log_blocks_per_seg) != 9) { + f2fs_msg(sb, KERN_INFO, + "Invalid log blocks per segment (%u)\n", + le32_to_cpu(raw_super->log_blocks_per_seg)); + return 1; + } + /* Currently, support 512/1024/2048/4096 bytes sector size */ if (le32_to_cpu(raw_super->log_sectorsize) > F2FS_MAX_LOG_SECTOR_SIZE || @@ -965,6 +1046,23 @@ static int sanity_check_raw_super(struct super_block *sb, le32_to_cpu(raw_super->log_sectorsize)); return 1; } + + /* check reserved ino info */ + if (le32_to_cpu(raw_super->node_ino) != 1 || + le32_to_cpu(raw_super->meta_ino) != 2 || + le32_to_cpu(raw_super->root_ino) != 3) { + f2fs_msg(sb, KERN_INFO, + "Invalid Fs Meta Ino: node(%u) meta(%u) root(%u)", + le32_to_cpu(raw_super->node_ino), + le32_to_cpu(raw_super->meta_ino), + le32_to_cpu(raw_super->root_ino)); + return 1; + } + + /* check CP/SIT/NAT/SSA/MAIN_AREA area boundary */ + if (sanity_check_area_boundary(sb, raw_super)) + return 1; + return 0; } From bd2d6cb00d1aee5df63dc95aedaf1f2b2a7d9d4e Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 3 Feb 2016 13:34:00 -0200 Subject: [PATCH 479/521] xc2028: unlock on error in xc2028_set_config() commit 210bd104c6acd31c3c6b8b075b3f12d4a9f6b60d upstream. We have to unlock before returning -ENOMEM. Fixes: 8dfbcc4351a0 ('[media] xc2028: avoid use after free') Signed-off-by: Dan Carpenter Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/tuners/tuner-xc2028.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/media/tuners/tuner-xc2028.c b/drivers/media/tuners/tuner-xc2028.c index 082ff5608455..317ef63ee789 100644 --- a/drivers/media/tuners/tuner-xc2028.c +++ b/drivers/media/tuners/tuner-xc2028.c @@ -1407,8 +1407,10 @@ static int xc2028_set_config(struct dvb_frontend *fe, void *priv_cfg) memcpy(&priv->ctrl, p, sizeof(priv->ctrl)); if (p->fname) { priv->ctrl.fname = kstrdup(p->fname, GFP_KERNEL); - if (priv->ctrl.fname == NULL) - return -ENOMEM; + if (priv->ctrl.fname == NULL) { + rc = -ENOMEM; + goto unlock; + } } /* @@ -1440,6 +1442,7 @@ static int xc2028_set_config(struct dvb_frontend *fe, void *priv_cfg) } else priv->state = XC2028_WAITING_FIRMWARE; } +unlock: mutex_unlock(&priv->lock); return rc; From 531be60fc5804318989061178fe8a34eff561e03 Mon Sep 17 00:00:00 2001 From: Tero Kristo Date: Thu, 16 Jun 2016 15:25:18 +0300 Subject: [PATCH 480/521] ARM: OMAP2+: timer: add probe for clocksources commit 970f9091d25df14e9540ec7ff48a2f709e284cd1 upstream. A few platforms are currently missing clocksource_probe() completely in their time_init functionality. On OMAP3430 for example, this is causing cpuidle to be pretty much dead, as the counter32k is not going to be registered and instead a gptimer is used as a clocksource. This will tick in periodic mode, preventing any deeper idle states. While here, also drop one unnecessary check for populated DT before existing clocksource_probe() call. Signed-off-by: Tero Kristo Signed-off-by: Tony Lindgren Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-omap2/timer.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c index f86692dbcfd5..83fc403aec3c 100644 --- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -496,8 +496,7 @@ void __init omap_init_time(void) __omap_sync32k_timer_init(1, "timer_32k_ck", "ti,timer-alwon", 2, "timer_sys_ck", NULL, false); - if (of_have_populated_dt()) - clocksource_probe(); + clocksource_probe(); } #if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM43XX) @@ -505,6 +504,8 @@ void __init omap3_secure_sync32k_timer_init(void) { __omap_sync32k_timer_init(12, "secure_32k_fck", "ti,timer-secure", 2, "timer_sys_ck", NULL, false); + + clocksource_probe(); } #endif /* CONFIG_ARCH_OMAP3 */ @@ -513,6 +514,8 @@ void __init omap3_gptimer_timer_init(void) { __omap_sync32k_timer_init(2, "timer_sys_ck", NULL, 1, "timer_sys_ck", "ti,timer-alwon", true); + + clocksource_probe(); } #endif From 40a55e4f9401499ecf0d9f9076ba06f48822d1aa Mon Sep 17 00:00:00 2001 From: Krzysztof Adamski Date: Mon, 22 Feb 2016 14:03:25 +0100 Subject: [PATCH 481/521] clk: sunxi: Add apb0 gates for H3 commit 6e17b4181603d183d20c73f4535529ddf2a2a020 upstream. This patch adds support for APB0 in H3. It seems to be compatible with earlier SOCs. apb0 gates controls R_ block peripherals (R_PIO, R_IR, etc). Since this gates behave just like any Allwinner clock gate, add a generic compatible that can be reused if we don't have any clock to protect. Signed-off-by: Krzysztof Adamski [Maxime: Removed the H3 compatible from the simple-gates driver, reworked the commit log a bit] Signed-off-by: Maxime Ripard Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/clock/sunxi.txt | 2 ++ drivers/clk/sunxi/clk-simple-gates.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/clock/sunxi.txt b/Documentation/devicetree/bindings/clock/sunxi.txt index 8a47b77abfca..e8c74a6e738b 100644 --- a/Documentation/devicetree/bindings/clock/sunxi.txt +++ b/Documentation/devicetree/bindings/clock/sunxi.txt @@ -18,6 +18,7 @@ Required properties: "allwinner,sun4i-a10-cpu-clk" - for the CPU multiplexer clock "allwinner,sun4i-a10-axi-clk" - for the AXI clock "allwinner,sun8i-a23-axi-clk" - for the AXI clock on A23 + "allwinner,sun4i-a10-gates-clk" - for generic gates on all compatible SoCs "allwinner,sun4i-a10-axi-gates-clk" - for the AXI gates "allwinner,sun4i-a10-ahb-clk" - for the AHB clock "allwinner,sun5i-a13-ahb-clk" - for the AHB clock on A13 @@ -43,6 +44,7 @@ Required properties: "allwinner,sun6i-a31-apb0-gates-clk" - for the APB0 gates on A31 "allwinner,sun7i-a20-apb0-gates-clk" - for the APB0 gates on A20 "allwinner,sun8i-a23-apb0-gates-clk" - for the APB0 gates on A23 + "allwinner,sun8i-h3-apb0-gates-clk" - for the APB0 gates on H3 "allwinner,sun9i-a80-apb0-gates-clk" - for the APB0 gates on A80 "allwinner,sun4i-a10-apb1-clk" - for the APB1 clock "allwinner,sun9i-a80-apb1-clk" - for the APB1 bus clock on A80 diff --git a/drivers/clk/sunxi/clk-simple-gates.c b/drivers/clk/sunxi/clk-simple-gates.c index 0214c6548afd..97cb4221de25 100644 --- a/drivers/clk/sunxi/clk-simple-gates.c +++ b/drivers/clk/sunxi/clk-simple-gates.c @@ -98,6 +98,8 @@ static void __init sunxi_simple_gates_init(struct device_node *node) sunxi_simple_gates_setup(node, NULL, 0); } +CLK_OF_DECLARE(sun4i_a10_gates, "allwinner,sun4i-a10-gates-clk", + sunxi_simple_gates_init); CLK_OF_DECLARE(sun4i_a10_apb0, "allwinner,sun4i-a10-apb0-gates-clk", sunxi_simple_gates_init); CLK_OF_DECLARE(sun4i_a10_apb1, "allwinner,sun4i-a10-apb1-gates-clk", From 10fc325c03d2b68bdaa4180a1d5efdbf58c49846 Mon Sep 17 00:00:00 2001 From: Jerome Marchand Date: Wed, 3 Feb 2016 13:58:12 +0100 Subject: [PATCH 482/521] crypto: testmgr - fix out of bound read in __test_aead() commit abfa7f4357e3640fdee87dfc276fd0f379fb5ae6 upstream. __test_aead() reads MAX_IVLEN bytes from template[i].iv, but the actual length of the initialisation vector can be shorter. The length of the IV is already calculated earlier in the function. Let's just reuses that. Also the IV length is currently calculated several time for no reason. Let's fix that too. This fix an out-of-bound error detected by KASan. Signed-off-by: Jerome Marchand Signed-off-by: Herbert Xu Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- crypto/testmgr.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/crypto/testmgr.c b/crypto/testmgr.c index d4944318ca1f..5f15f45fcc9f 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c @@ -488,6 +488,8 @@ static int __test_aead(struct crypto_aead *tfm, int enc, aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, tcrypt_complete, &result); + iv_len = crypto_aead_ivsize(tfm); + for (i = 0, j = 0; i < tcount; i++) { if (template[i].np) continue; @@ -508,7 +510,6 @@ static int __test_aead(struct crypto_aead *tfm, int enc, memcpy(input, template[i].input, template[i].ilen); memcpy(assoc, template[i].assoc, template[i].alen); - iv_len = crypto_aead_ivsize(tfm); if (template[i].iv) memcpy(iv, template[i].iv, iv_len); else @@ -617,7 +618,7 @@ static int __test_aead(struct crypto_aead *tfm, int enc, j++; if (template[i].iv) - memcpy(iv, template[i].iv, MAX_IVLEN); + memcpy(iv, template[i].iv, iv_len); else memset(iv, 0, MAX_IVLEN); From 99e96ce5e3153b3543152d33b5773f34003a8892 Mon Sep 17 00:00:00 2001 From: tom will Date: Mon, 16 May 2016 10:31:07 -0400 Subject: [PATCH 483/521] drm/amdgpu: fix array out of bounds commit 484f689fc9d4eb91c68f53e97dc355b1b06c3edb upstream. When the initial value of i is greater than zero, it may cause endless loop, resulting in array out of bounds, fix it. This is a port of the radeon fix to amdgpu. Signed-off-by: tom will Signed-off-by: Alex Deucher Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/kv_dpm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c index 7e9154c7f1db..d1c9525d81eb 100644 --- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c +++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c @@ -2258,7 +2258,7 @@ static void kv_apply_state_adjust_rules(struct amdgpu_device *adev, if (pi->caps_stable_p_state) { stable_p_state_sclk = (max_limits->sclk * 75) / 100; - for (i = table->count - 1; i >= 0; i++) { + for (i = table->count - 1; i >= 0; i--) { if (stable_p_state_sclk >= table->entries[i].clk) { stable_p_state_sclk = table->entries[i].clk; break; From 28320756e78b3e8f1520bc1c34b23a3df452569e Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Tue, 22 Mar 2016 16:13:15 -0400 Subject: [PATCH 484/521] ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea() commit 9e92f48c34eb2b9af9d12f892e2fe1fce5e8ce35 upstream. We aren't checking to see if the in-inode extended attribute is corrupted before we try to expand the inode's extra isize fields. This can lead to potential crashes caused by the BUG_ON() check in ext4_xattr_shift_entries(). Signed-off-by: Theodore Ts'o Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- fs/ext4/xattr.c | 32 ++++++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 263002f0389d..7c23363ecf19 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -233,6 +233,27 @@ ext4_xattr_check_block(struct inode *inode, struct buffer_head *bh) return error; } +static int +__xattr_check_inode(struct inode *inode, struct ext4_xattr_ibody_header *header, + void *end, const char *function, unsigned int line) +{ + struct ext4_xattr_entry *entry = IFIRST(header); + int error = -EFSCORRUPTED; + + if (((void *) header >= end) || + (header->h_magic != le32_to_cpu(EXT4_XATTR_MAGIC))) + goto errout; + error = ext4_xattr_check_names(entry, end, entry); +errout: + if (error) + __ext4_error_inode(inode, function, line, 0, + "corrupted in-inode xattr"); + return error; +} + +#define xattr_check_inode(inode, header, end) \ + __xattr_check_inode((inode), (header), (end), __func__, __LINE__) + static inline int ext4_xattr_check_entry(struct ext4_xattr_entry *entry, size_t size) { @@ -344,7 +365,7 @@ ext4_xattr_ibody_get(struct inode *inode, int name_index, const char *name, header = IHDR(inode, raw_inode); entry = IFIRST(header); end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; - error = ext4_xattr_check_names(entry, end, entry); + error = xattr_check_inode(inode, header, end); if (error) goto cleanup; error = ext4_xattr_find_entry(&entry, name_index, name, @@ -475,7 +496,7 @@ ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size) raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; - error = ext4_xattr_check_names(IFIRST(header), end, IFIRST(header)); + error = xattr_check_inode(inode, header, end); if (error) goto cleanup; error = ext4_xattr_list_entries(dentry, IFIRST(header), @@ -991,8 +1012,7 @@ int ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i, is->s.here = is->s.first; is->s.end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) { - error = ext4_xattr_check_names(IFIRST(header), is->s.end, - IFIRST(header)); + error = xattr_check_inode(inode, header, is->s.end); if (error) return error; /* Find the named attribute. */ @@ -1293,6 +1313,10 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize, last = entry; total_ino = sizeof(struct ext4_xattr_ibody_header); + error = xattr_check_inode(inode, header, end); + if (error) + goto cleanup; + free = ext4_xattr_free_space(last, &min_offs, base, &total_ino); if (free >= isize_diff) { entry = IFIRST(header); From 49b2fe4b020776640268a5d75c5eca712fcc7b01 Mon Sep 17 00:00:00 2001 From: Wei Fang Date: Mon, 21 Mar 2016 19:18:32 +0800 Subject: [PATCH 485/521] md:raid1: fix a dead loop when read from a WriteMostly disk commit 816b0acf3deb6d6be5d0519b286fdd4bafade905 upstream. If first_bad == this_sector when we get the WriteMostly disk in read_balance(), valid disk will be returned with zero max_sectors. It'll lead to a dead loop in make_request(), and OOM will happen because of endless allocation of struct bio. Since we can't get data from this disk in this case, so continue for another disk. Signed-off-by: Wei Fang Signed-off-by: Shaohua Li Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 9be39988bf06..d81be5e471d0 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -570,7 +570,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect if (best_dist_disk < 0) { if (is_badblock(rdev, this_sector, sectors, &first_bad, &bad_sectors)) { - if (first_bad < this_sector) + if (first_bad <= this_sector) /* Cannot use this */ continue; best_good_sectors = first_bad - this_sector; From 2907c91c9f9a69a3c1250dc08a146f255f26d0aa Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Mon, 11 Apr 2016 09:10:19 -0500 Subject: [PATCH 486/521] MIPS: Fix crash registers on non-crashing CPUs commit c80e1b62ffca52e2d1d865ee58bc79c4c0c55005 upstream. As part of handling a crash on an SMP system, an IPI is send to all other CPUs to save their current registers and stop. It was using task_pt_regs(current) to get the registers, but that will only be accurate if the CPU was interrupted running in userland. Instead allow the architecture to pass in the registers (all pass NULL now, but allow for the future) and then use get_irq_regs() which should be accurate as we are in an interrupt. Fall back to task_pt_regs(current) if nothing else is available. Signed-off-by: Corey Minyard Cc: David Daney Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13050/ Signed-off-by: Ralf Baechle Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/crash.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c index d434d5d5ae6e..610f0f3bdb34 100644 --- a/arch/mips/kernel/crash.c +++ b/arch/mips/kernel/crash.c @@ -14,12 +14,22 @@ static int crashing_cpu = -1; static cpumask_t cpus_in_crash = CPU_MASK_NONE; #ifdef CONFIG_SMP -static void crash_shutdown_secondary(void *ignore) +static void crash_shutdown_secondary(void *passed_regs) { - struct pt_regs *regs; + struct pt_regs *regs = passed_regs; int cpu = smp_processor_id(); - regs = task_pt_regs(current); + /* + * If we are passed registers, use those. Otherwise get the + * regs from the last interrupt, which should be correct, as + * we are in an interrupt. But if the regs are not there, + * pull them from the top of the stack. They are probably + * wrong, but we need something to keep from crashing again. + */ + if (!regs) + regs = get_irq_regs(); + if (!regs) + regs = task_pt_regs(current); if (!cpu_online(cpu)) return; From 1d1cb762524f05cfb37994e0d36b7b4b5e957134 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Fri, 15 Jul 2016 16:42:16 -0700 Subject: [PATCH 487/521] net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata commit 8e6ce7ebeb34f0992f56de078c3744fb383657fa upstream. The label lio_xmit_failed is used 3 times through liquidio_xmit() but it always makes a call to dma_unmap_single() using potentially uninitialized variables from "ndata" variable. Out of the 3 gotos, 2 run after ndata has been initialized, and had a prior dma_map_single() call. Fix this by adding a new error label: lio_xmit_dma_failed which does this dma_unmap_single() and then processed with the lio_xmit_failed fallthrough. Fixes: f21fb3ed364bb ("Add support of Cavium Liquidio ethernet adapters") Reported-by: coverity (CID 1309740) Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/cavium/liquidio/lio_main.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index 7445da218bd9..cc1725616f9d 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -2823,7 +2823,7 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev) if (!g) { netif_info(lio, tx_err, lio->netdev, "Transmit scatter gather: glist null!\n"); - goto lio_xmit_failed; + goto lio_xmit_dma_failed; } cmdsetup.s.gather = 1; @@ -2894,7 +2894,7 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev) else status = octnet_send_nic_data_pkt(oct, &ndata, xmit_more); if (status == IQ_SEND_FAILED) - goto lio_xmit_failed; + goto lio_xmit_dma_failed; netif_info(lio, tx_queued, lio->netdev, "Transmit queued successfully\n"); @@ -2908,12 +2908,13 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev) return NETDEV_TX_OK; +lio_xmit_dma_failed: + dma_unmap_single(&oct->pci_dev->dev, ndata.cmd.dptr, + ndata.datasize, DMA_TO_DEVICE); lio_xmit_failed: stats->tx_dropped++; netif_info(lio, tx_err, lio->netdev, "IQ%d Transmit dropped:%llu\n", iq_no, stats->tx_dropped); - dma_unmap_single(&oct->pci_dev->dev, ndata.cmd.dptr, - ndata.datasize, DMA_TO_DEVICE); recv_buffer_free(skb); return NETDEV_TX_OK; } From b9baa0aa66cef9e26b15372d6c56106a4f3b8439 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Mon, 16 May 2016 15:11:18 -0700 Subject: [PATCH 488/521] net_sched: close another race condition in tcf_mirred_release() commit dc327f8931cb9d66191f489eb9a852fc04530546 upstream. We saw the following extra refcount release on veth device: kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1 Since we heavily use mirred action to redirect packets to veth, I think this is caused by the following race condition: CPU0: tcf_mirred_release(): (in RCU callback) struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1); CPU1: mirred_device_event(): spin_lock_bh(&mirred_list_lock); list_for_each_entry(m, &mirred_list, tcfm_list) { if (rcu_access_pointer(m->tcfm_dev) == dev) { dev_put(dev); /* Note : no rcu grace period necessary, as * net_device are already rcu protected. */ RCU_INIT_POINTER(m->tcfm_dev, NULL); } } spin_unlock_bh(&mirred_list_lock); CPU0: tcf_mirred_release(): spin_lock_bh(&mirred_list_lock); list_del(&m->tcfm_list); spin_unlock_bh(&mirred_list_lock); if (dev) // <======== Stil refers to the old m->tcfm_dev dev_put(dev); // <======== dev_put() is called on it again The action init code path is good because it is impossible to modify an action that is being removed. So, fix this by moving everything under the spinlock. Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path") Fixes: 6bd00b850635 ("act_mirred: fix a race condition on mirred_list") Cc: Jamal Hadi Salim Signed-off-by: Cong Wang Acked-by: Jamal Hadi Salim Signed-off-by: David S. Miller Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- net/sched/act_mirred.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index e384d6aefa3a..1090a52c03cd 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -36,14 +36,15 @@ static DEFINE_SPINLOCK(mirred_list_lock); static void tcf_mirred_release(struct tc_action *a, int bind) { struct tcf_mirred *m = to_mirred(a); - struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1); + struct net_device *dev; /* We could be called either in a RCU callback or with RTNL lock held. */ spin_lock_bh(&mirred_list_lock); list_del(&m->tcfm_list); - spin_unlock_bh(&mirred_list_lock); + dev = rcu_dereference_protected(m->tcfm_dev, 1); if (dev) dev_put(dev); + spin_unlock_bh(&mirred_list_lock); } static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = { From 804605eae410ef1d1f904706eaf50400c897434b Mon Sep 17 00:00:00 2001 From: "santosh.shilimkar@oracle.com" Date: Thu, 14 Apr 2016 10:43:27 -0700 Subject: [PATCH 489/521] RDS: Fix the atomicity for congestion map update commit e47db94e10447fc467777a40302f2b393e9af2fa upstream. Two different threads with different rds sockets may be in rds_recv_rcvbuf_delta() via receive path. If their ports both map to the same word in the congestion map, then using non-atomic ops to update it could cause the map to be incorrect. Lets use atomics to avoid such an issue. Full credit to Wengang for finding the issue, analysing it and also pointing out to offending code with spin lock based fix. Reviewed-by: Leon Romanovsky Signed-off-by: Wengang Wang Signed-off-by: Santosh Shilimkar Signed-off-by: David S. Miller Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- net/rds/cong.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/rds/cong.c b/net/rds/cong.c index e6144b8246fd..6641bcf7c185 100644 --- a/net/rds/cong.c +++ b/net/rds/cong.c @@ -299,7 +299,7 @@ void rds_cong_set_bit(struct rds_cong_map *map, __be16 port) i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS; off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS; - __set_bit_le(off, (void *)map->m_page_addrs[i]); + set_bit_le(off, (void *)map->m_page_addrs[i]); } void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port) @@ -313,7 +313,7 @@ void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port) i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS; off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS; - __clear_bit_le(off, (void *)map->m_page_addrs[i]); + clear_bit_le(off, (void *)map->m_page_addrs[i]); } static int rds_cong_test_bit(struct rds_cong_map *map, __be16 port) From 3e19487b9bf5076dcc2cd79da3dbd57b94d4e6b7 Mon Sep 17 00:00:00 2001 From: Jon Hunter Date: Thu, 21 Apr 2016 17:11:58 +0100 Subject: [PATCH 490/521] regulator: core: Clear the supply pointer if enabling fails commit 8e5356a73604f53da6a1e0756727cb8f9f7bba17 upstream. During the resolution of a regulator's supply, we may attempt to enable the supply if the regulator itself is already enabled. If enabling the supply fails, then we will call _regulator_put() for the supply. However, the pointer to the supply has not been cleared for the regulator and this will cause a crash if we then unregister the regulator and attempt to call regulator_put() a second time for the supply. Fix this by clearing the supply pointer if enabling the supply after fails when resolving the supply for a regulator. Signed-off-by: Jon Hunter Signed-off-by: Mark Brown Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- drivers/regulator/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index 88dbbeb8569b..f9b8c44677eb 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -1519,6 +1519,7 @@ static int regulator_resolve_supply(struct regulator_dev *rdev) ret = regulator_enable(rdev->supply); if (ret < 0) { _regulator_put(rdev->supply); + rdev->supply = NULL; return ret; } } From 5709321fd962842da490b36b8881f2c96b04608d Mon Sep 17 00:00:00 2001 From: "Felipe F. Tonello" Date: Wed, 9 Mar 2016 19:39:30 +0000 Subject: [PATCH 491/521] usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize commit 03d27ade4941076b34c823d63d91dc895731a595 upstream. buflen by default (256) is smaller than wMaxPacketSize (512) in high-speed devices. That caused the OUT endpoint to freeze if the host send any data packet of length greater than 256 bytes. This is an example dump of what happended on that enpoint: HOST: [DATA][Length=260][...] DEVICE: [NAK] HOST: [PING] DEVICE: [NAK] HOST: [PING] DEVICE: [NAK] ... HOST: [PING] DEVICE: [NAK] This patch fixes this problem by setting the minimum usb_request's buffer size for the OUT endpoint as its wMaxPacketSize. Acked-by: Michal Nazarewicz Signed-off-by: Felipe F. Tonello Signed-off-by: Felipe Balbi Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_midi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c index 898a570319f1..af60cc3714c1 100644 --- a/drivers/usb/gadget/function/f_midi.c +++ b/drivers/usb/gadget/function/f_midi.c @@ -361,7 +361,9 @@ static int f_midi_set_alt(struct usb_function *f, unsigned intf, unsigned alt) /* allocate a bunch of read buffers and queue them all at once. */ for (i = 0; i < midi->qlen && err == 0; i++) { struct usb_request *req = - midi_alloc_ep_req(midi->out_ep, midi->buflen); + midi_alloc_ep_req(midi->out_ep, + max_t(unsigned, midi->buflen, + bulk_out_desc.wMaxPacketSize)); if (req == NULL) return -ENOMEM; From c583862e95d26586a1f6a2070411ea0a2023edbc Mon Sep 17 00:00:00 2001 From: Stefano Stabellini Date: Fri, 15 Apr 2016 18:23:00 -0700 Subject: [PATCH 492/521] xen/x86: don't lose event interrupts commit c06b6d70feb32d28f04ba37aa3df17973fd37b6b upstream. On slow platforms with unreliable TSC, such as QEMU emulated machines, it is possible for the kernel to request the next event in the past. In that case, in the current implementation of xen_vcpuop_clockevent, we simply return -ETIME. To be precise the Xen returns -ETIME and we pass it on. However the result of this is a missed event, which simply causes the kernel to hang. Instead it is better to always ask the hypervisor for a timer event, even if the timeout is in the past. That way there are no lost interrupts and the kernel survives. To do that, remove the VCPU_SSHOTTMR_future flag. Signed-off-by: Stefano Stabellini Acked-by: Juergen Gross Cc: Julia Lawall Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/time.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index f1ba6a092854..8846257d8792 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -343,11 +343,11 @@ static int xen_vcpuop_set_next_event(unsigned long delta, WARN_ON(!clockevent_state_oneshot(evt)); single.timeout_abs_ns = get_abs_timeout(delta); - single.flags = VCPU_SSHOTTMR_future; + /* Get an event anyway, even if the timeout is already expired */ + single.flags = 0; ret = HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer, cpu, &single); - - BUG_ON(ret != 0 && ret != -ETIME); + BUG_ON(ret != 0); return ret; } From 80ec183214e8bf815ecff8f58d82d67b9842a8de Mon Sep 17 00:00:00 2001 From: bob picco Date: Fri, 10 Mar 2017 14:31:19 -0500 Subject: [PATCH 493/521] sparc64: kern_addr_valid regression [ Upstream commit adfae8a5d833fa2b46577a8081f350e408851f5b ] I encountered this bug when using /proc/kcore to examine the kernel. Plus a coworker inquired about debugging tools. We computed pa but did not use it during the maximum physical address bits test. Instead we used the identity mapped virtual address which will always fail this test. I believe the defect came in here: [bpicco@zareason linus.git]$ git describe --contains bb4e6e85daa52 v3.18-rc1~87^2~4 . Signed-off-by: Bob Picco Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/mm/init_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 3d3414c14792..965655afdbb6 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -1493,7 +1493,7 @@ bool kern_addr_valid(unsigned long addr) if ((long)addr < 0L) { unsigned long pa = __pa(addr); - if ((addr >> max_phys_bits) != 0UL) + if ((pa >> max_phys_bits) != 0UL) return false; return pfn_valid(pa >> PAGE_SHIFT); From 592d0e60a2b76b0a8ea7161d030aeb6e619ab013 Mon Sep 17 00:00:00 2001 From: Tom Hromatka Date: Fri, 31 Mar 2017 16:31:42 -0600 Subject: [PATCH 494/521] sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write() [ Upstream commit 9ae34dbd8afd790cb5f52467e4f816434379eafa ] This commit moves sparc64's prototype of pmd_write() outside of the CONFIG_TRANSPARENT_HUGEPAGE ifdef. In 2013, commit a7b9403f0e6d ("sparc64: Encode huge PMDs using PTE encoding.") exposed a path where pmd_write() could be called without CONFIG_TRANSPARENT_HUGEPAGE defined. This can result in the panic below. The diff is awkward to read, but the changes are straightforward. pmd_write() was moved outside of #ifdef CONFIG_TRANSPARENT_HUGEPAGE. Also, __HAVE_ARCH_PMD_WRITE was defined. kernel BUG at include/asm-generic/pgtable.h:576! \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ oracle_8114_cdb(8114): Kernel bad sw trap 5 [#1] CPU: 120 PID: 8114 Comm: oracle_8114_cdb Not tainted 4.1.12-61.7.1.el6uek.rc1.sparc64 #1 task: fff8400700a24d60 ti: fff8400700bc4000 task.ti: fff8400700bc4000 TSTATE: 0000004411e01607 TPC: 00000000004609f8 TNPC: 00000000004609fc Y: 00000005 Not tainted TPC: g0: 000000000001c000 g1: 0000000000ef3954 g2: 0000000000000000 g3: 0000000000000001 g4: fff8400700a24d60 g5: fff8001fa5c10000 g6: fff8400700bc4000 g7: 0000000000000720 o0: 0000000000bc5058 o1: 0000000000000240 o2: 0000000000006000 o3: 0000000000001c00 o4: 0000000000000000 o5: 0000048000080000 sp: fff8400700bc6ab1 ret_pc: 00000000004609f0 RPC: l0: fff8400700bc74fc l1: 0000000000020000 l2: 0000000000002000 l3: 0000000000000000 l4: fff8001f93250950 l5: 000000000113f800 l6: 0000000000000004 l7: 0000000000000000 i0: fff8400700ca46a0 i1: bd0000085e800453 i2: 000000026a0c4000 i3: 000000026a0c6000 i4: 0000000000000001 i5: fff800070c958de8 i6: fff8400700bc6b61 i7: 0000000000460dd0 I7: Call Trace: [0000000000460dd0] gup_pud_range+0x170/0x1a0 [0000000000460e84] get_user_pages_fast+0x84/0x120 [00000000006f5a18] iov_iter_get_pages+0x98/0x240 [00000000005fa744] do_direct_IO+0xf64/0x1e00 [00000000005fbbc0] __blockdev_direct_IO+0x360/0x15a0 [00000000101f74fc] ext4_ind_direct_IO+0xdc/0x400 [ext4] [00000000101af690] ext4_ext_direct_IO+0x1d0/0x2c0 [ext4] [00000000101af86c] ext4_direct_IO+0xec/0x220 [ext4] [0000000000553bd4] generic_file_read_iter+0x114/0x140 [00000000005bdc2c] __vfs_read+0xac/0x100 [00000000005bf254] vfs_read+0x54/0x100 [00000000005bf368] SyS_pread64+0x68/0x80 Signed-off-by: Tom Hromatka Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/include/asm/pgtable_64.h | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 408b715c95a5..9d81579f3d54 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -668,6 +668,14 @@ static inline unsigned long pmd_pfn(pmd_t pmd) return pte_pfn(pte); } +#define __HAVE_ARCH_PMD_WRITE +static inline unsigned long pmd_write(pmd_t pmd) +{ + pte_t pte = __pte(pmd_val(pmd)); + + return pte_write(pte); +} + #ifdef CONFIG_TRANSPARENT_HUGEPAGE static inline unsigned long pmd_dirty(pmd_t pmd) { @@ -683,13 +691,6 @@ static inline unsigned long pmd_young(pmd_t pmd) return pte_young(pte); } -static inline unsigned long pmd_write(pmd_t pmd) -{ - pte_t pte = __pte(pmd_val(pmd)); - - return pte_write(pte); -} - static inline unsigned long pmd_trans_huge(pmd_t pmd) { pte_t pte = __pte(pmd_val(pmd)); From 428b3cefab22d21013c2a03b8153eefe3df1f576 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 23 Mar 2017 12:39:21 -0700 Subject: [PATCH 495/521] net: neigh: guard against NULL solicit() method [ Upstream commit 48481c8fa16410ffa45939b13b6c53c2ca609e5f ] Dmitry posted a nice reproducer of a bug triggering in neigh_probe() when dereferencing a NULL neigh->ops->solicit method. This can happen for arp_direct_ops/ndisc_direct_ops and similar, which can be used for NUD_NOARP neighbours (created when dev->header_ops is NULL). Admin can then force changing nud_state to some other state that would fire neigh timer. Signed-off-by: Eric Dumazet Reported-by: Dmitry Vyukov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/neighbour.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 769cece9b00b..ae92131c4f89 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -859,7 +859,8 @@ static void neigh_probe(struct neighbour *neigh) if (skb) skb = skb_clone(skb, GFP_ATOMIC); write_unlock(&neigh->lock); - neigh->ops->solicit(neigh, skb); + if (neigh->ops->solicit) + neigh->ops->solicit(neigh, skb); atomic_inc(&neigh->probes); kfree_skb(skb); } From 0e9eeb4676a770acf0d97c93db77bcbdad4eefc5 Mon Sep 17 00:00:00 2001 From: Nathan Sullivan Date: Wed, 22 Mar 2017 15:27:01 -0500 Subject: [PATCH 496/521] net: phy: handle state correctly in phy_stop_machine [ Upstream commit 49d52e8108a21749dc2114b924c907db43358984 ] If the PHY is halted on stop, then do not set the state to PHY_UP. This ensures the phy will be restarted later in phy_start when the machine is started again. Fixes: 00db8189d984 ("This patch adds a PHY Abstraction Layer to the Linux Kernel, enabling ethernet drivers to remain as ignorant as is reasonable of the connected PHY's design and operation details.") Signed-off-by: Nathan Sullivan Signed-off-by: Brad Mouring Acked-by: Xander Huff Acked-by: Kyle Roeschley Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/phy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index bba0ca786aaa..851c0e121807 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -538,7 +538,7 @@ void phy_stop_machine(struct phy_device *phydev) cancel_delayed_work_sync(&phydev->state_queue); mutex_lock(&phydev->lock); - if (phydev->state > PHY_UP) + if (phydev->state > PHY_UP && phydev->state != PHY_HALTED) phydev->state = PHY_UP; mutex_unlock(&phydev->lock); } From 8625dfcfd338254131a7fa650dfbcaf42e4c52ae Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Wed, 29 Mar 2017 08:45:29 +0200 Subject: [PATCH 497/521] l2tp: purge socket queues in the .destruct() callback [ Upstream commit e91793bb615cf6cdd59c0b6749fe173687bb0947 ] The Rx path may grab the socket right before pppol2tp_release(), but nothing guarantees that it will enqueue packets before skb_queue_purge(). Therefore, the socket can be destroyed without its queues fully purged. Fix this by purging queues in pppol2tp_session_destruct() where we're guaranteed nothing is still referencing the socket. Fixes: 9e9cb6221aa7 ("l2tp: fix userspace reception on plain L2TP sockets") Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/l2tp/l2tp_ppp.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index 1ad18c55064c..cb0fa348abf9 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -467,6 +467,10 @@ static void pppol2tp_session_close(struct l2tp_session *session) static void pppol2tp_session_destruct(struct sock *sk) { struct l2tp_session *session = sk->sk_user_data; + + skb_queue_purge(&sk->sk_receive_queue); + skb_queue_purge(&sk->sk_write_queue); + if (session) { sk->sk_user_data = NULL; BUG_ON(session->magic != L2TP_SESSION_MAGIC); @@ -505,9 +509,6 @@ static int pppol2tp_release(struct socket *sock) l2tp_session_queue_purge(session); sock_put(sk); } - skb_queue_purge(&sk->sk_receive_queue); - skb_queue_purge(&sk->sk_write_queue); - release_sock(sk); /* This will delete the session context via From cf71bd41f8091588c7b28ebba6eaa635a6874493 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Wed, 29 Mar 2017 16:11:21 +0200 Subject: [PATCH 498/521] net/packet: fix overflow in check for tp_frame_nr [ Upstream commit 8f8d28e4d6d815a391285e121c3a53a0b6cb9e7b ] When calculating rb->frames_per_block * req->tp_block_nr the result can overflow. Add a check that tp_block_size * tp_block_nr <= UINT_MAX. Since frames_per_block <= tp_block_size, the expression would never overflow. Signed-off-by: Andrey Konovalov Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index d76800108ddb..b4af3f7b57ab 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4150,6 +4150,8 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, rb->frames_per_block = req->tp_block_size / req->tp_frame_size; if (unlikely(rb->frames_per_block == 0)) goto out; + if (unlikely(req->tp_block_size > UINT_MAX / req->tp_block_nr)) + goto out; if (unlikely((rb->frames_per_block * req->tp_block_nr) != req->tp_frame_nr)) goto out; From 25adf4e32a890244f4f53d2ab30b423efb5aab41 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Wed, 29 Mar 2017 16:11:22 +0200 Subject: [PATCH 499/521] net/packet: fix overflow in check for tp_reserve [ Upstream commit bcc5364bdcfe131e6379363f089e7b4108d35b70 ] When calculating po->tp_hdrlen + po->tp_reserve the result can overflow. Fix by checking that tp_reserve <= INT_MAX on assign. Signed-off-by: Andrey Konovalov Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index b4af3f7b57ab..f8d6a0ca9c03 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3626,6 +3626,8 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv return -EBUSY; if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT; + if (val > INT_MAX) + return -EINVAL; po->tp_reserve = val; return 0; } From f710dbd92b277232079fc662c1dd8433491b7d6c Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Mon, 3 Apr 2017 12:03:13 +0200 Subject: [PATCH 500/521] l2tp: take reference on sessions being dumped [ Upstream commit e08293a4ccbcc993ded0fdc46f1e57926b833d63 ] Take a reference on the sessions returned by l2tp_session_find_nth() (and rename it l2tp_session_get_nth() to reflect this change), so that caller is assured that the session isn't going to disappear while processing it. For procfs and debugfs handlers, the session is held in the .start() callback and dropped in .show(). Given that pppol2tp_seq_session_show() dereferences the associated PPPoL2TP socket and that l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to call the session's .ref() callback to prevent the socket from going away from under us. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Fixes: 0ad6614048cf ("l2tp: Add debugfs files for dumping l2tp debug info") Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/l2tp/l2tp_core.c | 8 ++++++-- net/l2tp/l2tp_core.h | 3 ++- net/l2tp/l2tp_debugfs.c | 10 +++++++--- net/l2tp/l2tp_netlink.c | 7 +++++-- net/l2tp/l2tp_ppp.c | 10 +++++++--- 5 files changed, 27 insertions(+), 11 deletions(-) diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index ec17cbe8a02b..d3dec414fd44 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -278,7 +278,8 @@ struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunn } EXPORT_SYMBOL_GPL(l2tp_session_find); -struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth) +struct l2tp_session *l2tp_session_get_nth(struct l2tp_tunnel *tunnel, int nth, + bool do_ref) { int hash; struct l2tp_session *session; @@ -288,6 +289,9 @@ struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth) for (hash = 0; hash < L2TP_HASH_SIZE; hash++) { hlist_for_each_entry(session, &tunnel->session_hlist[hash], hlist) { if (++count > nth) { + l2tp_session_inc_refcount(session); + if (do_ref && session->ref) + session->ref(session); read_unlock_bh(&tunnel->hlist_lock); return session; } @@ -298,7 +302,7 @@ struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth) return NULL; } -EXPORT_SYMBOL_GPL(l2tp_session_find_nth); +EXPORT_SYMBOL_GPL(l2tp_session_get_nth); /* Lookup a session by interface name. * This is very inefficient but is only used by management interfaces. diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h index 763e8e241ce3..555d962a62d2 100644 --- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -243,7 +243,8 @@ static inline struct l2tp_tunnel *l2tp_sock_to_tunnel(struct sock *sk) struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunnel, u32 session_id); -struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth); +struct l2tp_session *l2tp_session_get_nth(struct l2tp_tunnel *tunnel, int nth, + bool do_ref); struct l2tp_session *l2tp_session_find_by_ifname(struct net *net, char *ifname); struct l2tp_tunnel *l2tp_tunnel_find(struct net *net, u32 tunnel_id); struct l2tp_tunnel *l2tp_tunnel_find_nth(struct net *net, int nth); diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c index 2d6760a2ae34..d100aed3d06f 100644 --- a/net/l2tp/l2tp_debugfs.c +++ b/net/l2tp/l2tp_debugfs.c @@ -53,7 +53,7 @@ static void l2tp_dfs_next_tunnel(struct l2tp_dfs_seq_data *pd) static void l2tp_dfs_next_session(struct l2tp_dfs_seq_data *pd) { - pd->session = l2tp_session_find_nth(pd->tunnel, pd->session_idx); + pd->session = l2tp_session_get_nth(pd->tunnel, pd->session_idx, true); pd->session_idx++; if (pd->session == NULL) { @@ -238,10 +238,14 @@ static int l2tp_dfs_seq_show(struct seq_file *m, void *v) } /* Show the tunnel or session context */ - if (pd->session == NULL) + if (!pd->session) { l2tp_dfs_seq_tunnel_show(m, pd->tunnel); - else + } else { l2tp_dfs_seq_session_show(m, pd->session); + if (pd->session->deref) + pd->session->deref(pd->session); + l2tp_session_dec_refcount(pd->session); + } out: return 0; diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index 2caaa84ce92d..665cc74df5c5 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -827,7 +827,7 @@ static int l2tp_nl_cmd_session_dump(struct sk_buff *skb, struct netlink_callback goto out; } - session = l2tp_session_find_nth(tunnel, si); + session = l2tp_session_get_nth(tunnel, si, false); if (session == NULL) { ti++; tunnel = NULL; @@ -837,8 +837,11 @@ static int l2tp_nl_cmd_session_dump(struct sk_buff *skb, struct netlink_callback if (l2tp_nl_session_send(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, - session, L2TP_CMD_SESSION_GET) < 0) + session, L2TP_CMD_SESSION_GET) < 0) { + l2tp_session_dec_refcount(session); break; + } + l2tp_session_dec_refcount(session); si++; } diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index cb0fa348abf9..ad80ee5400da 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -1575,7 +1575,7 @@ static void pppol2tp_next_tunnel(struct net *net, struct pppol2tp_seq_data *pd) static void pppol2tp_next_session(struct net *net, struct pppol2tp_seq_data *pd) { - pd->session = l2tp_session_find_nth(pd->tunnel, pd->session_idx); + pd->session = l2tp_session_get_nth(pd->tunnel, pd->session_idx, true); pd->session_idx++; if (pd->session == NULL) { @@ -1702,10 +1702,14 @@ static int pppol2tp_seq_show(struct seq_file *m, void *v) /* Show the tunnel or session context. */ - if (pd->session == NULL) + if (!pd->session) { pppol2tp_seq_tunnel_show(m, pd->tunnel); - else + } else { pppol2tp_seq_session_show(m, pd->session); + if (pd->session->deref) + pd->session->deref(pd->session); + l2tp_session_dec_refcount(pd->session); + } out: return 0; From 593e185eaadeefdc05a5e595428aec468646165a Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Mon, 3 Apr 2017 13:23:15 +0200 Subject: [PATCH 501/521] l2tp: fix PPP pseudo-wire auto-loading [ Upstream commit 249ee819e24c180909f43c1173c8ef6724d21faf ] PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP). Fixes: f1f39f911027 ("l2tp: auto load type modules") Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/l2tp/l2tp_ppp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index ad80ee5400da..8ab9c5d74416 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -1868,4 +1868,4 @@ MODULE_DESCRIPTION("PPP over L2TP over UDP"); MODULE_LICENSE("GPL"); MODULE_VERSION(PPPOL2TP_DRV_VERSION); MODULE_ALIAS("pppox-proto-" __stringify(PX_PROTO_OL2TP)); -MODULE_ALIAS_L2TP_PWTYPE(11); +MODULE_ALIAS_L2TP_PWTYPE(7); From cc5a5c09d32b8bd80477f45f12f47ca43536b19c Mon Sep 17 00:00:00 2001 From: Florian Larysch Date: Mon, 3 Apr 2017 16:46:09 +0200 Subject: [PATCH 502/521] net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given [ Upstream commit a8801799c6975601fd58ae62f48964caec2eb83f ] inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to ip_route_input when iif is given. If a multipath route is present for the designated destination, ip_multipath_icmp_hash ends up being called, which uses the source/destination addresses within the skb to calculate a hash. However, those are not set in the synthetic skb, causing it to return an arbitrary and incorrect result. Instead, use UDP, which gets no such special treatment. Signed-off-by: Florian Larysch Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index da4d68d78590..375248b900ba 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2559,7 +2559,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh) skb_reset_network_header(skb); /* Bugfix: need to give ip_route_input enough of an IP header to not gag. */ - ip_hdr(skb)->protocol = IPPROTO_ICMP; + ip_hdr(skb)->protocol = IPPROTO_UDP; skb_reserve(skb, MAX_HEADER + sizeof(struct iphdr)); src = tb[RTA_SRC] ? nla_get_in_addr(tb[RTA_SRC]) : 0; From 52e33b4e505dedc8708581c3dd539ded37df1c9f Mon Sep 17 00:00:00 2001 From: Xin Long Date: Thu, 6 Apr 2017 13:10:52 +0800 Subject: [PATCH 503/521] sctp: listen on the sock only when it's state is listening or closed [ Upstream commit 34b2789f1d9bf8dcca9b5cb553d076ca2cd898ee ] Now sctp doesn't check sock's state before listening on it. It could even cause changing a sock with any state to become a listening sock when doing sctp_listen. This patch is to fix it by checking sock's state in sctp_listen, so that it will listen on the sock with right state. Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sctp/socket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 5758818435f3..c96d666cef29 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6394,6 +6394,9 @@ int sctp_inet_listen(struct socket *sock, int backlog) if (sock->state != SS_UNCONNECTED) goto out; + if (!sctp_sstate(sk, LISTENING) && !sctp_sstate(sk, CLOSED)) + goto out; + /* If backlog is zero, disable listening. */ if (!backlog) { if (sctp_sstate(sk, CLOSED)) From 78c4e3d4848d86c1ed36181c71c9c96834e910ed Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 8 Apr 2017 08:07:33 -0700 Subject: [PATCH 504/521] tcp: clear saved_syn in tcp_disconnect() [ Upstream commit 17c3060b1701fc69daedb4c90be6325d3d9fca8e ] In the (very unlikely) case a passive socket becomes a listener, we do not want to duplicate its saved SYN headers. This would lead to double frees, use after free, and please hackers and various fuzzers Tested: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, IPPROTO_TCP, TCP_SAVE_SYN, [1], 4) = 0 +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 5) = 0 +0 < S 0:0(0) win 32972 +0 > S. 0:0(0) ack 1 <...> +.1 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 connect(4, AF_UNSPEC, ...) = 0 +0 close(3) = 0 +0 bind(4, ..., ...) = 0 +0 listen(4, 5) = 0 +0 < S 0:0(0) win 32972 +0 > S. 0:0(0) ack 1 <...> +.1 < . 1:1(0) ack 1 win 257 Fixes: cd8ae85299d5 ("tcp: provide SYN headers for passive connections") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 600dcda840d1..e1d51370977b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2260,6 +2260,7 @@ int tcp_disconnect(struct sock *sk, int flags) tcp_init_send_head(sk); memset(&tp->rx_opt, 0, sizeof(tp->rx_opt)); __sk_dst_reset(sk); + tcp_saved_syn_free(tp); WARN_ON(inet->inet_num && !icsk->icsk_bind_hash); From f6b34b1709acec5c7a4d83f43b52595466a32c37 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 18 Apr 2017 22:14:26 +0300 Subject: [PATCH 505/521] dp83640: don't recieve time stamps twice MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9d386cd9a755c8293e8916264d4d053878a7c9c7 ] This patch is prompted by a static checker warning about a potential use after free. The concern is that netif_rx_ni() can free "skb" and we call it twice. When I look at the commit that added this, it looks like some stray lines were added accidentally. It doesn't make sense to me that we would recieve the same data two times. I asked the author but never recieved a response. I can't test this code, but I'm pretty sure my patch is correct. Fixes: 4b063258ab93 ("dp83640: Delay scheduled work.") Signed-off-by: Dan Carpenter Acked-by: Stefan Sørensen Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/dp83640.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c index e6cefd0e3262..84b9cca152eb 100644 --- a/drivers/net/phy/dp83640.c +++ b/drivers/net/phy/dp83640.c @@ -1436,8 +1436,6 @@ static bool dp83640_rxtstamp(struct phy_device *phydev, skb_info->tmo = jiffies + SKB_TIMESTAMP_TIMEOUT; skb_queue_tail(&dp83640->rx_queue, skb); schedule_delayed_work(&dp83640->ts_work, SKB_TIMESTAMP_TIMEOUT); - } else { - netif_rx_ni(skb); } return true; From f6b94906b41497a1fa2c4acd055d550c501cd58b Mon Sep 17 00:00:00 2001 From: David Ahern Date: Wed, 19 Apr 2017 14:19:43 -0700 Subject: [PATCH 506/521] net: ipv6: RTF_PCPU should not be settable from userspace [ Upstream commit 557c44be917c322860665be3d28376afa84aa936 ] Andrey reported a fault in the IPv6 route code: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880069809600 task.stack: ffff880062dc8000 RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975 RSP: 0018:ffff880062dced30 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006 RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018 RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0 Call Trace: ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212 ... Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit set. Flags passed to the kernel are blindly copied to the allocated rt6_info by ip6_route_info_create making a newly inserted route appear as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set and expects rt->dst.from to be set - which it is not since it is not really a per-cpu copy. The subsequent call to __ip6_dst_alloc then generates the fault. Fix by checking for the flag and failing with EINVAL. Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov Signed-off-by: David Ahern Acked-by: Martin KaFai Lau Tested-by: Andrey Konovalov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/ipv6_route.h | 2 +- net/ipv6/route.c | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/ipv6_route.h b/include/uapi/linux/ipv6_route.h index f6598d1c886e..316e838b7470 100644 --- a/include/uapi/linux/ipv6_route.h +++ b/include/uapi/linux/ipv6_route.h @@ -34,7 +34,7 @@ #define RTF_PREF(pref) ((pref) << 27) #define RTF_PREF_MASK 0x18000000 -#define RTF_PCPU 0x40000000 +#define RTF_PCPU 0x40000000 /* read-only: can not be set by user */ #define RTF_LOCAL 0x80000000 diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 9f0aa255e288..6c91d5c4a92c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1758,6 +1758,10 @@ static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg) int addr_type; int err = -EINVAL; + /* RTF_PCPU is an internal flag; can not be set by userspace */ + if (cfg->fc_flags & RTF_PCPU) + goto out; + if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128) goto out; #ifndef CONFIG_IPV6_SUBTREES From 25c1040233728451a4c56083407ffd398c9f3759 Mon Sep 17 00:00:00 2001 From: Tushar Dave Date: Thu, 20 Apr 2017 15:57:31 -0700 Subject: [PATCH 507/521] netpoll: Check for skb->queue_mapping [ Upstream commit c70b17b775edb21280e9de7531acf6db3b365274 ] Reducing real_num_tx_queues needs to be in sync with skb queue_mapping otherwise skbs with queue_mapping greater than real_num_tx_queues can be sent to the underlying driver and can result in kernel panic. One such event is running netconsole and enabling VF on the same device. Or running netconsole and changing number of tx queues via ethtool on same device. e.g. Unable to handle kernel NULL pointer dereference tsk->{mm,active_mm}->context = 0000000000001525 tsk->{mm,active_mm}->pgd = fff800130ff9a000 \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ kworker/48:1(475): Oops [#1] CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE 4.11.0-rc3-davem-net+ #7 Workqueue: events queue_process task: fff80013113299c0 task.stack: fff800131132c000 TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y: 00000000 Tainted: G OE TPC: g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3: 0000000000000001 g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7: 00000000000000c0 o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3: 0000000000000003 o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc: 000000000049ed94 RPC: l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3: 0000000000000000 l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7: fff8001fa7605028 i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7: 00000000103fa4b0 I7: Call Trace: [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe] [0000000000998c74] netpoll_start_xmit+0xf4/0x200 [0000000000998e10] queue_process+0x90/0x160 [0000000000485fa8] process_one_work+0x188/0x480 [0000000000486410] worker_thread+0x170/0x4c0 [000000000048c6b8] kthread+0xd8/0x120 [0000000000406064] ret_from_fork+0x1c/0x2c [0000000000000000] (null) Disabling lock debugging due to kernel taint Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe] Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200 Caller[0000000000998e10]: queue_process+0x90/0x160 Caller[0000000000485fa8]: process_one_work+0x188/0x480 Caller[0000000000486410]: worker_thread+0x170/0x4c0 Caller[000000000048c6b8]: kthread+0xd8/0x120 Caller[0000000000406064]: ret_from_fork+0x1c/0x2c Caller[0000000000000000]: (null) Signed-off-by: Tushar Dave Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/netpoll.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 94acfc89ad97..440aa9f6e0a8 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -105,15 +105,21 @@ static void queue_process(struct work_struct *work) while ((skb = skb_dequeue(&npinfo->txq))) { struct net_device *dev = skb->dev; struct netdev_queue *txq; + unsigned int q_index; if (!netif_device_present(dev) || !netif_running(dev)) { kfree_skb(skb); continue; } - txq = skb_get_tx_queue(dev, skb); - local_irq_save(flags); + /* check if skb->queue_mapping is still valid */ + q_index = skb_get_queue_mapping(skb); + if (unlikely(q_index >= dev->real_num_tx_queues)) { + q_index = q_index % dev->real_num_tx_queues; + skb_set_queue_mapping(skb, q_index); + } + txq = netdev_get_tx_queue(dev, q_index); HARD_TX_LOCK(dev, txq, smp_processor_id()); if (netif_xmit_frozen_or_stopped(txq) || netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) { From bdeb026dfd9f27ec2e4a0ea97dab0e6e3d96e632 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Fri, 21 Apr 2017 20:42:16 +0300 Subject: [PATCH 508/521] ip6mr: fix notification device destruction [ Upstream commit 723b929ca0f79c0796f160c2eeda4597ee98d2b8 ] Andrey Konovalov reported a BUG caused by the ip6mr code which is caused because we call unregister_netdevice_many for a device that is already being destroyed. In IPv4's ipmr that has been resolved by two commits long time ago by introducing the "notify" parameter to the delete function and avoiding the unregister when called from a notifier, so let's do the same for ip6mr. The trace from Andrey: ------------[ cut here ]------------ kernel BUG at net/core/dev.c:6813! invalid opcode: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: netns cleanup_net task: ffff880069208000 task.stack: ffff8800692d8000 RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813 RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297 RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569 RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000 R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070 R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000 FS: 0000000000000000(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0 Call Trace: unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880 ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647 call_netdevice_notifiers net/core/dev.c:1663 rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881 unregister_netdevice_many net/core/dev.c:7880 default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333 ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144 cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463 process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097 worker_thread+0x223/0x19c0 kernel/workqueue.c:2231 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89 47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f> 0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00 RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0 ---[ end trace e0b29c57e9b3292c ]--- Reported-by: Andrey Konovalov Signed-off-by: Nikolay Aleksandrov Tested-by: Andrey Konovalov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6mr.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index d9843e5a667f..8361d73ab653 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -774,7 +774,8 @@ static struct net_device *ip6mr_reg_vif(struct net *net, struct mr6_table *mrt) * Delete a VIF entry */ -static int mif6_delete(struct mr6_table *mrt, int vifi, struct list_head *head) +static int mif6_delete(struct mr6_table *mrt, int vifi, int notify, + struct list_head *head) { struct mif_device *v; struct net_device *dev; @@ -820,7 +821,7 @@ static int mif6_delete(struct mr6_table *mrt, int vifi, struct list_head *head) dev->ifindex, &in6_dev->cnf); } - if (v->flags & MIFF_REGISTER) + if ((v->flags & MIFF_REGISTER) && !notify) unregister_netdevice_queue(dev, head); dev_put(dev); @@ -1330,7 +1331,6 @@ static int ip6mr_device_event(struct notifier_block *this, struct mr6_table *mrt; struct mif_device *v; int ct; - LIST_HEAD(list); if (event != NETDEV_UNREGISTER) return NOTIFY_DONE; @@ -1339,10 +1339,9 @@ static int ip6mr_device_event(struct notifier_block *this, v = &mrt->vif6_table[0]; for (ct = 0; ct < mrt->maxvif; ct++, v++) { if (v->dev == dev) - mif6_delete(mrt, ct, &list); + mif6_delete(mrt, ct, 1, NULL); } } - unregister_netdevice_many(&list); return NOTIFY_DONE; } @@ -1551,7 +1550,7 @@ static void mroute_clean_tables(struct mr6_table *mrt, bool all) for (i = 0; i < mrt->maxvif; i++) { if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC)) continue; - mif6_delete(mrt, i, &list); + mif6_delete(mrt, i, 0, &list); } unregister_netdevice_many(&list); @@ -1704,7 +1703,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns if (copy_from_user(&mifi, optval, sizeof(mifi_t))) return -EFAULT; rtnl_lock(); - ret = mif6_delete(mrt, mifi, NULL); + ret = mif6_delete(mrt, mifi, 0, NULL); rtnl_unlock(); return ret; From 114f0c66dab46ab5d26585d6f1a63e517098a0a7 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Thu, 20 Apr 2017 20:55:12 +0800 Subject: [PATCH 509/521] macvlan: Fix device ref leak when purging bc_queue [ Upstream commit f6478218e6edc2a587b8f132f66373baa7b2497c ] When a parent macvlan device is destroyed we end up purging its broadcast queue without dropping the device reference count on the packet source device. This causes the source device to linger. This patch drops that reference count. Fixes: 260916dfb48c ("macvlan: Fix potential use-after free for...") Reported-by: Joe Ghalam Signed-off-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/macvlan.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 06c8bfeaccd6..40cd86614677 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1110,6 +1110,7 @@ static int macvlan_port_create(struct net_device *dev) static void macvlan_port_destroy(struct net_device *dev) { struct macvlan_port *port = macvlan_port_get_rtnl(dev); + struct sk_buff *skb; dev->priv_flags &= ~IFF_MACVLAN_PORT; netdev_rx_handler_unregister(dev); @@ -1118,7 +1119,15 @@ static void macvlan_port_destroy(struct net_device *dev) * but we need to cancel it and purge left skbs if any. */ cancel_work_sync(&port->bc_work); - __skb_queue_purge(&port->bc_queue); + + while ((skb = __skb_dequeue(&port->bc_queue))) { + const struct macvlan_dev *src = MACVLAN_SKB_CB(skb)->src; + + if (src) + dev_put(src->dev); + + kfree_skb(skb); + } kfree_rcu(port, rcu); } From befb9254243937f676433251cf4f8beb37b02c13 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Tue, 25 Apr 2017 14:37:15 -0700 Subject: [PATCH 510/521] ipv6: check skb->protocol before lookup for nexthop [ Upstream commit 199ab00f3cdb6f154ea93fa76fd80192861a821d ] Andrey reported a out-of-bound access in ip6_tnl_xmit(), this is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4 neigh key as an IPv6 address: neigh = dst_neigh_lookup(skb_dst(skb), &ipv6_hdr(skb)->daddr); if (!neigh) goto tx_err_link_failure; addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE addr_type = ipv6_addr_type(addr6); if (addr_type == IPV6_ADDR_ANY) addr6 = &ipv6_hdr(skb)->daddr; memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr)); Also the network header of the skb at this point should be still IPv4 for 4in6 tunnels, we shold not just use it as IPv6 header. This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it is, we are safe to do the nexthop lookup using skb_dst() and ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which dest address we can pick here, we have to rely on callers to fill it from tunnel config, so just fall to ip6_route_output() to make the decision. Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.") Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Cc: Steffen Klassert Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_tunnel.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 6c6161763c2f..97cb02dc5f02 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1049,7 +1049,7 @@ static int ip6_tnl_xmit2(struct sk_buff *skb, struct ip6_tnl *t = netdev_priv(dev); struct net *net = t->net; struct net_device_stats *stats = &t->dev->stats; - struct ipv6hdr *ipv6h = ipv6_hdr(skb); + struct ipv6hdr *ipv6h; struct ipv6_tel_txoption opt; struct dst_entry *dst = NULL, *ndst = NULL; struct net_device *tdev; @@ -1061,26 +1061,28 @@ static int ip6_tnl_xmit2(struct sk_buff *skb, /* NBMA tunnel */ if (ipv6_addr_any(&t->parms.raddr)) { - struct in6_addr *addr6; - struct neighbour *neigh; - int addr_type; + if (skb->protocol == htons(ETH_P_IPV6)) { + struct in6_addr *addr6; + struct neighbour *neigh; + int addr_type; - if (!skb_dst(skb)) - goto tx_err_link_failure; + if (!skb_dst(skb)) + goto tx_err_link_failure; - neigh = dst_neigh_lookup(skb_dst(skb), - &ipv6_hdr(skb)->daddr); - if (!neigh) - goto tx_err_link_failure; + neigh = dst_neigh_lookup(skb_dst(skb), + &ipv6_hdr(skb)->daddr); + if (!neigh) + goto tx_err_link_failure; - addr6 = (struct in6_addr *)&neigh->primary_key; - addr_type = ipv6_addr_type(addr6); + addr6 = (struct in6_addr *)&neigh->primary_key; + addr_type = ipv6_addr_type(addr6); - if (addr_type == IPV6_ADDR_ANY) - addr6 = &ipv6_hdr(skb)->daddr; + if (addr_type == IPV6_ADDR_ANY) + addr6 = &ipv6_hdr(skb)->daddr; - memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr)); - neigh_release(neigh); + memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr)); + neigh_release(neigh); + } } else if (!(t->parms.flags & (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) { /* enable the cache only only if the routing decision does From 5e52fffbb11c28c37fe8e335fdd8e9705daff06d Mon Sep 17 00:00:00 2001 From: Jamie Bainbridge Date: Wed, 26 Apr 2017 10:43:27 +1000 Subject: [PATCH 511/521] ipv6: check raw payload size correctly in ioctl [ Upstream commit 105f5528b9bbaa08b526d3405a5bcd2ff0c953c8 ] In situations where an skb is paged, the transport header pointer and tail pointer can be the same because the skb contents are in frags. This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a length of 0 when the length to receive is actually greater than zero. skb->len is already correctly set in ip6_input_finish() with pskb_pull(), so use skb->len as it always returns the correct result for both linear and paged data. Signed-off-by: Jamie Bainbridge Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/raw.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 8bca90d6d915..a625f69a28dd 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -1144,8 +1144,7 @@ static int rawv6_ioctl(struct sock *sk, int cmd, unsigned long arg) spin_lock_bh(&sk->sk_receive_queue.lock); skb = skb_peek(&sk->sk_receive_queue); if (skb) - amount = skb_tail_pointer(skb) - - skb_transport_header(skb); + amount = skb->len; spin_unlock_bh(&sk->sk_receive_queue.lock); return put_user(amount, (int __user *)arg); } From 8cbaf11c5026abed0f772b6478795c7170d70171 Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Fri, 14 Apr 2017 12:43:01 +0900 Subject: [PATCH 512/521] ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type commit dfb00a56935186171abb5280b3407c3f910011f1 upstream. An abstraction of asynchronous transaction for transmission of MIDI messages was introduced in Linux v4.4. Each driver can utilize this abstraction to transfer MIDI messages via fixed-length payload of transaction to a certain unit address. Filling payload of the transaction is done by callback. In this callback, each driver can return negative error code, however current implementation assigns the return value to unsigned variable. This commit changes type of the variable to fix the bug. Reported-by: Julia Lawall Fixes: 585d7cba5e1f ("ALSA: firewire-lib: add helper functions for asynchronous transactions to transfer MIDI messages") Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/firewire/lib.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/firewire/lib.h b/sound/firewire/lib.h index f3f6f84c48d6..bb5f8cdea3e2 100644 --- a/sound/firewire/lib.h +++ b/sound/firewire/lib.h @@ -42,7 +42,7 @@ struct snd_fw_async_midi_port { struct snd_rawmidi_substream *substream; snd_fw_async_midi_port_fill fill; - unsigned int consume_bytes; + int consume_bytes; }; int snd_fw_async_midi_port_init(struct snd_fw_async_midi_port *port, From 555f77106f77ceb1ad582baf4a3c936c2257fc08 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Sun, 9 Apr 2017 10:41:27 +0200 Subject: [PATCH 513/521] ALSA: seq: Don't break snd_use_lock_sync() loop by timeout commit 4e7655fd4f47c23e5249ea260dc802f909a64611 upstream. The snd_use_lock_sync() (thus its implementation snd_use_lock_sync_helper()) has the 5 seconds timeout to break out of the sync loop. It was introduced from the beginning, just to be "safer", in terms of avoiding the stupid bugs. However, as Ben Hutchings suggested, this timeout rather introduces a potential leak or use-after-free that was apparently fixed by the commit 2d7d54002e39 ("ALSA: seq: Fix race during FIFO resize"): for example, snd_seq_fifo_event_in() -> snd_seq_event_dup() -> copy_from_user() could block for a long time, and snd_use_lock_sync() goes timeout and still leaves the cell at releasing the pool. For fixing such a problem, we remove the break by the timeout while still keeping the warning. Suggested-by: Ben Hutchings Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/seq/seq_lock.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/sound/core/seq/seq_lock.c b/sound/core/seq/seq_lock.c index 3b693e924db7..12ba83367b1b 100644 --- a/sound/core/seq/seq_lock.c +++ b/sound/core/seq/seq_lock.c @@ -28,19 +28,16 @@ /* wait until all locks are released */ void snd_use_lock_sync_helper(snd_use_lock_t *lockp, const char *file, int line) { - int max_count = 5 * HZ; + int warn_count = 5 * HZ; if (atomic_read(lockp) < 0) { pr_warn("ALSA: seq_lock: lock trouble [counter = %d] in %s:%d\n", atomic_read(lockp), file, line); return; } while (atomic_read(lockp) > 0) { - if (max_count == 0) { - pr_warn("ALSA: seq_lock: timeout [%d left] in %s:%d\n", atomic_read(lockp), file, line); - break; - } + if (warn_count-- == 0) + pr_warn("ALSA: seq_lock: waiting [%d left] in %s:%d\n", atomic_read(lockp), file, line); schedule_timeout_uninterruptible(1); - max_count--; } } From 1c26c382c9e7a6a919b5a005e07747c40366e7df Mon Sep 17 00:00:00 2001 From: James Hogan Date: Thu, 30 Mar 2017 16:06:02 +0100 Subject: [PATCH 514/521] MIPS: KGDB: Use kernel context for sleeping threads commit 162b270c664dca2e0944308e92f9fcc887151a72 upstream. KGDB is a kernel debug stub and it can't be used to debug userland as it can only safely access kernel memory. On MIPS however KGDB has always got the register state of sleeping processes from the userland register context at the beginning of the kernel stack. This is meaningless for kernel threads (which never enter userland), and for user threads it prevents the user seeing what it is doing while in the kernel: (gdb) info threads Id Target Id Frame ... 3 Thread 2 (kthreadd) 0x0000000000000000 in ?? () 2 Thread 1 (init) 0x000000007705c4b4 in ?? () 1 Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201 Get the register state instead from the (partial) kernel register context stored in the task's thread_struct for resume() to restore. All threads now correctly appear to be in context_switch(): (gdb) info threads Id Target Id Frame ... 3 Thread 2 (kthreadd) context_switch (rq=, cookie=..., next=, prev=0x0) at kernel/sched/core.c:2903 2 Thread 1 (init) context_switch (rq=, cookie=..., next=, prev=0x0) at kernel/sched/core.c:2903 1 Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201 Call clobbered registers which aren't saved and exception registers (BadVAddr & Cause) which can't be easily determined without stack unwinding are reported as 0. The PC is taken from the return address, such that the state presented matches that found immediately after returning from resume(). Fixes: 8854700115ec ("[MIPS] kgdb: add arch support for the kernel's kgdb core") Signed-off-by: James Hogan Cc: Jason Wessel Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15829/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/kgdb.c | 46 ++++++++++++++++++++++++++++------------- 1 file changed, 32 insertions(+), 14 deletions(-) diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c index de63d36af895..732d6171ac6a 100644 --- a/arch/mips/kernel/kgdb.c +++ b/arch/mips/kernel/kgdb.c @@ -244,9 +244,6 @@ static int compute_signal(int tt) void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p) { int reg; - struct thread_info *ti = task_thread_info(p); - unsigned long ksp = (unsigned long)ti + THREAD_SIZE - 32; - struct pt_regs *regs = (struct pt_regs *)ksp - 1; #if (KGDB_GDB_REG_SIZE == 32) u32 *ptr = (u32 *)gdb_regs; #else @@ -254,25 +251,46 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p) #endif for (reg = 0; reg < 16; reg++) - *(ptr++) = regs->regs[reg]; + *(ptr++) = 0; /* S0 - S7 */ - for (reg = 16; reg < 24; reg++) - *(ptr++) = regs->regs[reg]; + *(ptr++) = p->thread.reg16; + *(ptr++) = p->thread.reg17; + *(ptr++) = p->thread.reg18; + *(ptr++) = p->thread.reg19; + *(ptr++) = p->thread.reg20; + *(ptr++) = p->thread.reg21; + *(ptr++) = p->thread.reg22; + *(ptr++) = p->thread.reg23; for (reg = 24; reg < 28; reg++) *(ptr++) = 0; /* GP, SP, FP, RA */ - for (reg = 28; reg < 32; reg++) - *(ptr++) = regs->regs[reg]; + *(ptr++) = (long)p; + *(ptr++) = p->thread.reg29; + *(ptr++) = p->thread.reg30; + *(ptr++) = p->thread.reg31; - *(ptr++) = regs->cp0_status; - *(ptr++) = regs->lo; - *(ptr++) = regs->hi; - *(ptr++) = regs->cp0_badvaddr; - *(ptr++) = regs->cp0_cause; - *(ptr++) = regs->cp0_epc; + *(ptr++) = p->thread.cp0_status; + + /* lo, hi */ + *(ptr++) = 0; + *(ptr++) = 0; + + /* + * BadVAddr, Cause + * Ideally these would come from the last exception frame up the stack + * but that requires unwinding, otherwise we can't know much for sure. + */ + *(ptr++) = 0; + *(ptr++) = 0; + + /* + * PC + * use return address (RA), i.e. the moment after return from resume() + */ + *(ptr++) = p->thread.reg31; } void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc) From 3bf0809930b8948650e04cef29143803f87073df Mon Sep 17 00:00:00 2001 From: James Cowgill Date: Tue, 11 Apr 2017 13:51:07 +0100 Subject: [PATCH 515/521] MIPS: Avoid BUG warning in arch_check_elf commit c46f59e90226fa5bfcc83650edebe84ae47d454b upstream. arch_check_elf contains a usage of current_cpu_data that will call smp_processor_id() with preemption enabled and therefore triggers a "BUG: using smp_processor_id() in preemptible" warning when an fpxx executable is loaded. As a follow-up to commit b244614a60ab ("MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)"), apply the same fix to arch_check_elf by using raw_current_cpu_data instead. The rationale quoted from the previous commit: "It is assumed throughout the kernel that if any CPU has an FPU, then all CPUs would have an FPU as well, so it is safe to perform the check with preemption enabled - change the code to use raw_ variant of the check to avoid the warning." Fixes: 46490b572544 ("MIPS: kernel: elf: Improve the overall ABI and FPU mode checks") Signed-off-by: James Cowgill Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15951/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/elf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/kernel/elf.c b/arch/mips/kernel/elf.c index 4a4d9e067c89..3afffc30ee12 100644 --- a/arch/mips/kernel/elf.c +++ b/arch/mips/kernel/elf.c @@ -206,7 +206,7 @@ int arch_check_elf(void *_ehdr, bool has_interpreter, else if ((prog_req.fr1 && prog_req.frdefault) || (prog_req.single && !prog_req.frdefault)) /* Make sure 64-bit MIPS III/IV/64R1 will not pick FR1 */ - state->overall_fp_mode = ((current_cpu_data.fpu_id & MIPS_FPIR_F64) && + state->overall_fp_mode = ((raw_current_cpu_data.fpu_id & MIPS_FPIR_F64) && cpu_has_mips_r2_r6) ? FP_FR1 : FP_FR0; else if (prog_req.fr1) From 91ce8d13faeb32d0b5544674a1ac8d955c8f4891 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Fri, 14 Apr 2017 17:22:18 -0400 Subject: [PATCH 516/521] p9_client_readdir() fix commit 71d6ad08379304128e4bdfaf0b4185d54375423e upstream. Don't assume that server is sane and won't return more data than asked for. Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- net/9p/client.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/9p/client.c b/net/9p/client.c index ea79ee9a7348..f5feac4ff4ec 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -2101,6 +2101,10 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset) trace_9p_protocol_dump(clnt, req->rc); goto free_and_error; } + if (rsize < count) { + pr_err("bogus RREADDIR count (%d > %d)\n", count, rsize); + count = rsize; + } p9_debug(P9_DEBUG_9P, "<<< RREADDIR count %d\n", count); From 2032eebe2384dfd5a5c572acfdaeed4371b30958 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Thu, 13 Apr 2017 15:36:31 -0700 Subject: [PATCH 517/521] Input: i8042 - add Clevo P650RS to the i8042 reset list MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 7c5bb4ac2b76d2a09256aec8a7d584bf3e2b0466 upstream. Clevo P650RS and other similar devices require i8042 to be reset in order to detect Synaptics touchpad. Reported-by: Paweł Bylica Tested-by: Ed Bordin Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=190301 Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/serio/i8042-x86ia64io.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index 25eab453f2b2..e7b96f1ac2c5 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -685,6 +685,13 @@ static const struct dmi_system_id __initconst i8042_dmi_reset_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "20046"), }, }, + { + /* Clevo P650RS, 650RP6, Sager NP8152-S, and others */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), + DMI_MATCH(DMI_PRODUCT_NAME, "P65xRP"), + }, + }, { } }; From 82a0d8aabe043ac94efa255502754c70363dab0e Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 21 Apr 2017 16:10:18 -0400 Subject: [PATCH 518/521] nfsd: check for oversized NFSv2/v3 arguments MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e6838a29ecb484c97e4efef9429643b9851fba6e upstream. A client can append random data to the end of an NFSv2 or NFSv3 RPC call without our complaining; we'll just stop parsing at the end of the expected data and ignore the rest. Encoded arguments and replies are stored together in an array of pages, and if a call is too large it could leave inadequate space for the reply. This is normally OK because NFS RPC's typically have either short arguments and long replies (like READ) or long arguments and short replies (like WRITE). But a client that sends an incorrectly long reply can violate those assumptions. This was observed to cause crashes. Also, several operations increment rq_next_page in the decode routine before checking the argument size, which can leave rq_next_page pointing well past the end of the page array, causing trouble later in svc_free_pages. So, following a suggestion from Neil Brown, add a central check to enforce our expectation that no NFSv2/v3 call has both a large call and a large reply. As followup we may also want to rewrite the encoding routines to check more carefully that they aren't running off the end of the page array. We may also consider rejecting calls that have any extra garbage appended. That would be safer, and within our rights by spec, but given the age of our server and the NFS protocol, and the fact that we've never enforced this before, we may need to balance that against the possibility of breaking some oddball client. Reported-by: Tuomas Haanpää Reported-by: Ari Kauppi Reviewed-by: NeilBrown Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/nfssvc.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index ad4e2377dd63..5be1fa6b676d 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -656,6 +656,37 @@ static __be32 map_new_errors(u32 vers, __be32 nfserr) return nfserr; } +/* + * A write procedure can have a large argument, and a read procedure can + * have a large reply, but no NFSv2 or NFSv3 procedure has argument and + * reply that can both be larger than a page. The xdr code has taken + * advantage of this assumption to be a sloppy about bounds checking in + * some cases. Pending a rewrite of the NFSv2/v3 xdr code to fix that + * problem, we enforce these assumptions here: + */ +static bool nfs_request_too_big(struct svc_rqst *rqstp, + struct svc_procedure *proc) +{ + /* + * The ACL code has more careful bounds-checking and is not + * susceptible to this problem: + */ + if (rqstp->rq_prog != NFS_PROGRAM) + return false; + /* + * Ditto NFSv4 (which can in theory have argument and reply both + * more than a page): + */ + if (rqstp->rq_vers >= 4) + return false; + /* The reply will be small, we're OK: */ + if (proc->pc_xdrressize > 0 && + proc->pc_xdrressize < XDR_QUADLEN(PAGE_SIZE)) + return false; + + return rqstp->rq_arg.len > PAGE_SIZE; +} + int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) { @@ -668,6 +699,11 @@ nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) rqstp->rq_vers, rqstp->rq_proc); proc = rqstp->rq_procinfo; + if (nfs_request_too_big(rqstp, proc)) { + dprintk("nfsd: NFSv%d argument too large\n", rqstp->rq_vers); + *statp = rpc_garbage_args; + return 1; + } /* * Give the xdr decoder a chance to change this if it wants * (necessary in the NFSv4.0 compound case) From 1aefe328a68dc1ffff1adf81912dacaa83a066fa Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Sun, 8 Jan 2017 19:45:48 -0800 Subject: [PATCH 519/521] ARCv2: save r30 on kernel entry as gcc uses it for code-gen commit ecd43afdbe72017aefe48080631eb625e177ef4d upstream. This is not exposed to userspace debugers yet, which can be done independently as a seperate patch ! Signed-off-by: Vineet Gupta Signed-off-by: Greg Kroah-Hartman --- arch/arc/include/asm/entry-arcv2.h | 2 ++ arch/arc/include/asm/ptrace.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arc/include/asm/entry-arcv2.h b/arch/arc/include/asm/entry-arcv2.h index b5ff87e6f4b7..aee1a77934cf 100644 --- a/arch/arc/include/asm/entry-arcv2.h +++ b/arch/arc/include/asm/entry-arcv2.h @@ -16,6 +16,7 @@ ; ; Now manually save: r12, sp, fp, gp, r25 + PUSH r30 PUSH r12 ; Saving pt_regs->sp correctly requires some extra work due to the way @@ -72,6 +73,7 @@ POPAX AUX_USER_SP 1: POP r12 + POP r30 .endm diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h index 69095da1fcfd..47111d565a95 100644 --- a/arch/arc/include/asm/ptrace.h +++ b/arch/arc/include/asm/ptrace.h @@ -84,7 +84,7 @@ struct pt_regs { unsigned long fp; unsigned long sp; /* user/kernel sp depending on where we came from */ - unsigned long r12; + unsigned long r12, r30; /*------- Below list auto saved by h/w -----------*/ unsigned long r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11; From 9c4a4755d9c552a640bef60de40b46bda3674431 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Thu, 13 Apr 2017 17:53:55 -0500 Subject: [PATCH 520/521] ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram commit 34a477e5297cbaa6ecc6e17c042a866e1cbe80d6 upstream. On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable function graph tracing and then suspend to RAM, it will triple fault and reboot when it resumes. The first fault happens when booting a secondary CPU: startup_32_smp() load_ucode_ap() prepare_ftrace_return() ftrace_graph_is_dead() (accesses 'kill_ftrace_graph') The early head_32.S code calls into load_ucode_ap(), which has an an ftrace hook, so it calls prepare_ftrace_return(), which calls ftrace_graph_is_dead(), which tries to access the global 'kill_ftrace_graph' variable with a virtual address, causing a fault because the CPU is still in real mode. The fix is to add a check in prepare_ftrace_return() to make sure it's running in protected mode before continuing. The check makes sure the stack pointer is a virtual kernel address. It's a bit of a hack, but it's not very intrusive and it works well enough. For reference, here are a few other (more difficult) ways this could have potentially been fixed: - Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging is enabled. (No idea what that would break.) - Track down load_ucode_ap()'s entire callee tree and mark all the functions 'notrace'. (Probably not realistic.) - Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu() or __cpu_up(), and ensure that the pause facility can be queried from real mode. Reported-by: Paul Menzel Signed-off-by: Josh Poimboeuf Tested-by: Paul Menzel Reviewed-by: Steven Rostedt (VMware) Cc: "Rafael J . Wysocki" Cc: linux-acpi@vger.kernel.org Cc: Borislav Petkov Cc: Len Brown Link: http://lkml.kernel.org/r/5c1272269a580660703ed2eccf44308e790c7a98.1492123841.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/ftrace.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index 311bcf338f07..bfc587579dc3 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -977,6 +977,18 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent, unsigned long return_hooker = (unsigned long) &return_to_handler; + /* + * When resuming from suspend-to-ram, this function can be indirectly + * called from early CPU startup code while the CPU is in real mode, + * which would fail miserably. Make sure the stack pointer is a + * virtual address. + * + * This check isn't as accurate as virt_addr_valid(), but it should be + * good enough for this purpose, and it's fast. + */ + if (unlikely((long)__builtin_frame_address(0) >= 0)) + return; + if (unlikely(ftrace_graph_is_dead())) return; From 0c49a2c16ca93e2e4e68e21dbc95e9466a01bbdb Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 2 May 2017 21:20:09 -0700 Subject: [PATCH 521/521] Linux 4.4.66 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ddaef04f528a..1cd052823c03 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 65 +SUBLEVEL = 66 EXTRAVERSION = NAME = Blurry Fish Butt