Commit Graph

28618 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
c97d2b535c This is the 4.19.26 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlx2U68ACgkQONu9yGCS
 aT6rHA/+PGrMAAzSp193bme2rap7P/EzDD0JF0aQOLwizdit5A4zIYSZApVVSaw5
 U8KIQg+fcwIjTvzBmkHddDVVJBEv/VdE8Xkf0EOIOidIQNqMoRQN7ctU1bnKfUyC
 20/xZBlc78BEU4TB4JpNiKkzD3bsl/Gsb2A3VenhHe0H8dbj4oJYyNWWY99/433y
 u1SNmTogLJ0p64+1XWMCTqPJA2xkUIOciFSLiO7d51MfpbguMwcW+T9XSwYat3tp
 uTdfvZYKc4iGPa4pZCZmZY7+Lrxne2OjMhp3bUlp0coKrjlNO848X+kG7gsVZ7EN
 MA7YOXUj/Ngjl+E9EilNY5gybY2hK0IzaaV8W9TRVYvPBjgSwe5Mj9/EXYO0L8q0
 Arla5HsJdRhLX7HDJ41w8BIngko87hetAaTZVCct+F+nnMYGrrMsle+BgYYbnd7r
 NYWB8HwZZ3krkWmAZb+Phpleo3Oj7beum5MLoydSH4pXhTVft5D/V8K2fwLmZtgP
 FsIC1J09jDvXJ5qHTFBY4ugTq8l21YZ2oMPdwcB6X8hBIKji4XydQ0HPrdMyAtXB
 tqUkzyLZlP1uzm9T3E+hyATWeM4ue7qLFacp7cX5LokDWIxlWi7tcIqBHqRZez+y
 S7un8/Lmz8ZvzC674JTZl1iXSHuuK6TFLrwI3mw/CVhg2qRvZtk=
 =EDHP
 -----END PGP SIGNATURE-----

Merge 4.19.26 into android-4.19

Changes in 4.19.26
	ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction
	tracing: Fix number of entries in trace header
	MIPS: eBPF: Always return sign extended 32b values
	gpio: MT7621: use a per instance irq_chip structure
	gpio: pxa: avoid attempting to set pin direction via pinctrl on MMP2
	mac80211: Restore vif beacon interval if start ap fails
	mac80211: Use linked list instead of rhashtable walk for mesh tables
	mac80211: Free mpath object when rhashtable insertion fails
	libceph: handle an empty authorize reply
	ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
	numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
	proc, oom: do not report alien mms when setting oom_score_adj
	ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5
	ALSA: hda/realtek: Disable PC beep in passthrough on alc285
	KEYS: allow reaching the keys quotas exactly
	backlight: pwm_bl: Fix devicetree parsing with auto-generated brightness tables
	mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells
	pvcalls-front: read all data before closing the connection
	pvcalls-front: don't try to free unallocated rings
	pvcalls-front: properly allocate sk
	pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
	mfd: twl-core: Fix section annotations on {,un}protect_pm_master
	mfd: db8500-prcmu: Fix some section annotations
	mfd: mt6397: Do not call irq_domain_remove if PMIC unsupported
	mfd: ab8500-core: Return zero in get_register_interruptible()
	mfd: bd9571mwv: Add volatile register to make DVFS work
	mfd: qcom_rpm: write fw_version to CTRL_REG
	mfd: wm5110: Add missing ASRC rate register
	mfd: axp20x: Add AC power supply cell for AXP813
	mfd: axp20x: Re-align MFD cell entries
	mfd: axp20x: Add supported cells for AXP803
	mfd: cros_ec_dev: Add missing mfd_remove_devices() call in remove
	mfd: tps65218: Use devm_regmap_add_irq_chip and clean up error path in probe()
	mfd: mc13xxx: Fix a missing check of a register-read failure
	xen/pvcalls: remove set but not used variable 'intf'
	qed: Fix qed_chain_set_prod() for PBL chains with non power of 2 page count
	qed: Fix qed_ll2_post_rx_buffer_notify_fw() by adding a write memory barrier
	net: hns: Fix use after free identified by SLUB debug
	bpf: Fix [::] -> [::1] rewrite in sys_sendmsg
	selftests/bpf: Test [::] -> [::1] rewrite in sys_sendmsg in test_sock_addr
	watchdog: mt7621_wdt/rt2880_wdt: Fix compilation problem
	net/mlx4: Get rid of page operation after dma_alloc_coherent
	MIPS: ath79: Enable OF serial ports in the default config
	xprtrdma: Double free in rpcrdma_sendctxs_create()
	mlxsw: spectrum_acl: Add cleanup after C-TCAM update error condition
	selftests: forwarding: Add a test for VLAN deletion
	netfilter: nf_tables: fix leaking object reference count
	scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
	scsi: isci: initialize shost fully before calling scsi_add_host()
	include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
	MIPS: jazz: fix 64bit build
	netfilter: nft_flow_offload: Fix reverse route lookup
	bpf: correctly set initial window on active Fast Open sender
	pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock
	bpf: fix panic in stack_map_get_build_id() on i386 and arm32
	netfilter: nft_flow_offload: fix interaction with vrf slave device
	RDMA/mthca: Clear QP objects during their allocation
	powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.
	acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id()
	net: stmmac: Fix PCI module removal leak
	net: stmmac: dwxgmac2: Only clear interrupts that are active
	net: stmmac: Check if CBS is supported before configuring
	net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
	net: stmmac: Prevent RX starvation in stmmac_napi_poll()
	isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
	scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes
	scsi: ufs: Fix system suspend status
	scsi: qedi: Add ep_state for login completion on un-reachable targets
	scsi: ufs: Fix geometry descriptor size
	scsi: cxgb4i: add wait_for_completion()
	netfilter: nft_flow_offload: fix checking method of conntrack helper
	always clear the X2APIC_ENABLE bit for PV guest
	drm/meson: add missing of_node_put
	drm/amdkfd: Don't assign dGPUs to APU topology devices
	drm/amd/display: fix PME notification not working in RV desktop
	vhost: return EINVAL if iovecs size does not match the message size
	drm/sun4i: backend: add missing of_node_puts
	pvcalls-front: fix potential null dereference
	selftests: tc-testing: drop test on missing tunnel key id
	selftests: tc-testing: fix tunnel_key failure if dst_port is unspecified
	selftests: tc-testing: fix parsing of ife type
	afs: Don't set vnode->cb_s_break in afs_validate()
	afs: Fix key refcounting in file locking code
	bpf: don't assume build-id length is always 20 bytes
	bpf: zero out build_id for BPF_STACK_BUILD_ID_IP
	selftests/bpf: retry tests that expect build-id
	atm: he: fix sign-extension overflow on large shift
	hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table
	leds: lp5523: fix a missing check of return value of lp55xx_read
	bpf: bpf_setsockopt: reset sock dst on SO_MARK changes
	dpaa_eth: NETIF_F_LLTX requires to do our own update of trans_start
	mlxsw: pci: Return error on PCI reset timeout
	net: bridge: Mark FDB entries that were added by user as such
	mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
	selftests: forwarding: Add a test case for externally learned FDB entries
	net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
	isdn: avm: Fix string plus integer warning from Clang
	batman-adv: fix uninit-value in batadv_interface_tx()
	inet_diag: fix reporting cgroup classid and fallback to priority
	ipv6: propagate genlmsg_reply return code
	net: ena: fix race between link up and device initalization
	net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
	net/mlx5e: Don't overwrite pedit action when multiple pedit used
	net/packet: fix 4gb buffer limit due to overflow check
	net: sfp: do not probe SFP module before we're attached
	sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
	sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
	team: avoid complex list operations in team_nl_cmd_options_set()
	Revert "socket: fix struct ifreq size in compat ioctl"
	Revert "kill dev_ifsioc()"
	net: socket: fix SIOCGIFNAME in compat
	net: socket: make bond ioctls go through compat_ifreq_ioctl()
	geneve: should not call rt6_lookup() when ipv6 was disabled
	sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
	net_sched: fix a race condition in tcindex_destroy()
	net_sched: fix a memory leak in cls_tcindex
	net_sched: fix two more memory leaks in cls_tcindex
	net/mlx5e: XDP, fix redirect resources availability check
	RDMA/srp: Rework SCSI device reset handling
	KEYS: user: Align the payload buffer
	KEYS: always initialize keyring_index_key::desc_len
	parisc: Fix ptrace syscall number modification
	ARCv2: Enable unaligned access in early ASM code
	ARC: U-boot: check arguments paranoidly
	ARC: define ARCH_SLAB_MINALIGN = 8
	drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
	gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
	drm/i915/fbdev: Actually configure untiled displays
	drm/amd/display: Fix MST reboot/poweroff sequence
	mac80211: allocate tailroom for forwarded mesh packets
	kvm: x86: Return LA57 feature based on hardware capability
	net: validate untrusted gso packets without csum offload
	net: avoid false positives in untrusted gso validation
	staging: erofs: fix a bug when appling cache strategy
	staging: erofs: complete error handing of z_erofs_do_read_page
	staging: erofs: replace BUG_ON with DBG_BUGON in data.c
	staging: erofs: drop multiref support temporarily
	staging: erofs: remove the redundant d_rehash() for the root dentry
	staging: erofs: atomic_cond_read_relaxed on ref-locked workgroup
	staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'
	staging: erofs: add a full barrier in erofs_workgroup_unfreeze
	staging: erofs: {dir,inode,super}.c: rectify BUG_ONs
	staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs
	staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs
	Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
	netfilter: nf_tables: fix flush after rule deletion in the same batch
	netfilter: nft_compat: use-after-free when deleting targets
	netfilter: ipv6: Don't preserve original oif for loopback address
	netfilter: nfnetlink_osf: add missing fmatch check
	netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put()
	udlfb: handle unplug properly
	pinctrl: max77620: Use define directive for max77620_pinconf_param values
	net: phylink: avoid resolving link state too early
	Linux 4.19.26

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-27 10:23:18 +01:00
Stanislav Fomichev
5c6fdd877e bpf: zero out build_id for BPF_STACK_BUILD_ID_IP
[ Upstream commit 4af396ae48 ]

When returning BPF_STACK_BUILD_ID_IP from stack_map_get_build_id_offset,
make sure that build_id field is empty. Since we are using percpu
free list, there is a possibility that we might reuse some previous
bpf_stack_build_id with non-zero build_id.

Fixes: 615755a77b ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-27 10:08:56 +01:00
Stanislav Fomichev
c4555b9f28 bpf: don't assume build-id length is always 20 bytes
[ Upstream commit 0b698005a9 ]

Build-id length is not fixed to 20, it can be (`man ld` /--build-id):
  * 128-bit (uuid)
  * 160-bit (sha1)
  * any length specified in ld --build-id=0xhexstring

To fix the issue of missing BPF_STACK_BUILD_ID_VALID for shorter build-ids,
assume that build-id is somewhere in the range of 1 .. 20.
Set the remaining bytes to zero.

v2:
* don't introduce new "len = min(BPF_BUILD_ID_SIZE, nhdr->n_descsz)",
  we already know that nhdr->n_descsz <= BPF_BUILD_ID_SIZE if we enter
  this 'if' condition

Fixes: 615755a77b ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-27 10:08:56 +01:00
Song Liu
2f3480e340 bpf: fix panic in stack_map_get_build_id() on i386 and arm32
[ Upstream commit beaf3d1901 ]

As Naresh reported, test_stacktrace_build_id() causes panic on i386 and
arm32 systems. This is caused by page_address() returns NULL in certain
cases.

This patch fixes this error by using kmap_atomic/kunmap_atomic instead
of page_address.

Fixes: 615755a77b (" bpf: extend stackmap to save binary_build_id+offset instead of address")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-27 10:08:54 +01:00
Quentin Perret
b5e57dbb5a tracing: Fix number of entries in trace header
commit 9e7382153f upstream.

The following commit

  441dae8f2f ("tracing: Add support for display of tgid in trace output")

removed the call to print_event_info() from print_func_help_header_irq()
which results in the ftrace header not reporting the number of entries
written in the buffer. As this wasn't the original intent of the patch,
re-introduce the call to print_event_info() to restore the orginal
behaviour.

Link: http://lkml.kernel.org/r/20190214152950.4179-1-quentin.perret@arm.com

Acked-by: Joel Fernandes <joelaf@google.com>
Cc: stable@vger.kernel.org
Fixes: 441dae8f2f ("tracing: Add support for display of tgid in trace output")
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 10:08:49 +01:00
Greg Kroah-Hartman
cca7d2df6d This is the 4.19.24 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxtHSUACgkQONu9yGCS
 aT6WIw//Xdd2P7aIWSDnmS6L+RlEtdz2hnO07O2lMVIhsH1MgetAk0nJbP8b9sYp
 zJY4kW6gEue1/J7vHIiTnDpnhaylxESOxOb49O/E95bnPIdhE9EqR1aiMmo8R5rU
 8F2lH6kZncAawZ/7WGnQdHqukO9GtwFO0hwtsDT5RcwiIPe/fHkIpp1QfN21ntib
 nV7xQOfDFEMVcbyzAFa+59A4UUQSWKTkld9tcJFG62pjcN9DlpPej86M2qy9mG8U
 0d7hEkYGqHthh092TAxgobsJDq5j0jUUVjK80/Fq0TKe/OSvsgQ8MRBryVsgBf8J
 hJBCLHay4Ps3ONtNI3IXDVZu9dgdL8UynBEfbWdgJHlGgKJEvMGxFCckzI8OP3yr
 HJtFm6gG24kySFt+QnvwHg1/zwAho5FDBMfx02RkukJwDiN16DZSbKRimrTVAioV
 bBTmL+JyoCBLYGfUzq4gM1+3L5TRyMBqjgcjLvXWuXF18fdpVdb0exKlNtLHMUf6
 aqzhroF90sn5vUybpw02ouduxmRPWDaDel7ArA3zb16Dp+ODCxYtpkV7dmRBjPVK
 YxwweO604Rd2musacQkVAB5JkshjPdV7WA/FmhJf7Fn1/CaaXliAVrfMNbEGW15a
 NFQvTMrJva2pihXYJY4p/UfhnsBmxXmY30wYlLX2VxRFb+qoy3Y=
 =MO0g
 -----END PGP SIGNATURE-----

Merge 4.19.24 into android-4.19

Changes in 4.19.24
	dt-bindings: eeprom: at24: add "atmel,24c2048" compatible string
	eeprom: at24: add support for 24c2048
	blk-mq: fix a hung issue when fsync
	ARM: 8789/1: signal: copy registers using __copy_to_user()
	ARM: 8790/1: signal: always use __copy_to_user to save iwmmxt context
	ARM: 8791/1: vfp: use __copy_to_user() when saving VFP state
	ARM: 8792/1: oabi-compat: copy oabi events using __copy_to_user()
	ARM: 8793/1: signal: replace __put_user_error with __put_user
	ARM: 8794/1: uaccess: Prevent speculative use of the current addr_limit
	ARM: 8795/1: spectre-v1.1: use put_user() for __put_user()
	ARM: 8796/1: spectre-v1,v1.1: provide helpers for address sanitization
	ARM: 8797/1: spectre-v1.1: harden __copy_to_user
	ARM: 8810/1: vfp: Fix wrong assignement to ufp_exc
	ARM: make lookup_processor_type() non-__init
	ARM: split out processor lookup
	ARM: clean up per-processor check_bugs method call
	ARM: add PROC_VTABLE and PROC_TABLE macros
	ARM: spectre-v2: per-CPU vtables to work around big.Little systems
	ARM: ensure that processor vtables is not lost after boot
	ARM: fix the cockup in the previous patch
	drm/amdgpu/sriov:Correct pfvf exchange logic
	ACPI: NUMA: Use correct type for printing addresses on i386-PAE
	perf report: Fix wrong iteration count in --branch-history
	perf test shell: Use a fallback to get the pathname in vfs_getname
	tools uapi: fix RISC-V 64-bit support
	riscv: fix trace_sys_exit hook
	cpufreq: check if policy is inactive early in __cpufreq_get()
	drm/bridge: tc358767: add bus flags
	drm/bridge: tc358767: add defines for DP1_SRCCTRL & PHY_2LANE
	drm/bridge: tc358767: fix single lane configuration
	drm/bridge: tc358767: fix initial DP0/1_SRCCTRL value
	drm/bridge: tc358767: reject modes which require too much BW
	drm/bridge: tc358767: fix output H/V syncs
	nvme-pci: use the same attributes when freeing host_mem_desc_bufs.
	nvme-pci: fix out of bounds access in nvme_cqe_pending
	nvme-multipath: zero out ANA log buffer
	nvme: pad fake subsys NQN vid and ssvid with zeros
	drm/amdgpu: set WRITE_BURST_LENGTH to 64B to workaround SDMA1 hang
	ARM: dts: da850-evm: Correct the audio codec regulators
	ARM: dts: da850-evm: Correct the sound card name
	ARM: dts: da850-lcdk: Correct the audio codec regulators
	ARM: dts: da850-lcdk: Correct the sound card name
	ARM: dts: kirkwood: Fix polarity of GPIO fan lines
	gpio: pl061: handle failed allocations
	drm/nouveau: Don't disable polling in fallback mode
	drm/nouveau/falcon: avoid touching registers if engine is off
	cifs: Limit memory used by lock request calls to a page
	kvm: sev: Fail KVM_SEV_INIT if already initialized
	CIFS: Do not assume one credit for async responses
	gpio: mxc: move gpio noirq suspend/resume to syscore phase
	Revert "Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G"
	Input: elan_i2c - add ACPI ID for touchpad in Lenovo V330-15ISK
	ARM: OMAP5+: Fix inverted nirq pin interrupts with irq_set_type
	perf/core: Fix impossible ring-buffer sizes warning
	perf/x86: Add check_period PMU callback
	ALSA: hda - Add quirk for HP EliteBook 840 G5
	ALSA: usb-audio: Fix implicit fb endpoint setup by quirk
	ASoC: hdmi-codec: fix oops on re-probe
	tools uapi: fix Alpha support
	riscv: Add pte bit to distinguish swap from invalid
	x86/kvm/nVMX: read from MSR_IA32_VMX_PROCBASED_CTLS2 only when it is available
	kvm: vmx: Fix entry number check for add_atomic_switch_msr()
	mmc: sunxi: Filter out unsupported modes declared in the device tree
	mmc: block: handle complete_work on separate workqueue
	Input: bma150 - register input device after setting private data
	Input: elantech - enable 3rd button support on Fujitsu CELSIUS H780
	Revert "nfsd4: return default lease period"
	Revert "mm: don't reclaim inodes with many attached pages"
	Revert "mm: slowly shrink slabs with a relatively small number of objects"
	alpha: fix page fault handling for r16-r18 targets
	alpha: Fix Eiger NR_IRQS to 128
	s390/zcrypt: fix specification exception on z196 during ap probe
	tracing/uprobes: Fix output for multiple string arguments
	x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls
	scsi: sd: fix entropy gathering for most rotational disks
	signal: Restore the stop PTRACE_EVENT_EXIT
	md/raid1: don't clear bitmap bits on interrupted recovery.
	x86/a.out: Clear the dump structure initially
	dm crypt: don't overallocate the integrity tag space
	dm thin: fix bug where bio that overwrites thin block ignores FUA
	drm: Use array_size() when creating lease
	drm/vkms: Fix license inconsistent
	drm/i915: Block fbdev HPD processing during suspend
	drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set
	mm: proc: smaps_rollup: fix pss_locked calculation
	Linux 4.19.24

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-20 10:37:09 +01:00
Eric W. Biederman
a2b3e2c0f5 signal: Restore the stop PTRACE_EVENT_EXIT
commit cf43a757fd upstream.

In the middle of do_exit() there is there is a call
"ptrace_event(PTRACE_EVENT_EXIT, code);" That call places the process
in TACKED_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for
for the debugger to release the task or SIGKILL to be delivered.

Skipping past dequeue_signal when we know a fatal signal has already
been delivered resulted in SIGKILL remaining pending and
TIF_SIGPENDING remaining set.  This in turn caused the
scheduler to not sleep in PTACE_EVENT_EXIT as it figured
a fatal signal was pending.  This also caused ptrace_freeze_traced
in ptrace_check_attach to fail because it left a per thread
SIGKILL pending which is what fatal_signal_pending tests for.

This difference in signal state caused strace to report
strace: Exit of unknown pid NNNNN ignored

Therefore update the signal handling state like dequeue_signal
would when removing a per thread SIGKILL, by removing SIGKILL
from the per thread signal mask and clearing TIF_SIGPENDING.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Ivan Delalande <colona@arista.com>
Cc: stable@vger.kernel.org
Fixes: 35634ffa17 ("signal: Always notice exiting tasks")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20 10:25:48 +01:00
Andreas Ziegler
45649b9996 tracing/uprobes: Fix output for multiple string arguments
commit 0722069a53 upstream.

When printing multiple uprobe arguments as strings the output for the
earlier arguments would also include all later string arguments.

This is best explained in an example:

Consider adding a uprobe to a function receiving two strings as
parameters which is at offset 0xa0 in strlib.so and we want to print
both parameters when the uprobe is hit (on x86_64):

$ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
    /sys/kernel/debug/tracing/uprobe_events

When the function is called as func("foo", "bar") and we hit the probe,
the trace file shows a line like the following:

  [...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"

Note the extra "bar" printed as part of arg1. This behaviour stacks up
for additional string arguments.

The strings are stored in a dynamically growing part of the uprobe
buffer by fetch_store_string() after copying them from userspace via
strncpy_from_user(). The return value of strncpy_from_user() is then
directly used as the required size for the string. However, this does
not take the terminating null byte into account as the documentation
for strncpy_from_user() cleary states that it "[...] returns the
length of the string (not including the trailing NUL)" even though the
null byte will be copied to the destination.

Therefore, subsequent calls to fetch_store_string() will overwrite
the terminating null byte of the most recently fetched string with
the first character of the current string, leading to the
"accumulation" of strings in earlier arguments in the output.

Fix this by incrementing the return value of strncpy_from_user() by
one if we did not hit the maximum buffer size.

Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 5baaa59ef0 ("tracing/probes: Implement 'memory' fetch method for uprobes")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20 10:25:48 +01:00
Jiri Olsa
74cbb754d6 perf/x86: Add check_period PMU callback
commit 81ec3f3c4c upstream.

Vince (and later on Ravi) reported crashes in the BTS code during
fuzzing with the following backtrace:

  general protection fault: 0000 [#1] SMP PTI
  ...
  RIP: 0010:perf_prepare_sample+0x8f/0x510
  ...
  Call Trace:
   <IRQ>
   ? intel_pmu_drain_bts_buffer+0x194/0x230
   intel_pmu_drain_bts_buffer+0x160/0x230
   ? tick_nohz_irq_exit+0x31/0x40
   ? smp_call_function_single_interrupt+0x48/0xe0
   ? call_function_single_interrupt+0xf/0x20
   ? call_function_single_interrupt+0xa/0x20
   ? x86_schedule_events+0x1a0/0x2f0
   ? x86_pmu_commit_txn+0xb4/0x100
   ? find_busiest_group+0x47/0x5d0
   ? perf_event_set_state.part.42+0x12/0x50
   ? perf_mux_hrtimer_restart+0x40/0xb0
   intel_pmu_disable_event+0xae/0x100
   ? intel_pmu_disable_event+0xae/0x100
   x86_pmu_stop+0x7a/0xb0
   x86_pmu_del+0x57/0x120
   event_sched_out.isra.101+0x83/0x180
   group_sched_out.part.103+0x57/0xe0
   ctx_sched_out+0x188/0x240
   ctx_resched+0xa8/0xd0
   __perf_event_enable+0x193/0x1e0
   event_function+0x8e/0xc0
   remote_function+0x41/0x50
   flush_smp_call_function_queue+0x68/0x100
   generic_smp_call_function_single_interrupt+0x13/0x30
   smp_call_function_single_interrupt+0x3e/0xe0
   call_function_single_interrupt+0xf/0x20
   </IRQ>

The reason is that while event init code does several checks
for BTS events and prevents several unwanted config bits for
BTS event (like precise_ip), the PERF_EVENT_IOC_PERIOD allows
to create BTS event without those checks being done.

Following sequence will cause the crash:

If we create an 'almost' BTS event with precise_ip and callchains,
and it into a BTS event it will crash the perf_prepare_sample()
function because precise_ip events are expected to come
in with callchain data initialized, but that's not the
case for intel_pmu_drain_bts_buffer() caller.

Adding a check_period callback to be called before the period
is changed via PERF_EVENT_IOC_PERIOD. It will deny the change
if the event would become BTS. Plus adding also the limit_period
check as well.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20 10:25:45 +01:00
Ingo Molnar
d10e77c260 perf/core: Fix impossible ring-buffer sizes warning
commit 528871b456 upstream.

The following commit:

  9dff0aa95a ("perf/core: Don't WARN() for impossible ring-buffer sizes")

results in perf recording failures with larger mmap areas:

  root@skl:/tmp# perf record -g -a
  failed to mmap with 12 (Cannot allocate memory)

The root cause is that the following condition is buggy:

	if (order_base_2(size) >= MAX_ORDER)
		goto fail;

The problem is that @size is in bytes and MAX_ORDER is in pages,
so the right test is:

	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
		goto fail;

Fix it.

Reported-by: "Jin, Yao" <yao.jin@linux.intel.com>
Bisected-by: Borislav Petkov <bp@alien8.de>
Analyzed-by: Peter Zijlstra <peterz@infradead.org>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Fixes: 9dff0aa95a ("perf/core: Don't WARN() for impossible ring-buffer sizes")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20 10:25:44 +01:00
Greg Kroah-Hartman
0755dc9375 This is the 4.19.22 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxmZdUACgkQONu9yGCS
 aT7dtg/9G+mhds8xaPiFxqd+0np4NJ01neNeAvF5Mosf5OMC9yDgNe8iZqrfjXCG
 aJ/1mmnT+VXLihEZnjTrLuIOLF9iO83820lOwafPJWina8r6a0I2Uly+52Znv8Vz
 +Y860MQO4JncWcqBKR2En2690HgNLVv3zg3iDtYx/fzrDaq0gvtFHQFUJzjWp0yT
 vxmWMmEkEPVJfRLaRIb5Mgtmo5y4lpLeCZNj67Q/FKR0qPdTvnmqS8yS9O/CK1kR
 cdckosanno/HtbR2dhMrqyYuWmqj7JGAb4cCsvD2Y5X25ooFY/NRUSzxb6xSH1o8
 +xk7daqXZ75cnIA5DlJZJJFq5i8iUvltCQGTquk+nRXlDINwOvWxOuxNhWcEzR7Y
 pmp8A/mC/w5vat96rvpA3HmIrLLZC5ZLSYtSB1kuTDRKm7NZW1sT6u4kwPKvXJci
 S42QK4hJDmKzt+rWegu4K8kBQ8ihrOyERxtXA8xKG2VSaWvwSxrTFa/gInWrlB5W
 vswYWmul/8f6bpBIj+L+rYvJdaYFnSXhOobliCoUFK4ZQpQK8Vr/uAwPU7yamHh+
 3r40NzSpULfLHSEcSKYu4EII8hppaSosP37dHjGKET6bs4sOS2rTQVik/NfOHs3u
 fBIDIxklkBAcgOvue23KuD7xbAXONZI966qYoHExZOS5nDLlzVo=
 =tekf
 -----END PGP SIGNATURE-----

Merge 4.19.22 into android-4.19

Changes in 4.19.22
	mtd: Make sure mtd->erasesize is valid even if the partition is of size 0
	mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache
	mtd: spinand: Fix the error/cleanup path in spinand_init()
	mtd: rawnand: gpmi: fix MX28 bus master lockup problem
	libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
	tools: iio: iio_generic_buffer: make num_loops signed
	iio: adc: axp288: Fix TS-pin handling
	iio: chemical: atlas-ph-sensor: correct IIO_TEMP values to millicelsius
	iio: ti-ads8688: Update buffer allocation for timestamps
	signal: Always notice exiting tasks
	signal: Better detection of synchronous signals
	misc: vexpress: Off by one in vexpress_syscfg_exec()
	mei: me: add ice lake point device id.
	samples: mei: use /dev/mei0 instead of /dev/mei
	debugfs: fix debugfs_rename parameter checking
	pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller
	pinctrl: cherryview: fix Strago DMI workaround
	tracing: uprobes: Fix typo in pr_fmt string
	mips: cm: reprime error cause
	MIPS: OCTEON: don't set octeon_dma_bar_type if PCI is disabled
	MIPS: VDSO: Use same -m%-float cflag as the kernel proper
	mips: loongson64: remove unreachable(), fix loongson_poweroff().
	MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds
	ARM: iop32x/n2100: fix PCI IRQ mapping
	ARM: tango: Improve ARCH_MULTIPLATFORM compatibility
	ARM: dts: da850: fix interrupt numbers for clocksource
	firmware: arm_scmi: provide the mandatory device release callback
	powerpc/radix: Fix kernel crash with mremap()
	mic: vop: Fix use-after-free on remove
	mac80211: ensure that mgmt tx skbs have tailroom for encryption
	drm/modes: Prevent division by zero htotal
	drm/amd/powerplay: Fix missing break in switch
	drm/i915: always return something on DDI clock selection
	drm/vmwgfx: Fix setting of dma masks
	drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user
	SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT
	xfrm: Make set-mark default behavior backward compatible
	Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
	libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()
	xfrm: refine validation of template and selector families
	batman-adv: Avoid WARN on net_device without parent in netns
	batman-adv: Force mac header to start of data on xmit
	svcrdma: Reduce max_send_sges
	svcrdma: Remove max_sge check at connect time
	Linux 4.19.22

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-15 09:02:44 +01:00
Andreas Ziegler
7e44aab927 tracing: uprobes: Fix typo in pr_fmt string
commit ea6eb5e7d1 upstream.

The subsystem-specific message prefix for uprobes was also
"trace_kprobe: " instead of "trace_uprobe: " as described in
the original commit message.

Link: http://lkml.kernel.org/r/20190117133023.19292-1-andreas.ziegler@fau.de

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 7257634135 ("tracing/probe: Show subsystem name in messages")
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:10:11 +01:00
Eric W. Biederman
959e46afec signal: Better detection of synchronous signals
commit 7146db3317 upstream.

Recently syzkaller was able to create unkillablle processes by
creating a timer that is delivered as a thread local signal on SIGHUP,
and receiving SIGHUP SA_NODEFERER.  Ultimately causing a loop failing
to deliver SIGHUP but always trying.

When the stack overflows delivery of SIGHUP fails and force_sigsegv is
called.  Unfortunately because SIGSEGV is numerically higher than
SIGHUP next_signal tries again to deliver a SIGHUP.

From a quality of implementation standpoint attempting to deliver the
timer SIGHUP signal is wrong.  We should attempt to deliver the
synchronous SIGSEGV signal we just forced.

We can make that happening in a fairly straight forward manner by
instead of just looking at the signal number we also look at the
si_code.  In particular for exceptions (aka synchronous signals) the
si_code is always greater than 0.

That still has the potential to pick up a number of asynchronous
signals as in a few cases the same si_codes that are used
for synchronous signals are also used for asynchronous signals,
and SI_KERNEL is also included in the list of possible si_codes.

Still the heuristic is much better and timer signals are definitely
excluded.  Which is enough to prevent all known ways for someone
sending a process signals fast enough to cause unexpected and
arguably incorrect behavior.

Cc: stable@vger.kernel.org
Fixes: a27341cd5f ("Prioritize synchronous signals over 'normal' signals")
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:10:11 +01:00
Eric W. Biederman
f681f2684f signal: Always notice exiting tasks
commit 35634ffa17 upstream.

Recently syzkaller was able to create unkillablle processes by
creating a timer that is delivered as a thread local signal on SIGHUP,
and receiving SIGHUP SA_NODEFERER.  Ultimately causing a loop
failing to deliver SIGHUP but always trying.

Upon examination it turns out part of the problem is actually most of
the solution.  Since 2.5 signal delivery has found all fatal signals,
marked the signal group for death, and queued SIGKILL in every threads
thread queue relying on signal->group_exit_code to preserve the
information of which was the actual fatal signal.

The conversion of all fatal signals to SIGKILL results in the
synchronous signal heuristic in next_signal kicking in and preferring
SIGHUP to SIGKILL.  Which is especially problematic as all
fatal signals have already been transformed into SIGKILL.

Instead of dequeueing signals and depending upon SIGKILL to
be the first signal dequeued, first test if the signal group
has already been marked for death.  This guarantees that
nothing in the signal queue can prevent a process that needs
to exit from exiting.

Cc: stable@vger.kernel.org
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Ref: ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:10:11 +01:00
Greg Kroah-Hartman
6e0411bdc2 This is the 4.19.21 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxjFL8ACgkQONu9yGCS
 aT643xAAk+mnsrOfU6/LBFjcJUUUYohK01UAU+PRvTjy4uH8rrA4G01xvp11ftIu
 jjEikA2582ScR5a2Ww+E4SfqOdKT4z9hOGLTnyI0P4xN9jeVidvu9+C90AYyBYhi
 orHm1osVQIj6n9+OQ5db+DzZYbZLbyfCoqNXbq9EoLvNRS3FUUH0y2VXqcz9Ghcj
 obpdHTVMKRaFkRWdCglo+3hSpoKrncSVpKrwUXR18GCt8jjZjj39kI9t6UoGNfc7
 nN3GOd26U1tpGo6ShZJYu6aPjV+zoYNlsg1o2zn9qJANIdulYe30vqNhWCeJ+/T7
 WcT3EHv4pPEO3Lvgfp+l10Nc6IbYdJEFUpAP3CvfP+MvRfKvz8Vo3Nm/BQlr20+q
 +MUYJb+wxhlHPRLV192XbnYFkEzZg7vzymoMPL034XheAkOkPbOK0IIVo41p5Rai
 LxmOdvhzfAktbtD/VWnLTUbexjs2EJ05bvZRjdPKKIMBNKnAWz4ux3KcHxpdsUF8
 KMCKwpJE8KDM4uiaKVdyfMhFeIg37pmy+7Uv9cUFjWwtsL3K+CiAXg8uaSvnajKr
 bOhwbFIgxoOI9VRBK8M0wvzouphA0miVbxOY81sdfbMeWNpbCLdb968AFz25llta
 rVlWp7bSAUQTsvTkVBv6mrTKPwzO4jnfHYN8yj5gO7pr5fmC2ig=
 =L+Mp
 -----END PGP SIGNATURE-----

Merge 4.19.21 into android-4.19

Changes in 4.19.21
	devres: Align data[] to ARCH_KMALLOC_MINALIGN
	drm/bufs: Fix Spectre v1 vulnerability
	staging: iio: adc: ad7280a: handle error from __ad7280_read32()
	drm/vgem: Fix vgem_init to get drm device available.
	pinctrl: bcm2835: Use raw spinlock for RT compatibility
	ASoC: Intel: mrfld: fix uninitialized variable access
	gpiolib: Fix possible use after free on label
	drm/sun4i: Initialize registers in tcon-top driver
	genirq/affinity: Spread IRQs to all available NUMA nodes
	gpu: ipu-v3: image-convert: Prevent race between run and unprepare
	nds32: Fix gcc 8.0 compiler option incompatible.
	wil6210: fix reset flow for Talyn-mb
	wil6210: fix memory leak in wil_find_tx_bcast_2
	ath10k: assign 'n_cipher_suites' for WCN3990
	ath9k: dynack: use authentication messages for 'late' ack
	scsi: lpfc: Correct LCB RJT handling
	scsi: mpt3sas: Call sas_remove_host before removing the target devices
	scsi: lpfc: Fix LOGO/PLOGI handling when triggerd by ABTS Timeout event
	ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
	clk: boston: fix possible memory leak in clk_boston_setup()
	dlm: Don't swamp the CPU with callbacks queued during recovery
	x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux)
	powerpc/pseries: add of_node_put() in dlpar_detach_node()
	crypto: aes_ti - disable interrupts while accessing S-box
	drm/vc4: ->x_scaling[1] should never be set to VC4_SCALING_NONE
	serial: fsl_lpuart: clear parity enable bit when disable parity
	ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl
	MIPS: Boston: Disable EG20T prefetch
	dpaa2-ptp: defer probe when portal allocation failed
	iwlwifi: fw: do not set sgi bits for HE connection
	staging:iio:ad2s90: Make probe handle spi_setup failure
	fpga: altera-cvp: Fix registration for CvP incapable devices
	Tools: hv: kvp: Fix a warning of buffer overflow with gcc 8.0.1
	fpga: altera-cvp: fix 'bad IO access' on x86_64
	vbox: fix link error with 'gcc -Og'
	platform/chrome: don't report EC_MKBP_EVENT_SENSOR_FIFO as wakeup
	i40e: prevent overlapping tx_timeout recover
	scsi: hisi_sas: change the time of SAS SSP connection
	staging: iio: ad7780: update voltage on read
	usbnet: smsc95xx: fix rx packet alignment
	drm/rockchip: fix for mailbox read size
	ARM: OMAP2+: hwmod: Fix some section annotations
	drm/amd/display: fix gamma not being applied correctly
	drm/amd/display: calculate stream->phy_pix_clk before clock mapping
	bpf: libbpf: retry map creation without the name
	net/mlx5: EQ, Use the right place to store/read IRQ affinity hint
	modpost: validate symbol names also in find_elf_symbol
	perf tools: Add Hygon Dhyana support
	soc/tegra: Don't leak device tree node reference
	media: rc: ensure close() is called on rc_unregister_device
	media: video-i2c: avoid accessing released memory area when removing driver
	media: mtk-vcodec: Release device nodes in mtk_vcodec_init_enc_pm()
	staging: erofs: fix the definition of DBG_BUGON
	clk: meson: meson8b: do not use cpu_div3 for cpu_scale_out_sel
	clk: meson: meson8b: fix the width of the cpu_scale_div clock
	clk: meson: meson8b: mark the CPU clock as CLK_IS_CRITICAL
	ptp: Fix pass zero to ERR_PTR() in ptp_clock_register
	dmaengine: xilinx_dma: Remove __aligned attribute on zynqmp_dma_desc_ll
	powerpc/32: Add .data..Lubsan_data*/.data..Lubsan_type* sections explicitly
	iio: adc: meson-saradc: check for devm_kasprintf failure
	iio: adc: meson-saradc: fix internal clock names
	iio: accel: kxcjk1013: Add KIOX010A ACPI Hardware-ID
	media: adv*/tc358743/ths8200: fill in min width/height/pixelclock
	ACPI: SPCR: Consider baud rate 0 as preconfigured state
	staging: pi433: fix potential null dereference
	f2fs: move dir data flush to write checkpoint process
	f2fs: fix race between write_checkpoint and write_begin
	f2fs: fix wrong return value of f2fs_acl_create
	i2c: sh_mobile: add support for r8a77990 (R-Car E3)
	arm64: io: Ensure calls to delay routines are ordered against prior readX()
	net: aquantia: return 'err' if set MPI_DEINIT state fails
	sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN
	soc: bcm: brcmstb: Don't leak device tree node reference
	nfsd4: fix crash on writing v4_end_grace before nfsd startup
	drm: Clear state->acquire_ctx before leaving drm_atomic_helper_commit_duplicated_state()
	perf: arm_spe: handle devm_kasprintf() failure
	arm64: io: Ensure value passed to __iormb() is held in a 64-bit register
	Thermal: do not clear passive state during system sleep
	thermal: Fix locking in cooling device sysfs update cur_state
	firmware/efi: Add NULL pointer checks in efivars API functions
	s390/zcrypt: improve special ap message cmd handling
	mt76x0: dfs: fix IBI_R11 configuration on non-radar channels
	arm64: ftrace: don't adjust the LR value
	drm/v3d: Fix prime imports of buffers from other drivers.
	ARM: dts: mmp2: fix TWSI2
	ARM: dts: aspeed: add missing memory unit-address
	x86/fpu: Add might_fault() to user_insn()
	media: i2c: TDA1997x: select CONFIG_HDMI
	media: DaVinci-VPBE: fix error handling in vpbe_initialize()
	smack: fix access permissions for keyring
	xtensa: xtfpga.dtsi: fix dtc warnings about SPI
	usb: dwc3: Correct the logic for checking TRB full in __dwc3_prepare_one_trb()
	usb: dwc2: Disable power down feature on Samsung SoCs
	usb: hub: delay hub autosuspend if USB3 port is still link training
	timekeeping: Use proper seqcount initializer
	usb: mtu3: fix the issue about SetFeature(U1/U2_Enable)
	clk: sunxi-ng: a33: Set CLK_SET_RATE_PARENT for all audio module clocks
	media: imx274: select REGMAP_I2C
	drm/amdgpu/powerplay: fix clock stretcher limits on polaris (v2)
	tipc: fix node keep alive interval calculation
	driver core: Move async_synchronize_full call
	kobject: return error code if writing /sys/.../uevent fails
	IB/hfi1: Unreserve a reserved request when it is completed
	usb: dwc3: trace: add missing break statement to make compiler happy
	gpio: mt7621: report failure of devm_kasprintf()
	gpio: mt7621: pass mediatek_gpio_bank_probe() failure up the stack
	pinctrl: sx150x: handle failure case of devm_kstrdup
	iommu/amd: Fix amd_iommu=force_isolation
	ARM: dts: Fix OMAP4430 SDP Ethernet startup
	mips: bpf: fix encoding bug for mm_srlv32_op
	media: coda: fix H.264 deblocking filter controls
	ARM: dts: Fix up the D-Link DIR-685 MTD partition info
	watchdog: renesas_wdt: don't set divider while watchdog is running
	ARM: dts: imx51-zii-rdu1: Do not specify "power-gpio" for hpa1
	usb: dwc3: gadget: Disable CSP for stream OUT ep
	iommu/arm-smmu-v3: Avoid memory corruption from Hisilicon MSI payloads
	iommu/arm-smmu: Add support for qcom,smmu-v2 variant
	iommu/arm-smmu-v3: Use explicit mb() when moving cons pointer
	sata_rcar: fix deferred probing
	clk: imx6sl: ensure MMDC CH0 handshake is bypassed
	platform/x86: mlx-platform: Fix tachometer registers
	cpuidle: big.LITTLE: fix refcount leak
	OPP: Use opp_table->regulators to verify no regulator case
	tee: optee: avoid possible double list_del()
	drm/msm/dsi: fix dsi clock names in DSI 10nm PLL driver
	drm/msm: dpu: Only check flush register against pending flushes
	lightnvm: pblk: fix resubmission of overwritten write err lbas
	lightnvm: pblk: add lock protection to list operations
	i2c-axxia: check for error conditions first
	phy: sun4i-usb: add support for missing USB PHY index
	mlxsw: spectrum_acl: Limit priority value
	udf: Fix BUG on corrupted inode
	switchtec: Fix SWITCHTEC_IOCTL_EVENT_IDX_ALL flags overwrite
	selftests/bpf: use __bpf_constant_htons in test_prog.c
	ARM: pxa: avoid section mismatch warning
	ASoC: fsl: Fix SND_SOC_EUKREA_TLV320 build error on i.MX8M
	KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
	mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
	mmc: bcm2835: reset host on timeout
	mmc: meson-mx-sdio: check devm_kasprintf for failure
	memstick: Prevent memstick host from getting runtime suspended during card detection
	mmc: sdhci-of-esdhc: Fix timeout checks
	mmc: sdhci-omap: Fix timeout checks
	mmc: sdhci-xenon: Fix timeout checks
	mmc: jz4740: Get CD/WP GPIOs from descriptors
	usb: renesas_usbhs: add support for RZ/G2E
	btrfs: harden agaist duplicate fsid on scanned devices
	serial: sh-sci: Fix locking in sci_submit_rx()
	serial: sh-sci: Resume PIO in sci_rx_interrupt() on DMA failure
	tty: serial: samsung: Properly set flags in autoCTS mode
	perf test: Fix perf_event_attr test failure
	perf dso: Fix unchecked usage of strncpy()
	perf header: Fix unchecked usage of strncpy()
	btrfs: use tagged writepage to mitigate livelock of snapshot
	perf probe: Fix unchecked usage of strncpy()
	i2c: sh_mobile: Add support for r8a774c0 (RZ/G2E)
	bnxt_en: Disable MSIX before re-reserving NQs/CMPL rings.
	tools/power/x86/intel_pstate_tracer: Fix non root execution for post processing a trace file
	livepatch: check kzalloc return values
	arm64: KVM: Skip MMIO insn after emulation
	usb: musb: dsps: fix otg state machine
	usb: musb: dsps: fix runtime pm for peripheral mode
	perf header: Fix up argument to ctime()
	perf tools: Cast off_t to s64 to avoid warning on bionic libc
	percpu: convert spin_lock_irq to spin_lock_irqsave.
	net: hns3: fix incomplete uninitialization of IRQ in the hns3_nic_uninit_vector_data()
	drm/amd/display: Add retry to read ddc_clock pin
	Bluetooth: hci_bcm: Handle deferred probing for the clock supply
	drm/amd/display: fix YCbCr420 blank color
	powerpc/uaccess: fix warning/error with access_ok()
	mac80211: fix radiotap vendor presence bitmap handling
	xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spi
	mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG
	scsi: smartpqi: correct host serial num for ssa
	scsi: smartpqi: correct volume status
	scsi: smartpqi: increase fw status register read timeout
	cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()
	net: hns3: add max vector number check for pf
	powerpc/perf: Fix thresholding counter data for unknown type
	iwlwifi: mvm: fix setting HE ppe FW config
	powerpc/powernv/ioda: Allocate indirect TCE levels of cached userspace addresses on demand
	mlx5: update timecounter at least twice per counter overflow
	drbd: narrow rcu_read_lock in drbd_sync_handshake
	drbd: disconnect, if the wrong UUIDs are attached on a connected peer
	drbd: skip spurious timeout (ping-timeo) when failing promote
	drbd: Avoid Clang warning about pointless switch statment
	drm/amd/display: validate extended dongle caps
	video: clps711x-fb: release disp device node in probe()
	md: fix raid10 hang issue caused by barrier
	fbdev: fbmem: behave better with small rotated displays and many CPUs
	i40e: define proper net_device::neigh_priv_len
	ice: Do not enable NAPI on q_vectors that have no rings
	igb: Fix an issue that PME is not enabled during runtime suspend
	ACPI/APEI: Clear GHES block_status before panic()
	fbdev: fbcon: Fix unregister crash when more than one framebuffer
	powerpc/mm: Fix reporting of kernel execute faults on the 8xx
	pinctrl: meson: meson8: fix the GPIO function for the GPIOAO pins
	pinctrl: meson: meson8b: fix the GPIO function for the GPIOAO pins
	KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported
	powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
	kvm: Change offset in kvm_write_guest_offset_cached to unsigned
	NFS: nfs_compare_mount_options always compare auth flavors.
	perf build: Don't unconditionally link the libbfd feature test to -liberty and -lz
	hwmon: (lm80) fix a missing check of the status of SMBus read
	hwmon: (lm80) fix a missing check of bus read in lm80 probe
	seq_buf: Make seq_buf_puts() null-terminate the buffer
	crypto: ux500 - Use proper enum in cryp_set_dma_transfer
	crypto: ux500 - Use proper enum in hash_set_dma_transfer
	MIPS: ralink: Select CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8
	cifs: check ntwrk_buf_start for NULL before dereferencing it
	f2fs: fix use-after-free issue when accessing sbi->stat_info
	um: Avoid marking pages with "changed protection"
	niu: fix missing checks of niu_pci_eeprom_read
	f2fs: fix sbi->extent_list corruption issue
	cgroup: fix parsing empty mount option string
	perf python: Do not force closing original perf descriptor in evlist.get_pollfd()
	scripts/decode_stacktrace: only strip base path when a prefix of the path
	arch/sh/boards/mach-kfr2r09/setup.c: fix struct mtd_oob_ops build warning
	ocfs2: don't clear bh uptodate for block read
	ocfs2: improve ocfs2 Makefile
	mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
	zram: fix lockdep warning of free block handling
	isdn: hisax: hfc_pci: Fix a possible concurrency use-after-free bug in HFCPCI_l1hw()
	gdrom: fix a memory leak bug
	fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address()
	block/swim3: Fix -EBUSY error when re-opening device after unmount
	thermal: bcm2835: enable hwmon explicitly
	kdb: Don't back trace on a cpu that didn't round up
	PCI: imx: Enable MSI from downstream components
	thermal: generic-adc: Fix adc to temp interpolation
	HID: lenovo: Add checks to fix of_led_classdev_register
	arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition
	kernel/hung_task.c: break RCU locks based on jiffies
	proc/sysctl: fix return error for proc_doulongvec_minmax()
	kernel/hung_task.c: force console verbose before panic
	fs/epoll: drop ovflist branch prediction
	exec: load_script: don't blindly truncate shebang string
	kernel/kcov.c: mark write_comp_data() as notrace
	scripts/gdb: fix lx-version string output
	xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat
	xfs: cancel COW blocks before swapext
	xfs: Fix error code in 'xfs_ioc_getbmap()'
	xfs: fix overflow in xfs_attr3_leaf_verify
	xfs: fix shared extent data corruption due to missing cow reservation
	xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
	xfs: delalloc -> unwritten COW fork allocation can go wrong
	fs/xfs: fix f_ffree value for statfs when project quota is set
	xfs: fix PAGE_MASK usage in xfs_free_file_space
	xfs: fix inverted return from xfs_btree_sblock_verify_crc
	thermal: hwmon: inline helpers when CONFIG_THERMAL_HWMON is not set
	dccp: fool proof ccid_hc_[rt]x_parse_options()
	enic: fix checksum validation for IPv6
	lib/test_rhashtable: Make test_insert_dup() allocate its hash table dynamically
	net: dp83640: expire old TX-skb
	net: dsa: Fix lockdep false positive splat
	net: dsa: Fix NULL checking in dsa_slave_set_eee()
	net: dsa: mv88e6xxx: Fix counting of ATU violations
	net: dsa: slave: Don't propagate flag changes on down slave interfaces
	net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
	net: systemport: Fix WoL with password after deep sleep
	rds: fix refcount bug in rds_sock_addref
	Revert "net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x"
	rxrpc: bad unlock balance in rxrpc_recvmsg
	sctp: check and update stream->out_curr when allocating stream_out
	sctp: walk the list of asoc safely
	skge: potential memory corruption in skge_get_regs()
	virtio_net: Account for tx bytes and packets on sending xdp_frames
	net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance
	xfs: eof trim writeback mapping as soon as it is cached
	ALSA: compress: Fix stop handling on compressed capture streams
	ALSA: usb-audio: Add support for new T+A USB DAC
	ALSA: hda - Serialize codec registrations
	ALSA: hda/realtek - Fix lose hp_pins for disable auto mute
	ALSA: hda/realtek - Use a common helper for hp pin reference
	ALSA: hda/realtek - Headset microphone support for System76 darp5
	fuse: call pipe_buf_release() under pipe lock
	fuse: decrement NR_WRITEBACK_TEMP on the right page
	fuse: handle zero sized retrieve correctly
	HID: debug: fix the ring buffer implementation
	dmaengine: bcm2835: Fix interrupt race on RT
	dmaengine: bcm2835: Fix abort of transactions
	dmaengine: imx-dma: fix wrong callback invoke
	futex: Handle early deadlock return correctly
	irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID
	usb: phy: am335x: fix race condition in _probe
	usb: dwc3: gadget: Handle 0 xfer length for OUT EP
	usb: gadget: udc: net2272: Fix bitwise and boolean operations
	usb: gadget: musb: fix short isoc packets with inventra dma
	staging: speakup: fix tty-operation NULL derefs
	scsi: cxlflash: Prevent deadlock when adapter probe fails
	scsi: aic94xx: fix module loading
	KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
	kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
	KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
	cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
	perf/x86/intel/uncore: Add Node ID mask
	x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()
	perf/core: Don't WARN() for impossible ring-buffer sizes
	perf tests evsel-tp-sched: Fix bitwise operator
	serial: fix race between flush_to_ldisc and tty_open
	serial: 8250_pci: Make PCI class test non fatal
	serial: sh-sci: Do not free irqs that have already been freed
	cacheinfo: Keep the old value if of_property_read_u32 fails
	IB/hfi1: Add limit test for RC/UC send via loopback
	perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()
	ath9k: dynack: make ewma estimation faster
	ath9k: dynack: check da->enabled first in sampling routines
	Linux 4.19.21

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-12 20:37:21 +01:00
Mark Rutland
1aeeb17668 perf/core: Don't WARN() for impossible ring-buffer sizes
commit 9dff0aa95a upstream.

The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how
large its ringbuffer mmap should be. This can be configured to arbitrary
values, which can be larger than the maximum possible allocation from
kmalloc.

When this is configured to a suitably large value (e.g. thanks to the
perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in
__alloc_pages_nodemask():

   WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8

Let's avoid this by checking that the requested allocation is possible
before calling kzalloc.

Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 19:47:26 +01:00
Josh Poimboeuf
97a7fa90ea cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
commit b284909aba upstream.

With the following commit:

  73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")

... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled.  However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.

The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case.  So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.

Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:

  1) /sys/devices/system/cpu/smt/control showed "on" instead of
     "notsupported"; and

  2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.

I'd propose that we instead consider #1 above to not actually be a
problem.  Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later.  So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).

The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.

So fix it by:

  a) reverting the original "fix" and its followup fix:

     73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
     bc2d8d262c ("cpu/hotplug: Fix SMT supported evaluation")

     and

  b) changing vmx_vm_init() to query the actual current SMT state --
     instead of the sysfs control value -- to determine whether the L1TF
     warning is needed.  This also requires the 'sched_smt_present'
     variable to exported, instead of 'cpu_smt_control'.

Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Mario <jmario@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 19:47:25 +01:00
Thomas Gleixner
ee73954d9a futex: Handle early deadlock return correctly
commit 1a1fb985f2 upstream.

commit 56222b212e ("futex: Drop hb->lock before enqueueing on the
rtmutex") changed the locking rules in the futex code so that the hash
bucket lock is not longer held while the waiter is enqueued into the
rtmutex wait list. This made the lock and the unlock path symmetric, but
unfortunately the possible early exit from __rt_mutex_proxy_start() due to
a detected deadlock was not updated accordingly. That allows a concurrent
unlocker to observe inconsitent state which triggers the warning in the
unlock path.

futex_lock_pi()                         futex_unlock_pi()
  lock(hb->lock)
  queue(hb_waiter)				lock(hb->lock)
  lock(rtmutex->wait_lock)
  unlock(hb->lock)
                                        // acquired hb->lock
                                        hb_waiter = futex_top_waiter()
                                        lock(rtmutex->wait_lock)
  __rt_mutex_proxy_start()
     ---> fail
          remove(rtmutex_waiter);
     ---> returns -EDEADLOCK
  unlock(rtmutex->wait_lock)
                                        // acquired wait_lock
                                        wake_futex_pi()
                                        rt_mutex_next_owner()
					  --> returns NULL
                                          --> WARN

  lock(hb->lock)
  unqueue(hb_waiter)

The problem is caused by the remove(rtmutex_waiter) in the failure case of
__rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the
hash bucket but no waiter on the rtmutex, i.e. inconsistent state.

The original commit handles this correctly for the other early return cases
(timeout, signal) by delaying the removal of the rtmutex waiter until the
returning task reacquired the hash bucket lock.

Treat the failure case of __rt_mutex_proxy_start() in the same way and let
the existing cleanup code handle the eventual handover of the rtmutex
gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter
removal for the failure case, so that the other callsites are still
operating correctly.

Add proper comments to the code so all these details are fully documented.

Thanks to Peter for helping with the analysis and writing the really
valuable code comments.

Fixes: 56222b212e ("futex: Drop hb->lock before enqueueing on the rtmutex")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Stefan Liebler <stli@linux.ibm.com>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 19:47:24 +01:00
Anders Roxell
58e57bcbc1 kernel/kcov.c: mark write_comp_data() as notrace
[ Upstream commit 6347244316 ]

Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the
function called from __sanitizer_cov_trace_const_cmp4 shouldn't be
traceable either.  ftrace_graph_caller() gets called every time func
write_comp_data() gets called if it isn't marked 'notrace'.  This is the
backtrace from gdb:

 #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 #1  0xffffff8010201920 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
 #2  0xffffff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, ip=18446743524224276596) at ../kernel/kcov.c:116
 #3  0xffffff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188
 #4  0xffffff8010201874 in prepare_ftrace_return (self_addr=18446743524226602768, parent=0xffffff801014b918, frame_pointer=18446743524223531344) at ./include/generated/atomic-instrumented.h:27
 #5  0xffffff801020194c in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182

Rework so that write_comp_data() that are called from
__sanitizer_cov_trace_*_cmp*() are marked as 'notrace'.

Commit 903e8ff867 ("kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace")
missed to mark write_comp_data() as 'notrace'. When that patch was
created gcc-7 was used. In lib/Kconfig.debug
config KCOV_ENABLE_COMPARISONS
	depends on $(cc-option,-fsanitize-coverage=trace-cmp)

That code path isn't hit with gcc-7. However, it were that with gcc-8.

Link: http://lkml.kernel.org/r/20181206143011.23719-1-anders.roxell@linaro.org
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:20 +01:00
Liu, Chuansheng
f0d32c54ff kernel/hung_task.c: force console verbose before panic
[ Upstream commit 168e06f793 ]

Based on commit 401c636a0e ("kernel/hung_task.c: show all hung tasks
before panic"), we could get the call stack of hung task.

However, if the console loglevel is not high, we still can not see the
useful panic information in practice, and in most cases users don't set
console loglevel to high level.

This patch is to force console verbose before system panic, so that the
real useful information can be seen in the console, instead of being
like the following, which doesn't have hung task information.

  INFO: task init:1 blocked for more than 120 seconds.
        Tainted: G     U  W         4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Kernel panic - not syncing: hung_task: blocked tasks
  CPU: 2 PID: 479 Comm: khungtaskd Tainted: G     U  W         4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
  Call Trace:
   dump_stack+0x4f/0x65
   panic+0xde/0x231
   watchdog+0x290/0x410
   kthread+0x12c/0x150
   ret_from_fork+0x35/0x40
  reboot: panic mode set: p,w
  Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A6015B675@SHSMSX101.ccr.corp.intel.com
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:19 +01:00
Cheng Lin
9beb84c027 proc/sysctl: fix return error for proc_doulongvec_minmax()
[ Upstream commit 09be178400 ]

If the number of input parameters is less than the total parameters, an
EINVAL error will be returned.

For example, we use proc_doulongvec_minmax to pass up to two parameters
with kern_table:

{
	.procname       = "monitor_signals",
	.data           = &monitor_sigs,
	.maxlen         = 2*sizeof(unsigned long),
	.mode           = 0644,
	.proc_handler   = proc_doulongvec_minmax,
},

Reproduce:

When passing two parameters, it's work normal.  But passing only one
parameter, an error "Invalid argument"(EINVAL) is returned.

  [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  1       2
  [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
  -bash: echo: write error: Invalid argument
  [root@cl150 ~]# echo $?
  1
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  3       2
  [root@cl150 ~]#

The following is the result after apply this patch.  No error is
returned when the number of input parameters is less than the total
parameters.

  [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  1       2
  [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# echo $?
  0
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  3       2
  [root@cl150 ~]#

There are three processing functions dealing with digital parameters,
__do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.

This patch deals with __do_proc_doulongvec_minmax, just as
__do_proc_dointvec does, adding a check for parameters 'left'.  In
__do_proc_douintvec, its code implementation explicitly does not support
multiple inputs.

static int __do_proc_douintvec(...){
         ...
         /*
          * Arrays are not supported, keep this simple. *Do not* add
          * support for them.
          */
         if (vleft != 1) {
                 *lenp = 0;
                 return -EINVAL;
         }
         ...
}

So, just __do_proc_doulongvec_minmax has the problem.  And most use of
proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one
parameter.

Link: http://lkml.kernel.org/r/1544081775-15720-1-git-send-email-cheng.lin130@zte.com.cn
Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:19 +01:00
Tetsuo Handa
9c8939b03b kernel/hung_task.c: break RCU locks based on jiffies
[ Upstream commit 304ae42739 ]

check_hung_uninterruptible_tasks() is currently calling rcu_lock_break()
for every 1024 threads.  But check_hung_task() is very slow if printk()
was called, and is very fast otherwise.

If many threads within some 1024 threads called printk(), the RCU grace
period might be extended enough to trigger RCU stall warnings.
Therefore, calling rcu_lock_break() for every some fixed jiffies will be
safer.

Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:19 +01:00
Douglas Anderson
3818c29a65 kdb: Don't back trace on a cpu that didn't round up
[ Upstream commit 162bc7f5af ]

If you have a CPU that fails to round up and then run 'btc' you'll end
up crashing in kdb becaue we dereferenced NULL.  Let's add a check.
It's wise to also set the task to NULL when leaving the debugger so
that if we fail to round up on a later entry into the debugger we
won't backtrace a stale task.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:19 +01:00
Ondrej Mosnacek
4b5abffd63 cgroup: fix parsing empty mount option string
[ Upstream commit e250d91d65 ]

This fixes the case where all mount options specified are consumed by an
LSM and all that's left is an empty string. In this case cgroupfs should
accept the string and not fail.

How to reproduce (with SELinux enabled):

    # umount /sys/fs/cgroup/unified
    # mount -o context=system_u:object_r:cgroup_t:s0 -t cgroup2 cgroup2 /sys/fs/cgroup/unified
    mount: /sys/fs/cgroup/unified: wrong fs type, bad option, bad superblock on cgroup2, missing codepage or helper program, or other error.
    # dmesg | tail -n 1
    [   31.575952] cgroup: cgroup2: unknown option ""

Fixes: 67e9c74b8a ("cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type")
[NOTE: should apply on top of commit 5136f6365c ("cgroup: implement "nsdelegate" mount option"), older versions need manual rebase]
Suggested-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:17 +01:00
Peter Rajnoha
f7debeebcd kobject: return error code if writing /sys/.../uevent fails
[ Upstream commit df44b47965 ]

Propagate error code back to userspace if writing the /sys/.../uevent
file fails. Before, the write operation always returned with success,
even if we failed to recognize the input string or if we failed to
generate the uevent itself.

With the error codes properly propagated back to userspace, we are
able to react in userspace accordingly by not assuming and awaiting
a uevent that is not delivered.

Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:06 +01:00
Bart Van Assche
0105d80dd1 timekeeping: Use proper seqcount initializer
[ Upstream commit ce10a5b395 ]

tk_core.seq is initialized open coded, but that misses to initialize the
lockdep map when lockdep is enabled. Lockdep splats involving tk_core seq
consequently lack a name and are hard to read.

Use the proper initializer which takes care of the lockdep map
initialization.

[ tglx: Massaged changelog ]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: tj@kernel.org
Cc: johannes.berg@intel.com
Link: https://lkml.kernel.org/r/20181128234325.110011-12-bvanassche@acm.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:05 +01:00
Long Li
46ed4f4fa1 genirq/affinity: Spread IRQs to all available NUMA nodes
[ Upstream commit b825921990 ]

If the number of NUMA nodes exceeds the number of MSI/MSI-X interrupts
which are allocated for a device, the interrupt affinity spreading code
fails to spread them across all nodes.

The reason is, that the spreading code starts from node 0 and continues up
to the number of interrupts requested for allocation. This leaves the nodes
past the last interrupt unused.

This results in interrupt concentration on the first nodes which violates
the assumption of the block layer that all nodes are covered evenly. As a
consequence the NUMA nodes above the number of interrupts are all assigned
to hardware queue 0 and therefore NUMA node 0, which results in bad
performance and has CPU hotplug implications, because queue 0 gets shut
down when the last CPU of node 0 is offlined.

Go over all NUMA nodes and assign them round-robin to all requested
interrupts to solve this.

[ tglx: Massaged changelog ]

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Link: https://lkml.kernel.org/r/20181102180248.13583-1-longli@linuxonhyperv.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:46:57 +01:00
Greg Kroah-Hartman
c28f73fe42 This is the 4.19.20 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxbC5gACgkQONu9yGCS
 aT4DYQ//Uqm/Q63KQuExgd7W+61FoP4NFHlYXZ31B5Rkydryyk2K5P2ONSdVd5n9
 k3wjzRrxvlPvjOwbh9PHv+pLkGxBDqpT1X8IAXPe36bYUkvXoH71BE4YSRPRUJAf
 sdzw/vs7WE/Kx41iT3SXiQih8ok0y3LoACBKmUsEXoLI1cJZCUnnSFpP++QNe1Iz
 B/y04BigL8R7OWR/jow6OPWe9uXOI8iEe9QKVX26g4oaakzly4vkp6OwROSwM31q
 0wut8jF/AtDcZpZXjJLjDCj10k5DRN8jwGcLD7iZeIKqexOabjUrsvfIHfbpUtXr
 e7pJw2aUM8BFb8Ba2lsB7gkqvdHQohqVKQE4Qy59aPyesm2G5miH4gAbncoixjCa
 u3eQV5ACpFLksUFR4RAMKq+10k7swsutyyJr5vG4qdbRpcTCNJirEwAGGqgI6IEP
 SDqtw6u8gMP8+SicwA9p71Wwntcq9RR6fx0gX/3wi2DQp6F8Txem00SqaciE7uQ1
 uIOUrhcpWzIq4m58SGhgTSQcBkm5qBD5S154/xRKIo0mvME+NwBub/x3fIsixN/u
 AzWQmQPXBajHbYXbKGC7t2jNHkU5d9FedZ4iDmJk/+ZZsWyFByY1bH1cg4Qnq89e
 tDxL114YmSujbZD/mFlbGWcqdmGNT355BmyetKDx6w0rNiU/RBU=
 =oprJ
 -----END PGP SIGNATURE-----

Merge 4.19.20 into android-4.19

Changes in 4.19.20
	Fix "net: ipv4: do not handle duplicate fragments as overlapping"
	drm/msm/gpu: fix building without debugfs
	ipv6: Consider sk_bound_dev_if when binding a socket to an address
	ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
	ipvlan, l3mdev: fix broken l3s mode wrt local routes
	l2tp: copy 4 more bytes to linear part if necessary
	l2tp: fix reading optional fields of L2TPv3
	net: ip_gre: always reports o_key to userspace
	net: ip_gre: use erspan key field for tunnel lookup
	net/mlx4_core: Add masking for a few queries on HCA caps
	netrom: switch to sock timer API
	net/rose: fix NULL ax25_cb kernel panic
	net: set default network namespace in init_dummy_netdev()
	ravb: expand rx descriptor data to accommodate hw checksum
	sctp: improve the events for sctp stream reset
	tun: move the call to tun_set_real_num_queues
	ucc_geth: Reset BQL queue when stopping device
	vhost: fix OOB in get_rx_bufs()
	net: ip6_gre: always reports o_key to userspace
	sctp: improve the events for sctp stream adding
	net/mlx5e: Allow MAC invalidation while spoofchk is ON
	ip6mr: Fix notifiers call on mroute_clean_tables()
	Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
	sctp: set chunk transport correctly when it's a new asoc
	sctp: set flow sport from saddr only when it's 0
	virtio_net: Don't enable NAPI when interface is down
	virtio_net: Don't call free_old_xmit_skbs for xdp_frames
	virtio_net: Fix not restoring real_num_rx_queues
	virtio_net: Fix out of bounds access of sq
	virtio_net: Don't process redirected XDP frames when XDP is disabled
	virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
	virtio_net: Differentiate sk_buff and xdp_frame on freeing
	CIFS: Do not count -ENODATA as failure for query directory
	CIFS: Fix trace command logging for SMB2 reads and writes
	CIFS: Do not consider -ENODATA as stat failure for reads
	fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
	iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
	selftests/seccomp: Enhance per-arch ptrace syscall skip tests
	NFS: Fix up return value on fatal errors in nfs_page_async_flush()
	ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
	arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
	arm64: Do not issue IPIs for user executable ptes
	arm64: hyp-stub: Forbid kprobing of the hyp-stub
	arm64: hibernate: Clean the __hyp_text to PoC after resume
	gpio: altera-a10sr: Set proper output level for direction_output
	gpiolib: fix line event timestamps for nested irqs
	gpio: pcf857x: Fix interrupts on multiple instances
	gpio: sprd: Fix the incorrect data register
	gpio: sprd: Fix incorrect irq type setting for the async EIC
	gfs2: Revert "Fix loop in gfs2_rbm_find"
	mmc: bcm2835: Fix DMA channel leak on probe error
	mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
	ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
	ALSA: hda/realtek - Fixed hp_pin no value
	IB/hfi1: Remove overly conservative VM_EXEC flag check
	platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
	platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
	mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
	Btrfs: fix deadlock when allocating tree block during leaf/node split
	btrfs: On error always free subvol_name in btrfs_mount
	kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
	mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
	oom, oom_reaper: do not enqueue same task twice
	mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
	mm, oom: fix use-after-free in oom_kill_process
	mm: hwpoison: use do_send_sig_info() instead of force_sig()
	mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
	of: Convert to using %pOFn instead of device_node.name
	of: overlay: add tests to validate kfrees from overlay removal
	of: overlay: add missing of_node_get() in __of_attach_node_sysfs
	of: overlay: use prop add changeset entry for property in new nodes
	of: overlay: do not duplicate properties from overlay for new nodes
	md/raid5: fix 'out of memory' during raid cache recovery
	cifs: Always resolve hostname before reconnecting
	Linux 4.19.20

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-07 08:40:17 +01:00
Andrei Vagin
c7122344f9 kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
commit 8fb335e078 upstream.

Currently, exit_ptrace() adds all ptraced tasks in a dead list, then
zap_pid_ns_processes() waits on all tasks in a current pidns, and only
then are tasks from the dead list released.

zap_pid_ns_processes() can get stuck on waiting tasks from the dead
list.  In this case, we will have one unkillable process with one or
more dead children.

Thanks to Oleg for the advice to release tasks in find_child_reaper().

Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes: 7c8bd2322c ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:14 +01:00
Greg Kroah-Hartman
18ba00a34e This is the 4.19.19 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxSoGgACgkQONu9yGCS
 aT5IShAAmr4efAHaepYtsEM5q7hpYwdIXauhfmVzYS1KITQ0L+FSO6b6r18/X/Xu
 ZnGr87x9s7rzTwbgW9EK+cPBE7tVI3Tlem1prTSAQLneiv2zN/iTvJigWGUmifB/
 +xldQxiM1S8j6dlVTu0lFl9P8voJ/zFA1II1DV4KJYiPX2lfVis77Tolcd/3TUJ4
 V67abXHeAsLc+bU8kcMxamVievyQndwVlMT4XjStJyl6xy1zDozgiNwphyLqT7yc
 GY0H4jCbFNLfhZlMQgpHanvXzHshbJ4VtMNnjmUApplftzrVf864rgi+sRcoHWK/
 Q6ER8LtgFYoqwG1ZjLUIEMChjhs/Xv+FLHWsCvCIkINyzo0PSgubnbYQVccXkBmB
 XSZ62YTh9sdQtdihWUly0Gr53yMvSn9+ndwJjvBDEErjC9b/D9jOLMcY1L/cCDaQ
 J34dp6ES+6YdQjAu0TEwuuJHMdGR+BvtKgGIL7V6ujxZhuU39eaJ0QLZktsmIO12
 qWQyUKaLA450Qqiqza7foTWAVM6nFu9fd9xsZUKZV6lNnFjKYy/vWjhHsMCR8eKi
 aojGiRVRNnrlIwgk9h7H4EiuRkK3CJFc9jyZhA6u05NLMdIjD9YuLaMLHuSN0R+m
 lSgdmM3dVC4gyosKZHGdwzX/ytHTQiA8QRb4SWEnck7piKyWplg=
 =/rR0
 -----END PGP SIGNATURE-----

Merge 4.19.19 into android-4.19

Changes in 4.19.19
	amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
	net: bridge: Fix ethernet header pointer before check skb forwardable
	net: Fix usage of pskb_trim_rcsum
	net: phy: marvell: Errata for mv88e6390 internal PHYs
	net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
	net/sched: act_tunnel_key: fix memory leak in case of action replace
	net_sched: refetch skb protocol for each filter
	openvswitch: Avoid OOB read when parsing flow nlattrs
	vhost: log dirty page correctly
	mlxsw: pci: Increase PCI SW reset timeout
	net: ipv4: Fix memory leak in network namespace dismantle
	mlxsw: spectrum_fid: Update dummy FID index
	mlxsw: pci: Ring CQ's doorbell before RDQ's
	net/sched: cls_flower: allocate mask dynamically in fl_change()
	udp: with udp_segment release on error path
	ip6_gre: fix tunnel list corruption for x-netns
	erspan: build the header with the right proto according to erspan_ver
	net: phy: marvell: Fix deadlock from wrong locking
	ip6_gre: update version related info when changing link
	tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state
	mei: me: mark LBG devices as having dma support
	mei: me: add denverton innovation engine device IDs
	USB: leds: fix regression in usbport led trigger
	USB: serial: simple: add Motorola Tetra TPG2200 device id
	USB: serial: pl2303: add new PID to support PL2303TB
	ceph: clear inode pointer when snap realm gets dropped by its inode
	ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
	ASoC: rt5514-spi: Fix potential NULL pointer dereference
	ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
	clk: socfpga: stratix10: fix rate calculation for pll clocks
	clk: socfpga: stratix10: fix naming convention for the fixed-clocks
	inotify: Fix fd refcount leak in inotify_add_watch().
	ALSA: hda/realtek - Fix typo for ALC225 model
	ALSA: hda - Add mute LED support for HP ProBook 470 G5
	ARCv2: lib: memeset: fix doing prefetchw outside of buffer
	ARC: adjust memblock_reserve of kernel memory
	ARC: perf: map generic branches to correct hardware condition
	s390/mm: always force a load of the primary ASCE on context switch
	s390/early: improve machine detection
	s390/smp: fix CPU hotplug deadlock with CPU rescan
	misc: ibmvsm: Fix potential NULL pointer dereference
	char/mwave: fix potential Spectre v1 vulnerability
	mmc: dw_mmc-bluefield: : Fix the license information
	mmc: meson-gx: Free irq in release() callback
	staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
	tty: Handle problem if line discipline does not have receive_buf
	uart: Fix crash in uart_write and uart_put_char
	tty/n_hdlc: fix __might_sleep warning
	hv_balloon: avoid touching uninitialized struct page during tail onlining
	Drivers: hv: vmbus: Check for ring when getting debug info
	vgacon: unconfuse vc_origin when using soft scrollback
	CIFS: Fix possible hang during async MTU reads and writes
	CIFS: Fix credits calculations for reads with errors
	CIFS: Fix credit calculation for encrypted reads with errors
	CIFS: Do not reconnect TCP session in add_credits()
	smb3: add credits we receive from oplock/break PDUs
	Input: xpad - add support for SteelSeries Stratus Duo
	Input: input_event - provide override for sparc64
	Input: uinput - fix undefined behavior in uinput_validate_absinfo()
	acpi/nfit: Block function zero DSMs
	acpi/nfit: Fix command-supported detection
	scsi: ufs: Use explicit access size in ufshcd_dump_regs
	dm thin: fix passdown_double_checking_shared_status()
	dm crypt: fix parsing of extended IV arguments
	drm/amdgpu: Add APTX quirk for Lenovo laptop
	KVM: x86: Fix single-step debugging
	KVM: x86: Fix PV IPIs for 32-bit KVM host
	KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
	kvm: x86/vmx: Use kzalloc for cached_vmcs12
	KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
	x86/pkeys: Properly copy pkey state at fork()
	x86/selftests/pkeys: Fork() to check for state being preserved
	x86/kaslr: Fix incorrect i8254 outb() parameters
	x86/entry/64/compat: Fix stack switching for XEN PV
	posix-cpu-timers: Unbreak timer rearming
	net: sun: cassini: Cleanup license conflict
	irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
	can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it
	can: bcm: check timer values before ktime conversion
	can: flexcan: fix NULL pointer exception during bringup
	vt: make vt_console_print() compatible with the unicode screen buffer
	vt: always call notifier with the console lock held
	vt: invoke notifier on screen size change
	drm/meson: Fix atomic mode switching regression
	bpf: improve verifier branch analysis
	bpf: add per-insn complexity limit
	bpf: move {prev_,}insn_idx into verifier env
	bpf: move tmp variable into ax register in interpreter
	bpf: enable access to ax register also from verifier rewrite
	bpf: restrict map value pointer arithmetic for unprivileged
	bpf: restrict stack pointer arithmetic for unprivileged
	bpf: restrict unknown scalars of mixed signed bounds for unprivileged
	bpf: fix check_map_access smin_value test when pointer contains offset
	bpf: prevent out of bounds speculation on pointer arithmetic
	bpf: fix sanitation of alu op with pointer / scalar type from different paths
	bpf: fix inner map masking to prevent oob under speculation
	s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
	nvmet-rdma: Add unlikely for response allocated check
	nvmet-rdma: fix null dereference under heavy load
	Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
	usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
	ide: fix a typo in the settings proc file name
	Input: input_event - fix the CONFIG_SPARC64 mixup
	Linux 4.19.19

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-31 08:29:40 +01:00
Daniel Borkmann
37c9e3ee42 bpf: fix inner map masking to prevent oob under speculation
[ commit 9d5564ddcf upstream ]

During review I noticed that inner meta map setup for map in
map is buggy in that it does not propagate all needed data
from the reference map which the verifier is later accessing.

In particular one such case is index masking to prevent out of
bounds access under speculative execution due to missing the
map's unpriv_array/index_mask field propagation. Fix this such
that the verifier is generating the correct code for inlined
lookups in case of unpriviledged use.

Before patch (test_verifier's 'map in map access' dump):

  # bpftool prog dump xla id 3
     0: (62) *(u32 *)(r10 -4) = 0
     1: (bf) r2 = r10
     2: (07) r2 += -4
     3: (18) r1 = map[id:4]
     5: (07) r1 += 272                |
     6: (61) r0 = *(u32 *)(r2 +0)     |
     7: (35) if r0 >= 0x1 goto pc+6   | Inlined map in map lookup
     8: (54) (u32) r0 &= (u32) 0      | with index masking for
     9: (67) r0 <<= 3                 | map->unpriv_array.
    10: (0f) r0 += r1                 |
    11: (79) r0 = *(u64 *)(r0 +0)     |
    12: (15) if r0 == 0x0 goto pc+1   |
    13: (05) goto pc+1                |
    14: (b7) r0 = 0                   |
    15: (15) if r0 == 0x0 goto pc+11
    16: (62) *(u32 *)(r10 -4) = 0
    17: (bf) r2 = r10
    18: (07) r2 += -4
    19: (bf) r1 = r0
    20: (07) r1 += 272                |
    21: (61) r0 = *(u32 *)(r2 +0)     | Index masking missing (!)
    22: (35) if r0 >= 0x1 goto pc+3   | for inner map despite
    23: (67) r0 <<= 3                 | map->unpriv_array set.
    24: (0f) r0 += r1                 |
    25: (05) goto pc+1                |
    26: (b7) r0 = 0                   |
    27: (b7) r0 = 0
    28: (95) exit

After patch:

  # bpftool prog dump xla id 1
     0: (62) *(u32 *)(r10 -4) = 0
     1: (bf) r2 = r10
     2: (07) r2 += -4
     3: (18) r1 = map[id:2]
     5: (07) r1 += 272                |
     6: (61) r0 = *(u32 *)(r2 +0)     |
     7: (35) if r0 >= 0x1 goto pc+6   | Same inlined map in map lookup
     8: (54) (u32) r0 &= (u32) 0      | with index masking due to
     9: (67) r0 <<= 3                 | map->unpriv_array.
    10: (0f) r0 += r1                 |
    11: (79) r0 = *(u64 *)(r0 +0)     |
    12: (15) if r0 == 0x0 goto pc+1   |
    13: (05) goto pc+1                |
    14: (b7) r0 = 0                   |
    15: (15) if r0 == 0x0 goto pc+12
    16: (62) *(u32 *)(r10 -4) = 0
    17: (bf) r2 = r10
    18: (07) r2 += -4
    19: (bf) r1 = r0
    20: (07) r1 += 272                |
    21: (61) r0 = *(u32 *)(r2 +0)     |
    22: (35) if r0 >= 0x1 goto pc+4   | Now fixed inlined inner map
    23: (54) (u32) r0 &= (u32) 0      | lookup with proper index masking
    24: (67) r0 <<= 3                 | for map->unpriv_array.
    25: (0f) r0 += r1                 |
    26: (05) goto pc+1                |
    27: (b7) r0 = 0                   |
    28: (b7) r0 = 0
    29: (95) exit

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:41 +01:00
Daniel Borkmann
eed84f94ff bpf: fix sanitation of alu op with pointer / scalar type from different paths
[ commit d3bd7413e0 upstream ]

While 979d63d50c ("bpf: prevent out of bounds speculation on pointer
arithmetic") took care of rejecting alu op on pointer when e.g. pointer
came from two different map values with different map properties such as
value size, Jann reported that a case was not covered yet when a given
alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from
different branches where we would incorrectly try to sanitize based
on the pointer's limit. Catch this corner case and reject the program
instead.

Fixes: 979d63d50c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:41 +01:00
Daniel Borkmann
f92a819b4c bpf: prevent out of bounds speculation on pointer arithmetic
[ commit 979d63d50c upstream ]

Jann reported that the original commit back in b2157399cc
("bpf: prevent out-of-bounds speculation") was not sufficient
to stop CPU from speculating out of bounds memory access:
While b2157399cc only focussed on masking array map access
for unprivileged users for tail calls and data access such
that the user provided index gets sanitized from BPF program
and syscall side, there is still a more generic form affected
from BPF programs that applies to most maps that hold user
data in relation to dynamic map access when dealing with
unknown scalars or "slow" known scalars as access offset, for
example:

  - Load a map value pointer into R6
  - Load an index into R7
  - Do a slow computation (e.g. with a memory dependency) that
    loads a limit into R8 (e.g. load the limit from a map for
    high latency, then mask it to make the verifier happy)
  - Exit if R7 >= R8 (mispredicted branch)
  - Load R0 = R6[R7]
  - Load R0 = R6[R0]

For unknown scalars there are two options in the BPF verifier
where we could derive knowledge from in order to guarantee
safe access to the memory: i) While </>/<=/>= variants won't
allow to derive any lower or upper bounds from the unknown
scalar where it would be safe to add it to the map value
pointer, it is possible through ==/!= test however. ii) another
option is to transform the unknown scalar into a known scalar,
for example, through ALU ops combination such as R &= <imm>
followed by R |= <imm> or any similar combination where the
original information from the unknown scalar would be destroyed
entirely leaving R with a constant. The initial slow load still
precedes the latter ALU ops on that register, so the CPU
executes speculatively from that point. Once we have the known
scalar, any compare operation would work then. A third option
only involving registers with known scalars could be crafted
as described in [0] where a CPU port (e.g. Slow Int unit)
would be filled with many dependent computations such that
the subsequent condition depending on its outcome has to wait
for evaluation on its execution port and thereby executing
speculatively if the speculated code can be scheduled on a
different execution port, or any other form of mistraining
as described in [1], for example. Given this is not limited
to only unknown scalars, not only map but also stack access
is affected since both is accessible for unprivileged users
and could potentially be used for out of bounds access under
speculation.

In order to prevent any of these cases, the verifier is now
sanitizing pointer arithmetic on the offset such that any
out of bounds speculation would be masked in a way where the
pointer arithmetic result in the destination register will
stay unchanged, meaning offset masked into zero similar as
in array_index_nospec() case. With regards to implementation,
there are three options that were considered: i) new insn
for sanitation, ii) push/pop insn and sanitation as inlined
BPF, iii) reuse of ax register and sanitation as inlined BPF.

Option i) has the downside that we end up using from reserved
bits in the opcode space, but also that we would require
each JIT to emit masking as native arch opcodes meaning
mitigation would have slow adoption till everyone implements
it eventually which is counter-productive. Option ii) and iii)
have both in common that a temporary register is needed in
order to implement the sanitation as inlined BPF since we
are not allowed to modify the source register. While a push /
pop insn in ii) would be useful to have in any case, it
requires once again that every JIT needs to implement it
first. While possible, amount of changes needed would also
be unsuitable for a -stable patch. Therefore, the path which
has fewer changes, less BPF instructions for the mitigation
and does not require anything to be changed in the JITs is
option iii) which this work is pursuing. The ax register is
already mapped to a register in all JITs (modulo arm32 where
it's mapped to stack as various other BPF registers there)
and used in constant blinding for JITs-only so far. It can
be reused for verifier rewrites under certain constraints.
The interpreter's tmp "register" has therefore been remapped
into extending the register set with hidden ax register and
reusing that for a number of instructions that needed the
prior temporary variable internally (e.g. div, mod). This
allows for zero increase in stack space usage in the interpreter,
and enables (restricted) generic use in rewrites otherwise as
long as such a patchlet does not make use of these instructions.
The sanitation mask is dynamic and relative to the offset the
map value or stack pointer currently holds.

There are various cases that need to be taken under consideration
for the masking, e.g. such operation could look as follows:
ptr += val or val += ptr or ptr -= val. Thus, the value to be
sanitized could reside either in source or in destination
register, and the limit is different depending on whether
the ALU op is addition or subtraction and depending on the
current known and bounded offset. The limit is derived as
follows: limit := max_value_size - (smin_value + off). For
subtraction: limit := umax_value + off. This holds because
we do not allow any pointer arithmetic that would
temporarily go out of bounds or would have an unknown
value with mixed signed bounds where it is unclear at
verification time whether the actual runtime value would
be either negative or positive. For example, we have a
derived map pointer value with constant offset and bounded
one, so limit based on smin_value works because the verifier
requires that statically analyzed arithmetic on the pointer
must be in bounds, and thus it checks if resulting
smin_value + off and umax_value + off is still within map
value bounds at time of arithmetic in addition to time of
access. Similarly, for the case of stack access we derive
the limit as follows: MAX_BPF_STACK + off for subtraction
and -off for the case of addition where off := ptr_reg->off +
ptr_reg->var_off.value. Subtraction is a special case for
the masking which can be in form of ptr += -val, ptr -= -val,
or ptr -= val. In the first two cases where we know that
the value is negative, we need to temporarily negate the
value in order to do the sanitation on a positive value
where we later swap the ALU op, and restore original source
register if the value was in source.

The sanitation of pointer arithmetic alone is still not fully
sufficient as is, since a scenario like the following could
happen ...

  PTR += 0x1000 (e.g. K-based imm)
  PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
  PTR += 0x1000
  PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
  [...]

... which under speculation could end up as ...

  PTR += 0x1000
  PTR -= 0 [ truncated by mitigation ]
  PTR += 0x1000
  PTR -= 0 [ truncated by mitigation ]
  [...]

... and therefore still access out of bounds. To prevent such
case, the verifier is also analyzing safety for potential out
of bounds access under speculative execution. Meaning, it is
also simulating pointer access under truncation. We therefore
"branch off" and push the current verification state after the
ALU operation with known 0 to the verification stack for later
analysis. Given the current path analysis succeeded it is
likely that the one under speculation can be pruned. In any
case, it is also subject to existing complexity limits and
therefore anything beyond this point will be rejected. In
terms of pruning, it needs to be ensured that the verification
state from speculative execution simulation must never prune
a non-speculative execution path, therefore, we mark verifier
state accordingly at the time of push_stack(). If verifier
detects out of bounds access under speculative execution from
one of the possible paths that includes a truncation, it will
reject such program.

Given we mask every reg-based pointer arithmetic for
unprivileged programs, we've been looking into how it could
affect real-world programs in terms of size increase. As the
majority of programs are targeted for privileged-only use
case, we've unconditionally enabled masking (with its alu
restrictions on top of it) for privileged programs for the
sake of testing in order to check i) whether they get rejected
in its current form, and ii) by how much the number of
instructions and size will increase. We've tested this by
using Katran, Cilium and test_l4lb from the kernel selftests.
For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
we've used test_l4lb.o as well as test_l4lb_noinline.o. We
found that none of the programs got rejected by the verifier
with this change, and that impact is rather minimal to none.
balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
7,797 bytes JITed before and after the change. Most complex
program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
and 18,538 bytes JITed before and after and none of the other
tail call programs in bpf_lxc.o had any changes either. For
the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
after the change. Other programs from that object file had
similar small increase. Both test_l4lb.o had no change and
remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
JITed and for test_l4lb_noinline.o constant at 5,080 bytes
(634 insns) xlated and 3,313 bytes JITed. This can be explained
in that LLVM typically optimizes stack based pointer arithmetic
by using K-based operations and that use of dynamic map access
is not overly frequent. However, in future we may decide to
optimize the algorithm further under known guarantees from
branch and value speculation. Latter seems also unclear in
terms of prediction heuristics that today's CPUs apply as well
as whether there could be collisions in e.g. the predictor's
Value History/Pattern Table for triggering out of bounds access,
thus masking is performed unconditionally at this point but could
be subject to relaxation later on. We were generally also
brainstorming various other approaches for mitigation, but the
blocker was always lack of available registers at runtime and/or
overhead for runtime tracking of limits belonging to a specific
pointer. Thus, we found this to be minimally intrusive under
given constraints.

With that in place, a simple example with sanitized access on
unprivileged load at post-verification time looks as follows:

  # bpftool prog dump xlated id 282
  [...]
  28: (79) r1 = *(u64 *)(r7 +0)
  29: (79) r2 = *(u64 *)(r7 +8)
  30: (57) r1 &= 15
  31: (79) r3 = *(u64 *)(r0 +4608)
  32: (57) r3 &= 1
  33: (47) r3 |= 1
  34: (2d) if r2 > r3 goto pc+19
  35: (b4) (u32) r11 = (u32) 20479  |
  36: (1f) r11 -= r2                | Dynamic sanitation for pointer
  37: (4f) r11 |= r2                | arithmetic with registers
  38: (87) r11 = -r11               | containing bounded or known
  39: (c7) r11 s>>= 63              | scalars in order to prevent
  40: (5f) r11 &= r2                | out of bounds speculation.
  41: (0f) r4 += r11                |
  42: (71) r4 = *(u8 *)(r4 +0)
  43: (6f) r4 <<= r1
  [...]

For the case where the scalar sits in the destination register
as opposed to the source register, the following code is emitted
for the above example:

  [...]
  16: (b4) (u32) r11 = (u32) 20479
  17: (1f) r11 -= r2
  18: (4f) r11 |= r2
  19: (87) r11 = -r11
  20: (c7) r11 s>>= 63
  21: (5f) r2 &= r11
  22: (0f) r2 += r0
  23: (61) r0 = *(u32 *)(r2 +0)
  [...]

JIT blinding example with non-conflicting use of r10:

  [...]
   d5:	je     0x0000000000000106    _
   d7:	mov    0x0(%rax),%edi       |
   da:	mov    $0xf153246,%r10d     | Index load from map value and
   e0:	xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
   e7:	and    %r10,%rdi            |_
   ea:	mov    $0x2f,%r10d          |
   f0:	sub    %rdi,%r10            | Sanitized addition. Both use r10
   f3:	or     %rdi,%r10            | but do not interfere with each
   f6:	neg    %r10                 | other. (Neither do these instructions
   f9:	sar    $0x3f,%r10           | interfere with the use of ax as temp
   fd:	and    %r10,%rdi            | in interpreter.)
  100:	add    %rax,%rdi            |_
  103:	mov    0x0(%rdi),%eax
 [...]

Tested that it fixes Jann's reproducer, and also checked that test_verifier
and test_progs suite with interpreter, JIT and JIT with hardening enabled
on x86-64 and arm64 runs successfully.

  [0] Speculose: Analyzing the Security Implications of Speculative
      Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
      https://arxiv.org/pdf/1801.04084.pdf

  [1] A Systematic Evaluation of Transient Execution Attacks and
      Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
      Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
      Dmitry Evtyushkin, Daniel Gruss,
      https://arxiv.org/pdf/1811.05441.pdf

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:41 +01:00
Daniel Borkmann
4f7f708d0e bpf: fix check_map_access smin_value test when pointer contains offset
[ commit b7137c4eab upstream ]

In check_map_access() we probe actual bounds through __check_map_access()
with offset of reg->smin_value + off for lower bound and offset of
reg->umax_value + off for the upper bound. However, even though the
reg->smin_value could have a negative value, the final result of the
sum with off could be positive when pointer arithmetic with known and
unknown scalars is combined. In this case we reject the program with
an error such as "R<x> min value is negative, either use unsigned index
or do a if (index >=0) check." even though the access itself would be
fine. Therefore extend the check to probe whether the actual resulting
reg->smin_value + off is less than zero.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:41 +01:00
Daniel Borkmann
44f8fc6499 bpf: restrict unknown scalars of mixed signed bounds for unprivileged
[ commit 9d7eceede7 upstream ]

For unknown scalars of mixed signed bounds, meaning their smin_value is
negative and their smax_value is positive, we need to reject arithmetic
with pointer to map value. For unprivileged the goal is to mask every
map pointer arithmetic and this cannot reliably be done when it is
unknown at verification time whether the scalar value is negative or
positive. Given this is a corner case, the likelihood of breaking should
be very small.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:41 +01:00
Daniel Borkmann
5332dda94f bpf: restrict stack pointer arithmetic for unprivileged
[ commit e4298d2583 upstream ]

Restrict stack pointer arithmetic for unprivileged users in that
arithmetic itself must not go out of bounds as opposed to the actual
access later on. Therefore after each adjust_ptr_min_max_vals() with
a stack pointer as a destination we simulate a check_stack_access()
of 1 byte on the destination and once that fails the program is
rejected for unprivileged program loads. This is analog to map
value pointer arithmetic and needed for masking later on.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:41 +01:00
Daniel Borkmann
9e57b2969d bpf: restrict map value pointer arithmetic for unprivileged
[ commit 0d6303db79 upstream ]

Restrict map value pointer arithmetic for unprivileged users in that
arithmetic itself must not go out of bounds as opposed to the actual
access later on. Therefore after each adjust_ptr_min_max_vals() with a
map value pointer as a destination it will simulate a check_map_access()
of 1 byte on the destination and once that fails the program is rejected
for unprivileged program loads. We use this later on for masking any
pointer arithmetic with the remainder of the map value space. The
likelihood of breaking any existing real-world unprivileged eBPF
program is very small for this corner case.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Daniel Borkmann
232ac70dd3 bpf: enable access to ax register also from verifier rewrite
[ commit 9b73bfdd08 upstream ]

Right now we are using BPF ax register in JIT for constant blinding as
well as in interpreter as temporary variable. Verifier will not be able
to use it simply because its use will get overridden from the former in
bpf_jit_blind_insn(). However, it can be made to work in that blinding
will be skipped if there is prior use in either source or destination
register on the instruction. Taking constraints of ax into account, the
verifier is then open to use it in rewrites under some constraints. Note,
ax register already has mappings in every eBPF JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Daniel Borkmann
b855e31037 bpf: move tmp variable into ax register in interpreter
[ commit 144cd91c4c upstream ]

This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run()
into the hidden ax register. The latter is currently only used in JITs
for constant blinding as a temporary scratch register, meaning the BPF
interpreter will never see the use of ax. Therefore it is safe to use
it for the cases where tmp has been used earlier. This is needed to later
on allow restricted hidden use of ax in both interpreter and JITs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Daniel Borkmann
333a31c89a bpf: move {prev_,}insn_idx into verifier env
[ commit c08435ec7f upstream ]

Move prev_insn_idx and insn_idx from the do_check() function into
the verifier environment, so they can be read inside the various
helper functions for handling the instructions. It's easier to put
this into the environment rather than changing all call-sites only
to pass it along. insn_idx is useful in particular since this later
on allows to hold state in env->insn_aux_data[env->insn_idx].

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Alexei Starovoitov
4371129462 bpf: add per-insn complexity limit
[ commit ceefbc96fa upstream ]

malicious bpf program may try to force the verifier to remember
a lot of distinct verifier states.
Put a limit to number of per-insn 'struct bpf_verifier_state'.
Note that hitting the limit doesn't reject the program.
It potentially makes the verifier do more steps to analyze the program.
It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner
instead of spending cpu time walking long link list.

The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs
with slight increase in number of "steps" it takes to successfully verify
the programs:
                       before    after
bpf_lb-DLB_L3.o         1940      1940
bpf_lb-DLB_L4.o         3089      3089
bpf_lb-DUNKNOWN.o       1065      1065
bpf_lxc-DDROP_ALL.o     28052  |  28162
bpf_lxc-DUNKNOWN.o      35487  |  35541
bpf_netdev.o            10864     10864
bpf_overlay.o           6643      6643
bpf_lcx_jit.o           38437     38437

But it also makes malicious program to be rejected in 0.4 seconds vs 6.5
Hence apply this limit to unprivileged programs only.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Alexei Starovoitov
7da6cd690c bpf: improve verifier branch analysis
[ commit 4f7b3e8258 upstream ]

pathological bpf programs may try to force verifier to explode in
the number of branch states:
  20: (d5) if r1 s<= 0x24000028 goto pc+0
  21: (b5) if r0 <= 0xe1fa20 goto pc+2
  22: (d5) if r1 s<= 0x7e goto pc+0
  23: (b5) if r0 <= 0xe880e000 goto pc+0
  24: (c5) if r0 s< 0x2100ecf4 goto pc+0
  25: (d5) if r1 s<= 0xe880e000 goto pc+1
  26: (c5) if r0 s< 0xf4041810 goto pc+0
  27: (d5) if r1 s<= 0x1e007e goto pc+0
  28: (b5) if r0 <= 0xe86be000 goto pc+0
  29: (07) r0 += 16614
  30: (c5) if r0 s< 0x6d0020da goto pc+0
  31: (35) if r0 >= 0x2100ecf4 goto pc+0

Teach verifier to recognize always taken and always not taken branches.
This analysis is already done for == and != comparison.
Expand it to all other branches.

It also helps real bpf programs to be verified faster:
                       before  after
bpf_lb-DLB_L3.o         2003    1940
bpf_lb-DLB_L4.o         3173    3089
bpf_lb-DUNKNOWN.o       1080    1065
bpf_lxc-DDROP_ALL.o     29584   28052
bpf_lxc-DUNKNOWN.o      36916   35487
bpf_netdev.o            11188   10864
bpf_overlay.o           6679    6643
bpf_lcx_jit.o           39555   38437

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-31 08:14:40 +01:00
Thomas Gleixner
21c0d1621b posix-cpu-timers: Unbreak timer rearming
commit 93ad0fc088 upstream.

The recent commit which prevented a division by 0 issue in the alarm timer
code broke posix CPU timers as an unwanted side effect.

The reason is that the common rearm code checks for timer->it_interval
being 0 now. What went unnoticed is that the posix cpu timer setup does not
initialize timer->it_interval as it stores the interval in CPU timer
specific storage. The reason for the separate storage is historical as the
posix CPU timers always had a 64bit nanoseconds representation internally
while timer->it_interval is type ktime_t which used to be a modified
timespec representation on 32bit machines.

Instead of reverting the offending commit and fixing the alarmtimer issue
in the alarmtimer code, store the interval in timer->it_interval at CPU
timer setup time so the common code check works. This also repairs the
existing inconistency of the posix CPU timer code which kept a single shot
timer armed despite of the interval being 0.

The separate storage can be removed in mainline, but that needs to be a
separate commit as the current one has to be backported to stable kernels.

Fixes: 0e334db6bb ("posix-timers: Fix division by zero bug")
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190111133500.840117406@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:14:39 +01:00
Quentin Perret
5f5a57750b FROMGIT: PM / EM: Expose the Energy Model in debugfs
The recently introduced Energy Model (EM) framework manages power cost
tables of CPUs. These tables are currently only visible from kernel
space. However, in order to debug the behaviour of subsystems that use
the EM (EAS for example), it is often required to know what the power
costs are from userspace.

For this reason, introduce under /sys/kernel/debug/energy_model a set of
directories representing the performance domains of the system. Each
performance domain contains a set of sub-directories representing the
different capacity states (cs) and their attributes, as well as a file
exposing the related CPUs.

The resulting hierarchy is as follows on Arm juno r0 for example:

    /sys/kernel/debug/energy_model
    ├── pd0
    │   ├── cpus
    │   ├── cs:450000
    │   │   ├── cost
    │   │   ├── frequency
    │   │   └── power
    │   ├── cs:575000
    │   │   ├── cost
    │   │   ├── frequency
    │   │   └── power
    │   ├── cs:700000
    │   │   ├── cost
    │   │   ├── frequency
    │   │   └── power
    │   ├── cs:775000
    │   │   ├── cost
    │   │   ├── frequency
    │   │   └── power
    │   └── cs:850000
    │       ├── cost
    │       ├── frequency
    │       └── power
    └── pd1
        ├── cpus
        ├── cs:1100000
        │   ├── cost
        │   ├── frequency
        │   └── power
        ├── cs:450000
        │   ├── cost
        │   ├── frequency
        │   └── power
        ├── cs:625000
        │   ├── cost
        │   ├── frequency
        │   └── power
        ├── cs:800000
        │   ├── cost
        │   ├── frequency
        │   └── power
        └── cs:950000
            ├── cost
            ├── frequency
            └── power

Bug: 120440300
Change-Id: I4098c3d3e07fc4a14514751ac881422131b759e9
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 9cac42d064
 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2019-01-28 17:05:00 +00:00
Greg Kroah-Hartman
26bf816608 This is the 4.19.18 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxMGy0ACgkQONu9yGCS
 aT5ppQ/8COjyZg1aTrCrd0ttMHYotw3Lb4B6E/SCf2ub4X38SxGz9irhQ7r2FKdK
 w0ZXlLOF2ddqWe6BUnIfWago4Pk1GBpg3bgnp5XyYTjlJbfI2yZ9ggiO0iNYBPaL
 fN2JwM9eze/7cDlpYbhwGpF4+Wz8wTrzh+NIputcvC6n3SQH/cTGmOUa9rlamQju
 uukkvLanAYY3sqDCl4B415Ds44ROU4filqHYIkvZC81jc3Q0YZ8M7cTmpLcDQKGz
 8Z+Veil07jEM9bF2W8iX79nwxMT+edFC62HMuRCoxJKq+1kccw1TVMWpQ8TWbv13
 zeLOqXxNP6VcNaC251q3QzlInRDp1dtr8KtzA/OG0WFnZBTEDng/iChhiL8qZt0R
 9+Sz7n9uZ5pMRK3tr03Ccjg3AneKWRqad2iaTB/kOwAdu7Uqxz8U9qUuRDFPV7OY
 KTMCCfdS8XpMHl/S+Cvg2dqSNiBEkNmowYO6NvQClG0aoN4/6wH+m2TZ0hCl6PVq
 pNFOTJmp7FOaztEZC4rqW8DoOGeGaNo5DP9A2XKKDR20F7EiAE437ApEQ4p5QGVk
 ek4uslZkwJWU/UOzXRl/Hoz0OlI0ixsdZy1vw88HCl7SD1E7xHJpnRUkOjigTT1Q
 nbCt0Nm/A2+c1tKbzU+PVW8FtIbutZhW1BtrqaIbbHr9NBTICR0=
 =Yg+/
 -----END PGP SIGNATURE-----

Merge 4.19.18 into android-4.19

Changes in 4.19.18
	ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
	mlxsw: spectrum: Disable lag port TX before removing it
	mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
	net: dsa: mv88x6xxx: mv88e6390 errata
	net, skbuff: do not prefer skb allocation fails early
	qmi_wwan: add MTU default to qmap network interface
	r8169: Add support for new Realtek Ethernet
	ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
	net: clear skb->tstamp in bridge forwarding path
	netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets
	gpio: pl061: Move irq_chip definition inside struct pl061
	drm/amd/display: Guard against null stream_state in set_crc_source
	drm/amdkfd: fix interrupt spin lock
	ixgbe: allow IPsec Tx offload in VEPA mode
	platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey
	e1000e: allow non-monotonic SYSTIM readings
	usb: typec: tcpm: Do not disconnect link for self powered devices
	selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
	of: overlay: add missing of_node_put() after add new node to changeset
	writeback: don't decrement wb->refcnt if !wb->bdi
	serial: set suppress_bind_attrs flag only if builtin
	bpf: Allow narrow loads with offset > 0
	ALSA: oxfw: add support for APOGEE duet FireWire
	x86/mce: Fix -Wmissing-prototypes warnings
	MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
	crypto: ecc - regularize scalar for scalar multiplication
	arm64: perf: set suppress_bind_attrs flag to true
	drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
	clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
	samples: bpf: fix: error handling regarding kprobe_events
	usb: gadget: udc: renesas_usb3: add a safety connection way for forced_b_device
	fpga: altera-cvp: fix probing for multiple FPGAs on the bus
	selinux: always allow mounting submounts
	ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
	scsi: qedi: Check for session online before getting iSCSI TLV data.
	drm/amdgpu: Reorder uvd ring init before uvd resume
	rxe: IB_WR_REG_MR does not capture MR's iova field
	efi/libstub: Disable some warnings for x86{,_64}
	jffs2: Fix use of uninitialized delayed_work, lockdep breakage
	clk: imx: make mux parent strings const
	pstore/ram: Do not treat empty buffers as valid
	media: uvcvideo: Refactor teardown of uvc on USB disconnect
	powerpc/xmon: Fix invocation inside lock region
	powerpc/pseries/cpuidle: Fix preempt warning
	media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
	ASoC: use dma_ops of parent device for acp_audio_dma
	media: venus: core: Set dma maximum segment size
	staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io'
	net: call sk_dst_reset when set SO_DONTROUTE
	scsi: target: use consistent left-aligned ASCII INQUIRY data
	scsi: target/core: Make sure that target_wait_for_sess_cmds() waits long enough
	selftests: do not macro-expand failed assertion expressions
	arm64: kasan: Increase stack size for KASAN_EXTRA
	clk: imx6q: reset exclusive gates on init
	arm64: Fix minor issues with the dcache_by_line_op macro
	bpf: relax verifier restriction on BPF_MOV | BPF_ALU
	kconfig: fix file name and line number of warn_ignored_character()
	kconfig: fix memory leak when EOF is encountered in quotation
	mmc: atmel-mci: do not assume idle after atmci_request_end
	btrfs: volumes: Make sure there is no overlap of dev extents at mount time
	btrfs: alloc_chunk: fix more DUP stripe size handling
	btrfs: fix use-after-free due to race between replace start and cancel
	btrfs: improve error handling of btrfs_add_link
	tty/serial: do not free trasnmit buffer page under port lock
	perf intel-pt: Fix error with config term "pt=0"
	perf tests ARM: Disable breakpoint tests 32-bit
	perf svghelper: Fix unchecked usage of strncpy()
	perf parse-events: Fix unchecked usage of strncpy()
	perf vendor events intel: Fix Load_Miss_Real_Latency on SKL/SKX
	netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set
	netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine
	netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
	x86/topology: Use total_cpus for max logical packages calculation
	dm crypt: use u64 instead of sector_t to store iv_offset
	dm kcopyd: Fix bug causing workqueue stalls
	perf stat: Avoid segfaults caused by negated options
	tools lib subcmd: Don't add the kernel sources to the include path
	dm snapshot: Fix excessive memory usage and workqueue stalls
	perf cs-etm: Correct packets swapping in cs_etm__flush()
	perf tools: Add missing sigqueue() prototype for systems lacking it
	perf tools: Add missing open_memstream() prototype for systems lacking it
	quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.
	clocksource/drivers/integrator-ap: Add missing of_node_put()
	dm: Check for device sector overflow if CONFIG_LBDAF is not set
	Bluetooth: btusb: Add support for Intel bluetooth device 8087:0029
	ALSA: bebob: fix model-id of unit for Apogee Ensemble
	sysfs: Disable lockdep for driver bind/unbind files
	IB/usnic: Fix potential deadlock
	scsi: mpt3sas: fix memory ordering on 64bit writes
	scsi: smartpqi: correct lun reset issues
	ath10k: fix peer stats null pointer dereference
	scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
	scsi: megaraid: fix out-of-bound array accesses
	iomap: don't search past page end in iomap_is_partially_uptodate
	ocfs2: fix panic due to unrecovered local alloc
	mm/page-writeback.c: don't break integrity writeback on ->writepage() error
	mm/swap: use nr_node_ids for avail_lists in swap_info_struct
	userfaultfd: clear flag if remap event not enabled
	mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
	iwlwifi: mvm: Send LQ command as async when necessary
	Bluetooth: Fix unnecessary error message for HCI request completion
	ipmi: fix use-after-free of user->release_barrier.rda
	ipmi: msghandler: Fix potential Spectre v1 vulnerabilities
	ipmi: Prevent use-after-free in deliver_response
	ipmi:ssif: Fix handling of multi-part return messages
	ipmi: Don't initialize anything in the core until something uses it
	Linux 4.19.18

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-26 11:58:37 +01:00
Jiong Wang
344b51e7ce bpf: relax verifier restriction on BPF_MOV | BPF_ALU
[ Upstream commit e434b8cdf7 ]

Currently, the destination register is marked as unknown for 32-bit
sub-register move (BPF_MOV | BPF_ALU) whenever the source register type is
SCALAR_VALUE.

This is too conservative that some valid cases will be rejected.
Especially, this may turn a constant scalar value into unknown value that
could break some assumptions of verifier.

For example, test_l4lb_noinline.c has the following C code:

    struct real_definition *dst

1:  if (!get_packet_dst(&dst, &pckt, vip_info, is_ipv6))
2:    return TC_ACT_SHOT;
3:
4:  if (dst->flags & F_IPV6) {

get_packet_dst is responsible for initializing "dst" into valid pointer and
return true (1), otherwise return false (0). The compiled instruction
sequence using alu32 will be:

  412: (54) (u32) r7 &= (u32) 1
  413: (bc) (u32) r0 = (u32) r7
  414: (95) exit

insn 413, a BPF_MOV | BPF_ALU, however will turn r0 into unknown value even
r7 contains SCALAR_VALUE 1.

This causes trouble when verifier is walking the code path that hasn't
initialized "dst" inside get_packet_dst, for which case 0 is returned and
we would then expect verifier concluding line 1 in the above C code pass
the "if" check, therefore would skip fall through path starting at line 4.
Now, because r0 returned from callee has became unknown value, so verifier
won't skip analyzing path starting at line 4 and "dst->flags" requires
dereferencing the pointer "dst" which actually hasn't be initialized for
this path.

This patch relaxed the code marking sub-register move destination. For a
SCALAR_VALUE, it is safe to just copy the value from source then truncate
it into 32-bit.

A unit test also included to demonstrate this issue. This test will fail
before this patch.

This relaxation could let verifier skipping more paths for conditional
comparison against immediate. It also let verifier recording a more
accurate/strict value for one register at one state, if this state end up
with going through exit without rejection and it is used for state
comparison later, then it is possible an inaccurate/permissive value is
better. So the real impact on verifier processed insn number is complex.
But in all, without this fix, valid program could be rejected.

>From real benchmarking on kernel selftests and Cilium bpf tests, there is
no impact on processed instruction number when tests ares compiled with
default compilation options. There is slightly improvements when they are
compiled with -mattr=+alu32 after this patch.

Also, test_xdp_noinline/-mattr=+alu32 now passed verification. It is
rejected before this fix.

Insn processed before/after this patch:

                        default     -mattr=+alu32

Kernel selftest

===
test_xdp.o              371/371      369/369
test_l4lb.o             6345/6345    5623/5623
test_xdp_noinline.o     2971/2971    rejected/2727
test_tcp_estates.o      429/429      430/430

Cilium bpf
===
bpf_lb-DLB_L3.o:        2085/2085     1685/1687
bpf_lb-DLB_L4.o:        2287/2287     1986/1982
bpf_lb-DUNKNOWN.o:      690/690       622/622
bpf_lxc.o:              95033/95033   N/A
bpf_netdev.o:           7245/7245     N/A
bpf_overlay.o:          2898/2898     3085/2947

NOTE:
  - bpf_lxc.o and bpf_netdev.o compiled by -mattr=+alu32 are rejected by
    verifier due to another issue inside verifier on supporting alu32
    binary.
  - Each cilium bpf program could generate several processed insn number,
    above number is sum of them.

v1->v2:
 - Restrict the change on SCALAR_VALUE.
 - Update benchmark numbers on Cilium bpf tests.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:39 +01:00
Andrey Ignatov
464b01e440 bpf: Allow narrow loads with offset > 0
[ Upstream commit 46f53a65d2 ]

Currently BPF verifier allows narrow loads for a context field only with
offset zero. E.g. if there is a __u32 field then only the following
loads are permitted:
  * off=0, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=0, size=4 (full).

On the other hand LLVM can generate a load with offset different than
zero that make sense from program logic point of view, but verifier
doesn't accept it.

E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:

  #define DST_IP4			0xC0A801FEU /* 192.168.1.254 */
  ...
  	if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&

where ctx is struct bpf_sock_addr.

Some versions of LLVM can produce the following byte code for it:

       8:       71 12 07 00 00 00 00 00         r2 = *(u8 *)(r1 + 7)
       9:       67 02 00 00 18 00 00 00         r2 <<= 24
      10:       18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00         r3 = 4261412864 ll
      12:       5d 32 07 00 00 00 00 00         if r2 != r3 goto +7 <LBB0_6>

where `*(u8 *)(r1 + 7)` means narrow load for ctx->user_ip4 with size=1
and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
rejected by verifier.

Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok()
what means any is_valid_access implementation, that uses the function,
works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or
sock_addr_is_valid_access() for bpf_sock_addr.

The patch makes such loads supported. Offset can be in [0; size_default)
but has to be multiple of load size. E.g. for __u32 field the following
loads are supported now:
  * off=0, size=1 (narrow);
  * off=1, size=1 (narrow);
  * off=2, size=1 (narrow);
  * off=3, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=2, size=2 (narrow);
  * off=0, size=4 (full).

Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:34 +01:00
Greg Kroah-Hartman
caf54339d3 This is the 4.19.15 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlw6+/8ACgkQONu9yGCS
 aT6VKw/9FUsbfy4MzFMH4XmTn/k9AHhcYdQ+gSEIcJbt/JLT13fU64e/O8QlQ3PF
 5GWNY5ObA+HKlReCufSuW+AuAw5s/FLVaGLn8HZQ/FU27ZgTrGpFjb3vcnYSjsU0
 vurXjstzndiRmpSahNufU6t2X7fkgyd41M94572pyidcT5NcP+ngVICwXtQOsXjH
 QkIaMZHTmr4le0Z1oNvDraNkESJnxo7+D2eJebx5yDReD/Mdm3gAl2q0UkDXpZzk
 qb3tH1oronm7ZfiEBCZYrewxMfz78ugJW3hpOu//JCbrVI2Ja0sBSh3VB6EFceoY
 WI9z8JkZ3xQeLQnCdiabdQ66mGQa9XiLUwj7+sR//P7OduwJEv8HTYpDi8iqA6Vj
 SigQmjEunjSHccqBWaPy1ZMAIXoNWQBC4EJ2erv3pAPyJr2FBw9o2Bmu6JAV18ow
 iX94YnQtllZp8cJsEKEUWEmXZPLcTy6mXLMLoQ922P4p4KRJVQUhde4EeZZLFn27
 6sPwASnrfEW9RS/i1XuxdDPbnMYg6uE0UoRfxp1tAUBKaVArjMglyIAj7t9GA07W
 4480c3AegmDFZ+GxX+w5+duKRZnxBi+sHw8aBbZRi5m9mlxeFCSWSe0hPPRR2LIQ
 fZrFySHmgbl1NtTP4cvZOb7bTxoyfjcIQfiqu7cwNsYGXtbfOuk=
 =A6Ro
 -----END PGP SIGNATURE-----

Merge 4.19.15 into android-4.19

Changes in 4.19.15
	ARM: dts: sun8i: a83t: bananapi-m3: increase vcc-pd voltage to 3.3V
	pinctrl: meson: fix pull enable register calculation
	arm64: dts: mt7622: fix no more console output on rfb1
	powerpc: Fix COFF zImage booting on old powermacs
	powerpc/mm: Fix linux page tables build with some configs
	HID: ite: Add USB id match for another ITE based keyboard rfkill key quirk
	ARM: dts: imx7d-pico: Describe the Wifi clock
	ARM: imx: update the cpu power up timing setting on i.mx6sx
	ARM: dts: imx7d-nitrogen7: Fix the description of the Wifi clock
	IB/mlx5: Block DEVX umem from the non applicable cases
	Input: restore EV_ABS ABS_RESERVED
	powerpc/mm: Fallback to RAM if the altmap is unusable
	drm/amdgpu: Fix DEBUG_LOCKS_WARN_ON(depth <= 0) in amdgpu_ctx.lock
	IB/core: Fix oops in netdev_next_upper_dev_rcu()
	checkstack.pl: fix for aarch64
	xfrm: Fix error return code in xfrm_output_one()
	xfrm: Fix bucket count reported to userspace
	xfrm: Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry.
	ieee802154: hwsim: fix off-by-one in parse nested
	netfilter: nf_tables: fix suspicious RCU usage in nft_chain_stats_replace()
	netfilter: seqadj: re-load tcp header pointer after possible head reallocation
	Revert "scsi: qla2xxx: Fix NVMe Target discovery"
	scsi: bnx2fc: Fix NULL dereference in error handling
	Input: omap-keypad - fix idle configuration to not block SoC idle states
	Input: synaptics - enable RMI on ThinkPad T560
	ibmvnic: Convert reset work item mutex to spin lock
	ibmvnic: Fix non-atomic memory allocation in IRQ context
	ieee802154: ca8210: fix possible u8 overflow in ca8210_rx_done
	x86/mm: Fix guard hole handling
	x86/dump_pagetables: Fix LDT remap address marker
	i40e: fix mac filter delete when setting mac address
	ixgbe: Fix race when the VF driver does a reset
	netfilter: ipset: do not call ipset_nest_end after nla_nest_cancel
	netfilter: nat: can't use dst_hold on noref dst
	netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()
	bnx2x: Clear fip MAC when fcoe offload support is disabled
	bnx2x: Remove configured vlans as part of unload sequence.
	bnx2x: Send update-svid ramrod with retry/poll flags enabled
	scsi: target: iscsi: cxgbit: fix csk leak
	scsi: target: iscsi: cxgbit: add missing spin_lock_init()
	mt76: fix potential NULL pointer dereference in mt76_stop_tx_queues
	x86, hyperv: remove PCI dependency
	drivers: net: xgene: Remove unnecessary forward declarations
	net/tls: Init routines in create_ctx
	w90p910_ether: remove incorrect __init annotation
	net: hns: Incorrect offset address used for some registers.
	net: hns: All ports can not work when insmod hns ko after rmmod.
	net: hns: Some registers use wrong address according to the datasheet.
	net: hns: Fixed bug that netdev was opened twice
	net: hns: Clean rx fbd when ae stopped.
	net: hns: Free irq when exit from abnormal branch
	net: hns: Avoid net reset caused by pause frames storm
	net: hns: Fix ntuple-filters status error.
	net: hns: Add mac pcs config when enable|disable mac
	net: hns: Fix ping failed when use net bridge and send multicast
	mac80211: fix a kernel panic when TXing after TXQ teardown
	SUNRPC: Fix a race with XPRT_CONNECTING
	qed: Fix an error code qed_ll2_start_xmit()
	net: macb: fix random memory corruption on RX with 64-bit DMA
	net: macb: fix dropped RX frames due to a race
	net: macb: add missing barriers when reading descriptors
	lan743x: Expand phy search for LAN7431
	lan78xx: Resolve issue with changing MAC address
	vxge: ensure data0 is initialized in when fetching firmware version information
	nl80211: fix memory leak if validate_pae_over_nl80211() fails
	mac80211: free skb fraglist before freeing the skb
	kbuild: fix false positive warning/error about missing libelf
	m68k: Fix memblock-related crashes
	virtio: fix test build after uio.h change
	lan743x: Remove MAC Reset from initialization
	gpio: mvebu: only fail on missing clk if pwm is actually to be used
	Input: synaptics - enable SMBus for HP EliteBook 840 G4
	net: netxen: fix a missing check and an uninitialized use
	qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup
	serial/sunsu: fix refcount leak
	auxdisplay: charlcd: fix x/y command parsing
	scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
	scsi: lpfc: do not set queue->page_count to 0 if pc_sli4_params.wqpcnt is invalid
	fork: record start_time late
	zram: fix double free backing device
	hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
	mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
	mm, devm_memremap_pages: kill mapping "System RAM" support
	mm, devm_memremap_pages: fix shutdown handling
	mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support
	mm, hmm: use devm semantics for hmm_devmem_{add, remove}
	mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
	mm, swap: fix swapoff with KSM pages
	memcg, oom: notify on oom killer invocation from the charge path
	sunrpc: fix cache_head leak due to queued request
	sunrpc: use SVC_NET() in svcauth_gss_* functions
	powerpc: remove old GCC version checks
	powerpc: consolidate -mno-sched-epilog into FTRACE flags
	powerpc: avoid -mno-sched-epilog on GCC 4.9 and newer
	powerpc: Disable -Wbuiltin-requires-header when setjmp is used
	kbuild: add -no-integrated-as Clang option unconditionally
	kbuild: consolidate Clang compiler flags
	Makefile: Export clang toolchain variables
	powerpc/boot: Set target when cross-compiling for clang
	raid6/ppc: Fix build for clang
	dma-direct: do not include SME mask in the DMA supported check
	mt76x0: init hw capabilities
	media: cx23885: only reset DMA on problematic CPUs
	ALSA: cs46xx: Potential NULL dereference in probe
	ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
	ALSA: usb-audio: Check mixer unit descriptors more strictly
	ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
	ALSA: usb-audio: Always check descriptor sizes in parser code
	srcu: Lock srcu_data structure in srcu_gp_start()
	driver core: Add missing dev->bus->need_parent_lock checks
	Fix failure path in alloc_pid()
	block: deactivate blk_stat timer in wbt_disable_default()
	block: mq-deadline: Fix write completion handling
	dlm: fixed memory leaks after failed ls_remove_names allocation
	dlm: possible memory leak on error path in create_lkb()
	dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
	dlm: memory leaks on error path in dlm_user_request()
	gfs2: Get rid of potential double-freeing in gfs2_create_inode
	gfs2: Fix loop in gfs2_rbm_find
	b43: Fix error in cordic routine
	selinux: policydb - fix byte order and alignment issues
	PCI / PM: Allow runtime PM without callback functions
	lockd: Show pid of lockd for remote locks
	nfsd4: zero-length WRITE should succeed
	arm64: drop linker script hack to hide __efistub_ symbols
	arm64: relocatable: fix inconsistencies in linker script and options
	leds: pwm: silently error out on EPROBE_DEFER
	Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing"
	powerpc/tm: Set MSR[TS] just prior to recheckpoint
	iio: dac: ad5686: fix bit shift read register
	9p/net: put a lower bound on msize
	rxe: fix error completion wr_id and qp_num
	RDMA/srpt: Fix a use-after-free in the channel release code
	iommu/vt-d: Handle domain agaw being less than iommu agaw
	sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b
	ceph: don't update importing cap's mseq when handing cap export
	video: fbdev: pxafb: Fix "WARNING: invalid free of devm_ allocated data"
	drivers/perf: hisi: Fixup one DDRC PMU register offset
	genwqe: Fix size check
	intel_th: msu: Fix an off-by-one in attribute store
	power: supply: olpc_battery: correct the temperature units
	of: of_node_get()/of_node_put() nodes held in phandle cache
	of: __of_detach_node() - remove node from phandle cache
	lib: fix build failure in CONFIG_DEBUG_VIRTUAL test
	drm/nouveau/drm/nouveau: Check rc from drm_dp_mst_topology_mgr_resume()
	drm/vc4: Set ->is_yuv to false when num_planes == 1
	drm/rockchip: psr: do not dereference encoder before it is null checked.
	drm/amd/display: Fix unintialized max_bpc state values
	bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
	Linux 4.19.15

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-13 10:25:45 +01:00
Linus Torvalds
dc8408ea0b sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b
commit c40f7d74c7 upstream.

Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the
scheduler under high loads, starting at around the v4.18 time frame,
and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list
manipulation.

Do a (manual) revert of:

  a9e7f6544b ("sched/fair: Fix O(nr_cgroups) in load balance path")

It turns out that the list_del_leaf_cfs_rq() introduced by this commit
is a surprising property that was not considered in followup commits
such as:

  9c2791f936 ("sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list")

As Vincent Guittot explains:

 "I think that there is a bigger problem with commit a9e7f6544b and
  cfs_rq throttling:

  Let take the example of the following topology TG2 --> TG1 --> root:

   1) The 1st time a task is enqueued, we will add TG2 cfs_rq then TG1
      cfs_rq to leaf_cfs_rq_list and we are sure to do the whole branch in
      one path because it has never been used and can't be throttled so
      tmp_alone_branch will point to leaf_cfs_rq_list at the end.

   2) Then TG1 is throttled

   3) and we add TG3 as a new child of TG1.

   4) The 1st enqueue of a task on TG3 will add TG3 cfs_rq just before TG1
      cfs_rq and tmp_alone_branch will stay  on rq->leaf_cfs_rq_list.

  With commit a9e7f6544b, we can del a cfs_rq from rq->leaf_cfs_rq_list.
  So if the load of TG1 cfs_rq becomes NULL before step 2) above, TG1
  cfs_rq is removed from the list.
  Then at step 4), TG3 cfs_rq is added at the beginning of rq->leaf_cfs_rq_list
  but tmp_alone_branch still points to TG3 cfs_rq because its throttled
  parent can't be enqueued when the lock is released.
  tmp_alone_branch doesn't point to rq->leaf_cfs_rq_list whereas it should.

  So if TG3 cfs_rq is removed or destroyed before tmp_alone_branch
  points on another TG cfs_rq, the next TG cfs_rq that will be added,
  will be linked outside rq->leaf_cfs_rq_list - which is bad.

  In addition, we can break the ordering of the cfs_rq in
  rq->leaf_cfs_rq_list but this ordering is used to update and
  propagate the update from leaf down to root."

Instead of trying to work through all these cases and trying to reproduce
the very high loads that produced the lockup to begin with, simplify
the code temporarily by reverting a9e7f6544b - which change was clearly
not thought through completely.

This (hopefully) gives us a kernel that doesn't lock up so people
can continue to enjoy their holidays without worrying about regressions. ;-)

[ mingo: Wrote changelog, fixed weird spelling in code comment while at it. ]

Analyzed-by: Xie XiuQi <xiexiuqi@huawei.com>
Analyzed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reported-by: Zhipeng Xie <xiezhipeng1@huawei.com>
Reported-by: Sargun Dhillon <sargun@sargun.me>
Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Tested-by: Zhipeng Xie <xiezhipeng1@huawei.com>
Tested-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: <stable@vger.kernel.org> # v4.13+
Cc: Bin Li <huawei.libin@huawei.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a9e7f6544b ("sched/fair: Fix O(nr_cgroups) in load balance path")
Link: http://lkml.kernel.org/r/1545879866-27809-1-git-send-email-xiexiuqi@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:09 +01:00
Matthew Wilcox
a854491995 Fix failure path in alloc_pid()
commit 1a80dade01 upstream.

The failure path removes the allocated PIDs from the wrong namespace.
This could lead to us inadvertently reusing PIDs in the leaf namespace
and leaking PIDs in parent namespaces.

Fixes: 95846ecf9d ("pid: replace pid bitmap implementation with IDR API")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:06 +01:00