mirror of
https://github.com/torvalds/linux.git
synced 2026-06-10 23:53:52 +02:00
099bdfba32
351 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2df0fb4a4b |
This is the 5.10.50 stable release
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmDu+1UACgkQONu9yGCS
aT7jQRAAuLDi7ejk3JUameYFMzVXGAUE6yPs392/lWJzey7IBf+2uLqz4FzqqUHp
U1GkEKJVaCacEfi0+rpi7BxNFljUdZdg/F/P68ARtAWPvwqAeJ4QIh5u3A682UUO
1M5h6e5/oY9F4kQIb5Kot04avqOeR6lTqrkA8jeP5h43ngyLWuS2d+5oOGmbCukS
UgEaCC6CiKjcN51UUTj/fXMQ0X4IDHP5pD8rWwH0IvK0i7gduvk744un8LVB6aW1
rNV88C3BEFFtkPQh2XySnXM5Ok8kYlhFoTDsqlpeAX7pA8hiUPYBoRzTg0MJtPZn
N1L/Yqhvxmn5xs9HAw7mDOo8E8NWXzsT5FvZVaBeiCgtdKmcPszylXqmSt1oiOb0
/EmkCWmlbG/3qWql24+LU4XP36iVPx32HQxAgg2XbnlNU5o0E1y2F98p6p/3JSWX
NAjHtmg/MxueFQ+w8bDzhO8YzYn1dIU3V3qaXRvtpODrmaSYW+bwCyPtSjXe3/vL
604zb3dOg9+tD/gKqfRb/UPMu24nNll8M/gnSRci05/thmIxwtYudPwoLNSejDqr
e+a8vejISfIyp41XrpYQbUeKs1WOA+A7vgx6CZrT791afiT+6UgC/ecQfg1NFxhs
8ayWpocaIszxyXxVGro1rfwZeQmTlbTCZ5wVdpn9sDPZfI7epts=
=FCrA
-----END PGP SIGNATURE-----
Merge 5.10.50 into android12-5.10-lts
Changes in 5.10.50
Bluetooth: hci_qca: fix potential GPF
Bluetooth: btqca: Don't modify firmware contents in-place
Bluetooth: Remove spurious error message
ALSA: usb-audio: fix rate on Ozone Z90 USB headset
ALSA: usb-audio: Fix OOB access at proc output
ALSA: firewire-motu: fix stream format for MOTU 8pre FireWire
ALSA: usb-audio: scarlett2: Fix wrong resume call
ALSA: intel8x0: Fix breakage at ac97 clock measurement
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8
ALSA: hda/realtek: Add another ALC236 variant support
ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook x360 830 G8
ALSA: hda/realtek: Improve fixup for HP Spectre x360 15-df0xxx
ALSA: hda/realtek: Fix bass speaker DAC mapping for Asus UM431D
ALSA: hda/realtek: Apply LED fixup for HP Dragonfly G1, too
ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC
media: dvb-usb: fix wrong definition
Input: usbtouchscreen - fix control-request directions
net: can: ems_usb: fix use-after-free in ems_usb_disconnect()
usb: gadget: eem: fix echo command packet response issue
usb: renesas-xhci: Fix handling of unknown ROM state
USB: cdc-acm: blacklist Heimann USB Appset device
usb: dwc3: Fix debugfs creation flow
usb: typec: Add the missed altmode_id_remove() in typec_register_altmode()
xhci: solve a double free problem while doing s4
gfs2: Fix underflow in gfs2_page_mkwrite
gfs2: Fix error handling in init_statfs
ntfs: fix validity check for file name attribute
selftests/lkdtm: Avoid needing explicit sub-shell
copy_page_to_iter(): fix ITER_DISCARD case
iov_iter_fault_in_readable() should do nothing in xarray case
Input: joydev - prevent use of not validated data in JSIOCSBTNMAP ioctl
crypto: nx - Fix memcpy() over-reading in nonce
crypto: ccp - Annotate SEV Firmware file names
arm_pmu: Fix write counter incorrect in ARMv7 big-endian mode
ARM: dts: ux500: Fix LED probing
ARM: dts: at91: sama5d4: fix pinctrl muxing
btrfs: send: fix invalid path for unlink operations after parent orphanization
btrfs: compression: don't try to compress if we don't have enough pages
btrfs: clear defrag status of a root if starting transaction fails
ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle
ext4: fix kernel infoleak via ext4_extent_header
ext4: fix overflow in ext4_iomap_alloc()
ext4: return error code when ext4_fill_flex_info() fails
ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
ext4: remove check for zero nr_to_scan in ext4_es_scan()
ext4: fix avefreec in find_group_orlov
ext4: use ext4_grp_locked_error in mb_find_extent
can: bcm: delay release of struct bcm_op after synchronize_rcu()
can: gw: synchronize rcu operations before removing gw job entry
can: isotp: isotp_release(): omit unintended hrtimer restart on socket release
can: j1939: j1939_sk_init(): set SOCK_RCU_FREE to call sk_destruct() after RCU is done
can: peak_pciefd: pucan_handle_status(): fix a potential starvation issue in TX path
mac80211: remove iwlwifi specific workaround that broke sta NDP tx
SUNRPC: Fix the batch tasks count wraparound.
SUNRPC: Should wake up the privileged task firstly.
bus: mhi: Wait for M2 state during system resume
mm/gup: fix try_grab_compound_head() race with split_huge_page()
perf/smmuv3: Don't trample existing events with global filter
KVM: nVMX: Handle split-lock #AC exceptions that happen in L2
KVM: PPC: Book3S HV: Workaround high stack usage with clang
KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk
s390/cio: dont call css_wait_for_slow_path() inside a lock
s390: mm: Fix secure storage access exception handling
f2fs: Prevent swap file in LFS mode
clk: agilex/stratix10/n5x: fix how the bypass_reg is handled
clk: agilex/stratix10: remove noc_clk
clk: agilex/stratix10: fix bypass representation
rtc: stm32: Fix unbalanced clk_disable_unprepare() on probe error path
iio: frequency: adf4350: disable reg and clk on error in adf4350_probe()
iio: light: tcs3472: do not free unallocated IRQ
iio: ltr501: mark register holding upper 8 bits of ALS_DATA{0,1} and PS_DATA as volatile, too
iio: ltr501: ltr559: fix initialization of LTR501_ALS_CONTR
iio: ltr501: ltr501_read_ps(): add missing endianness conversion
iio: accel: bma180: Fix BMA25x bandwidth register values
serial: mvebu-uart: fix calculation of clock divisor
serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
serial_cs: Add Option International GSM-Ready 56K/ISDN modem
serial_cs: remove wrong GLOBETROTTER.cis entry
ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()
ssb: sdio: Don't overwrite const buffer if block_write fails
rsi: Assign beacon rate settings to the correct rate_info descriptor field
rsi: fix AP mode with WPA failure due to encrypted EAPOL
tracing/histograms: Fix parsing of "sym-offset" modifier
tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
seq_buf: Make trace_seq_putmem_hex() support data longer than 8
powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
loop: Fix missing discard support when using LOOP_CONFIGURE
evm: Execute evm_inode_init_security() only when an HMAC key is loaded
evm: Refuse EVM_ALLOW_METADATA_WRITES only if an HMAC key is loaded
fuse: Fix crash in fuse_dentry_automount() error path
fuse: Fix crash if superblock of submount gets killed early
fuse: Fix infinite loop in sget_fc()
fuse: ignore PG_workingset after stealing
fuse: check connected before queueing on fpq->io
fuse: reject internal errno
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
spi: Make of_register_spi_device also set the fwnode
Add a reference to ucounts for each cred
staging: media: rkvdec: fix pm_runtime_get_sync() usage count
media: marvel-ccic: fix some issues when getting pm_runtime
media: mdk-mdp: fix pm_runtime_get_sync() usage count
media: s5p: fix pm_runtime_get_sync() usage count
media: am437x: fix pm_runtime_get_sync() usage count
media: sh_vou: fix pm_runtime_get_sync() usage count
media: mtk-vcodec: fix PM runtime get logic
media: s5p-jpeg: fix pm_runtime_get_sync() usage count
media: sunxi: fix pm_runtime_get_sync() usage count
media: sti/bdisp: fix pm_runtime_get_sync() usage count
media: exynos4-is: fix pm_runtime_get_sync() usage count
media: exynos-gsc: fix pm_runtime_get_sync() usage count
spi: spi-loopback-test: Fix 'tx_buf' might be 'rx_buf'
spi: spi-topcliff-pch: Fix potential double free in pch_spi_process_messages()
spi: omap-100k: Fix the length judgment problem
regulator: uniphier: Add missing MODULE_DEVICE_TABLE
sched/core: Initialize the idle task with preemption disabled
hwrng: exynos - Fix runtime PM imbalance on error
crypto: nx - add missing MODULE_DEVICE_TABLE
media: sti: fix obj-$(config) targets
media: cpia2: fix memory leak in cpia2_usb_probe
media: cobalt: fix race condition in setting HPD
media: hevc: Fix dependent slice segment flags
media: pvrusb2: fix warning in pvr2_i2c_core_done
media: imx: imx7_mipi_csis: Fix logging of only error event counters
crypto: qat - check return code of qat_hal_rd_rel_reg()
crypto: qat - remove unused macro in FW loader
crypto: qce: skcipher: Fix incorrect sg count for dma transfers
arm64: perf: Convert snprintf to sysfs_emit
sched/fair: Fix ascii art by relpacing tabs
media: i2c: ov2659: Use clk_{prepare_enable,disable_unprepare}() to set xvclk on/off
media: bt878: do not schedule tasklet when it is not setup
media: em28xx: Fix possible memory leak of em28xx struct
media: hantro: Fix .buf_prepare
media: cedrus: Fix .buf_prepare
media: v4l2-core: Avoid the dangling pointer in v4l2_fh_release
media: bt8xx: Fix a missing check bug in bt878_probe
media: st-hva: Fix potential NULL pointer dereferences
crypto: hisilicon/sec - fixup 3des minimum key size declaration
Makefile: fix GDB warning with CONFIG_RELR
media: dvd_usb: memory leak in cinergyt2_fe_attach
memstick: rtsx_usb_ms: fix UAF
mmc: sdhci-sprd: use sdhci_sprd_writew
mmc: via-sdmmc: add a check against NULL pointer dereference
spi: meson-spicc: fix a wrong goto jump for avoiding memory leak.
spi: meson-spicc: fix memory leak in meson_spicc_probe
crypto: shash - avoid comparing pointers to exported functions under CFI
media: dvb_net: avoid speculation from net slot
media: siano: fix device register error path
media: imx-csi: Skip first few frames from a BT.656 source
hwmon: (max31790) Report correct current pwm duty cycles
hwmon: (max31790) Fix pwmX_enable attributes
drivers/perf: fix the missed ida_simple_remove() in ddr_perf_probe()
KVM: PPC: Book3S HV: Fix TLB management on SMT8 POWER9 and POWER10 processors
btrfs: fix error handling in __btrfs_update_delayed_inode
btrfs: abort transaction if we fail to update the delayed inode
btrfs: sysfs: fix format string for some discard stats
btrfs: don't clear page extent mapped if we're not invalidating the full page
btrfs: disable build on platforms having page size 256K
locking/lockdep: Fix the dep path printing for backwards BFS
lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
KVM: s390: get rid of register asm usage
regulator: mt6358: Fix vdram2 .vsel_mask
regulator: da9052: Ensure enough delay time for .set_voltage_time_sel
media: Fix Media Controller API config checks
ACPI: video: use native backlight for GA401/GA502/GA503
HID: do not use down_interruptible() when unbinding devices
EDAC/ti: Add missing MODULE_DEVICE_TABLE
ACPI: processor idle: Fix up C-state latency if not ordered
hv_utils: Fix passing zero to 'PTR_ERR' warning
lib: vsprintf: Fix handling of number field widths in vsscanf
Input: goodix - platform/x86: touchscreen_dmi - Move upside down quirks to touchscreen_dmi.c
platform/x86: touchscreen_dmi: Add an extra entry for the upside down Goodix touchscreen on Teclast X89 tablets
platform/x86: touchscreen_dmi: Add info for the Goodix GT912 panel of TM800A550L tablets
ACPI: EC: Make more Asus laptops use ECDT _GPE
block_dump: remove block_dump feature in mark_inode_dirty()
blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
blk-mq: clear stale request in tags->rq[] before freeing one request pool
fs: dlm: cancel work sync othercon
random32: Fix implicit truncation warning in prandom_seed_state()
open: don't silently ignore unknown O-flags in openat2()
drivers: hv: Fix missing error code in vmbus_connect()
fs: dlm: fix memory leak when fenced
ACPICA: Fix memory leak caused by _CID repair function
ACPI: bus: Call kobject_put() in acpi_init() error path
ACPI: resources: Add checks for ACPI IRQ override
block: fix race between adding/removing rq qos and normal IO
platform/x86: asus-nb-wmi: Revert "Drop duplicate DMI quirk structures"
platform/x86: asus-nb-wmi: Revert "add support for ASUS ROG Zephyrus G14 and G15"
platform/x86: toshiba_acpi: Fix missing error code in toshiba_acpi_setup_keyboard()
nvme-pci: fix var. type for increasing cq_head
nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst()
EDAC/Intel: Do not load EDAC driver when running as a guest
PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv()
cifs: improve fallocate emulation
ACPI: EC: trust DSDT GPE for certain HP laptop
clocksource: Retry clock read if long delays detected
clocksource: Check per-CPU clock synchronization when marked unstable
tpm_tis_spi: add missing SPI device ID entries
ACPI: tables: Add custom DSDT file as makefile prerequisite
HID: wacom: Correct base usage for capacitive ExpressKey status bits
cifs: fix missing spinlock around update to ses->status
mailbox: qcom: Use PLATFORM_DEVID_AUTO to register platform device
block: fix discard request merge
kthread_worker: fix return value when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
ia64: mca_drv: fix incorrect array size calculation
writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
spi: Allow to have all native CSs in use along with GPIOs
spi: Avoid undefined behaviour when counting unused native CSs
media: venus: Rework error fail recover logic
media: s5p_cec: decrement usage count if disabled
media: hantro: do a PM resume earlier
crypto: ixp4xx - dma_unmap the correct address
crypto: ixp4xx - update IV after requests
crypto: ux500 - Fix error return code in hash_hw_final()
sata_highbank: fix deferred probing
pata_rb532_cf: fix deferred probing
media: I2C: change 'RST' to "RSET" to fix multiple build errors
sched/uclamp: Fix wrong implementation of cpu.uclamp.min
sched/uclamp: Fix locking around cpu_util_update_eff()
kbuild: Fix objtool dependency for 'OBJECT_FILES_NON_STANDARD_<obj> := n'
pata_octeon_cf: avoid WARN_ON() in ata_host_activate()
evm: fix writing <securityfs>/evm overflow
x86/elf: Use _BITUL() macro in UAPI headers
crypto: sa2ul - Fix leaks on failure paths with sa_dma_init()
crypto: sa2ul - Fix pm_runtime enable in sa_ul_probe()
crypto: ccp - Fix a resource leak in an error handling path
media: rc: i2c: Fix an error message
pata_ep93xx: fix deferred probing
locking/lockdep: Reduce LOCKDEP dependency list
media: rkvdec: Fix .buf_prepare
media: exynos4-is: Fix a use after free in isp_video_release
media: au0828: fix a NULL vs IS_ERR() check
media: tc358743: Fix error return code in tc358743_probe_of()
media: gspca/gl860: fix zero-length control requests
m68k: atari: Fix ATARI_KBD_CORE kconfig unmet dependency warning
media: siano: Fix out-of-bounds warnings in smscore_load_firmware_family2()
regulator: fan53880: Fix vsel_mask setting for FAN53880_BUCK
crypto: nitrox - fix unchecked variable in nitrox_register_interrupts
crypto: omap-sham - Fix PM reference leak in omap sham ops
crypto: x86/curve25519 - fix cpu feature checking logic in mod_exit
crypto: sm2 - remove unnecessary reset operations
crypto: sm2 - fix a memory leak in sm2
mmc: usdhi6rol0: fix error return code in usdhi6_probe()
arm64: consistently use reserved_pg_dir
arm64/mm: Fix ttbr0 values stored in struct thread_info for software-pan
media: subdev: remove VIDIOC_DQEVENT_TIME32 handling
media: s5p-g2d: Fix a memory leak on ctx->fh.m2m_ctx
hwmon: (lm70) Use device_get_match_data()
hwmon: (lm70) Revert "hwmon: (lm70) Add support for ACPI"
hwmon: (max31722) Remove non-standard ACPI device IDs
hwmon: (max31790) Fix fan speed reporting for fan7..12
KVM: nVMX: Sync all PGDs on nested transition with shadow paging
KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap
KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch
KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()
perf/arm-cmn: Fix invalid pointer when access dtc object sharing the same IRQ number
KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is set
regulator: hi655x: Fix pass wrong pointer to config.driver_data
btrfs: clear log tree recovering status if starting transaction fails
x86/sev: Make sure IRQs are disabled while GHCB is active
x86/sev: Split up runtime #VC handler for correct state tracking
sched/rt: Fix RT utilization tracking during policy change
sched/rt: Fix Deadline utilization tracking during policy change
sched/uclamp: Fix uclamp_tg_restrict()
lockdep: Fix wait-type for empty stack
lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
spi: spi-sun6i: Fix chipselect/clock bug
crypto: nx - Fix RCU warning in nx842_OF_upd_status
psi: Fix race between psi_trigger_create/destroy
media: v4l2-async: Clean v4l2_async_notifier_add_fwnode_remote_subdev
media: video-mux: Skip dangling endpoints
PM / devfreq: Add missing error code in devfreq_add_device()
ACPI: PM / fan: Put fan device IDs into separate header file
block: avoid double io accounting for flush request
nvme-pci: look for StorageD3Enable on companion ACPI device instead
ACPI: sysfs: Fix a buffer overrun problem with description_show()
mark pstore-blk as broken
clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG
extcon: extcon-max8997: Fix IRQ freeing at error path
ACPI: APEI: fix synchronous external aborts in user-mode
blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
blk-wbt: make sure throttle is enabled properly
ACPI: Use DEVICE_ATTR_<RW|RO|WO> macros
ACPI: bgrt: Fix CFI violation
cpufreq: Make cpufreq_online() call driver->offline() on errors
blk-mq: update hctx->dispatch_busy in case of real scheduler
ocfs2: fix snprintf() checking
dax: fix ENOMEM handling in grab_mapping_entry()
mm/debug_vm_pgtable/basic: add validation for dirtiness after write protect
mm/debug_vm_pgtable/basic: iterate over entire protection_map[]
mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage()
swap: fix do_swap_page() race with swapoff
mm/shmem: fix shmem_swapin() race with swapoff
mm: memcg/slab: properly set up gfp flags for objcg pointer array
mm: page_alloc: refactor setup_per_zone_lowmem_reserve()
mm/page_alloc: fix counting of managed_pages
xfrm: xfrm_state_mtu should return at least 1280 for ipv6
drm/bridge/sii8620: fix dependency on extcon
drm/bridge: Fix the stop condition of drm_bridge_chain_pre_enable()
drm/amd/dc: Fix a missing check bug in dm_dp_mst_detect()
drm/ast: Fix missing conversions to managed API
video: fbdev: imxfb: Fix an error message
net: mvpp2: Put fwnode in error case during ->probe()
net: pch_gbe: Propagate error from devm_gpio_request_one()
pinctrl: renesas: r8a7796: Add missing bias for PRESET# pin
pinctrl: renesas: r8a77990: JTAG pins do not have pull-down capabilities
drm/vmwgfx: Mark a surface gpu-dirty after the SVGA3dCmdDXGenMips command
drm/vmwgfx: Fix cpu updates of coherent multisample surfaces
net: qrtr: ns: Fix error return code in qrtr_ns_init()
clk: meson: g12a: fix gp0 and hifi ranges
net: ftgmac100: add missing error return code in ftgmac100_probe()
drm: rockchip: set alpha_en to 0 if it is not used
drm/rockchip: cdn-dp-core: add missing clk_disable_unprepare() on error in cdn_dp_grf_write()
drm/rockchip: dsi: move all lane config except LCDC mux to bind()
drm/rockchip: lvds: Fix an error handling path
drm/rockchip: cdn-dp: fix sign extension on an int multiply for a u64 result
mptcp: fix pr_debug in mptcp_token_new_connect
mptcp: generate subflow hmac after mptcp_finish_join()
RDMA/srp: Fix a recently introduced memory leak
RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats
RDMA/rtrs: Do not reset hb_missed_max after re-connection
RDMA/rtrs-srv: Fix memory leak of unfreed rtrs_srv_stats object
RDMA/rtrs-srv: Fix memory leak when having multiple sessions
RDMA/rtrs-clt: Check if the queue_depth has changed during a reconnection
RDMA/rtrs-clt: Fix memory leak of not-freed sess->stats and stats->pcpu_stats
ehea: fix error return code in ehea_restart_qps()
clk: tegra30: Use 300MHz for video decoder by default
xfrm: remove the fragment check for ipv6 beet mode
net/sched: act_vlan: Fix modify to allow 0
RDMA/core: Sanitize WQ state received from the userspace
drm/pl111: depend on CONFIG_VEXPRESS_CONFIG
RDMA/rxe: Fix failure during driver load
drm/pl111: Actually fix CONFIG_VEXPRESS_CONFIG depends
drm/vc4: hdmi: Fix error path of hpd-gpios
clk: vc5: fix output disabling when enabling a FOD
drm: qxl: ensure surf.data is ininitialized
tools/bpftool: Fix error return code in do_batch()
ath10k: go to path err_unsupported when chip id is not supported
ath10k: add missing error return code in ath10k_pci_probe()
wireless: carl9170: fix LEDS build errors & warnings
ieee802154: hwsim: Fix possible memory leak in hwsim_subscribe_all_others
clk: imx8mq: remove SYS PLL 1/2 clock gates
wcn36xx: Move hal_buf allocation to devm_kmalloc in probe
ssb: Fix error return code in ssb_bus_scan()
brcmfmac: fix setting of station info chains bitmask
brcmfmac: correctly report average RSSI in station info
brcmfmac: Fix a double-free in brcmf_sdio_bus_reset
brcmsmac: mac80211_if: Fix a resource leak in an error handling path
cw1200: Revert unnecessary patches that fix unreal use-after-free bugs
ath11k: Fix an error handling path in ath11k_core_fetch_board_data_api_n()
ath10k: Fix an error code in ath10k_add_interface()
ath11k: send beacon template after vdev_start/restart during csa
netlabel: Fix memory leak in netlbl_mgmt_add_common
RDMA/mlx5: Don't add slave port to unaffiliated list
netfilter: nft_exthdr: check for IPv6 packet before further processing
netfilter: nft_osf: check for TCP packet before further processing
netfilter: nft_tproxy: restrict support to TCP and UDP transport protocols
RDMA/rxe: Fix qp reference counting for atomic ops
selftests/bpf: Whitelist test_progs.h from .gitignore
xsk: Fix missing validation for skb and unaligned mode
xsk: Fix broken Tx ring validation
bpf: Fix libelf endian handling in resolv_btfids
RDMA/rtrs-srv: Set minimal max_send_wr and max_recv_wr
samples/bpf: Fix Segmentation fault for xdp_redirect command
samples/bpf: Fix the error return code of xdp_redirect's main()
mt76: fix possible NULL pointer dereference in mt76_tx
mt76: mt7615: fix NULL pointer dereference in tx_prepare_skb()
net: ethernet: aeroflex: fix UAF in greth_of_remove
net: ethernet: ezchip: fix UAF in nps_enet_remove
net: ethernet: ezchip: fix error handling
vrf: do not push non-ND strict packets with a source LLA through packet taps again
net: sched: add barrier to ensure correct ordering for lockless qdisc
tls: prevent oversized sendfile() hangs by ignoring MSG_MORE
netfilter: nf_tables_offload: check FLOW_DISSECTOR_KEY_BASIC in VLAN transfer logic
pkt_sched: sch_qfq: fix qfq_change_class() error path
xfrm: Fix xfrm offload fallback fail case
iwlwifi: increase PNVM load timeout
rtw88: 8822c: fix lc calibration timing
vxlan: add missing rcu_read_lock() in neigh_reduce()
ip6_tunnel: fix GRE6 segmentation
net/ipv4: swap flow ports when validating source
net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues
tc-testing: fix list handling
ieee802154: hwsim: Fix memory leak in hwsim_add_one
ieee802154: hwsim: avoid possible crash in hwsim_del_edge_nl()
bpf: Fix null ptr deref with mixed tail calls and subprogs
drm/msm: Fix error return code in msm_drm_init()
drm/msm/dpu: Fix error return code in dpu_mdss_init()
mac80211: remove iwlwifi specific workaround NDPs of null_response
net: bcmgenet: Fix attaching to PYH failed on RPi 4B
ipv6: exthdrs: do not blindly use init_net
can: j1939: j1939_sk_setsockopt(): prevent allocation of j1939 filter for optlen == 0
bpf: Do not change gso_size during bpf_skb_change_proto()
i40e: Fix error handling in i40e_vsi_open
i40e: Fix autoneg disabling for non-10GBaseT links
i40e: Fix missing rtnl locking when setting up pf switch
Revert "ibmvnic: remove duplicate napi_schedule call in open function"
ibmvnic: set ltb->buff to NULL after freeing
ibmvnic: free tx_pool if tso_pool alloc fails
RDMA/cma: Protect RMW with qp_mutex
net: macsec: fix the length used to copy the key for offloading
net: phy: mscc: fix macsec key length
net: atlantic: fix the macsec key length
ipv6: fix out-of-bound access in ip6_parse_tlv()
e1000e: Check the PCIm state
net: dsa: sja1105: fix NULL pointer dereference in sja1105_reload_cbs()
bpfilter: Specify the log level for the kmsg message
RDMA/cma: Fix incorrect Packet Lifetime calculation
gve: Fix swapped vars when fetching max queues
Revert "be2net: disable bh with spin_lock in be_process_mcc"
Bluetooth: mgmt: Fix slab-out-of-bounds in tlv_data_is_valid
Bluetooth: Fix not sending Set Extended Scan Response
Bluetooth: Fix Set Extended (Scan Response) Data
Bluetooth: Fix handling of HCI_LE_Advertising_Set_Terminated event
clk: actions: Fix UART clock dividers on Owl S500 SoC
clk: actions: Fix SD clocks factor table on Owl S500 SoC
clk: actions: Fix bisp_factor_table based clocks on Owl S500 SoC
clk: actions: Fix AHPPREDIV-H-AHB clock chain on Owl S500 SoC
clk: qcom: clk-alpha-pll: fix CAL_L write in alpha_pll_fabia_prepare
clk: si5341: Wait for DEVICE_READY on startup
clk: si5341: Avoid divide errors due to bogus register contents
clk: si5341: Check for input clock presence and PLL lock on startup
clk: si5341: Update initialization magic
writeback: fix obtain a reference to a freeing memcg css
net: lwtunnel: handle MTU calculation in forwading
net: sched: fix warning in tcindex_alloc_perfect_hash
net: tipc: fix FB_MTU eat two pages
RDMA/mlx5: Don't access NULL-cleared mpi pointer
RDMA/core: Always release restrack object
MIPS: Fix PKMAP with 32-bit MIPS huge page support
staging: fbtft: Rectify GPIO handling
staging: fbtft: Don't spam logs when probe is deferred
ASoC: rt5682: Disable irq on shutdown
rcu: Invoke rcu_spawn_core_kthreads() from rcu_spawn_gp_kthread()
serial: fsl_lpuart: don't modify arbitrary data on lpuart32
serial: fsl_lpuart: remove RTSCTS handling from get_mctrl()
serial: 8250_omap: fix a timeout loop condition
tty: nozomi: Fix a resource leak in an error handling function
mwifiex: re-fix for unaligned accesses
iio: adis_buffer: do not return ints in irq handlers
iio: adis16400: do not return ints in irq handlers
iio: adis16475: do not return ints in irq handlers
iio: accel: bma180: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: bma220: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: hid: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: kxcjk-1013: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: mxc4005: Fix overread of data and alignment issue.
iio: accel: stk8312: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: stk8ba50: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: ti-ads1015: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: vf610: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: gyro: bmg160: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: humidity: am2315: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: srf08: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: pulsed-light: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: as3935: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: magn: hmc5843: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: magn: bmc150: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: isl29125: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: tcs3414: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: tcs3472: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: chemical: atlas: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: cros_ec_sensors: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
iio: potentiostat: lmp91000: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
ASoC: rk3328: fix missing clk_disable_unprepare() on error in rk3328_platform_probe()
ASoC: hisilicon: fix missing clk_disable_unprepare() on error in hi6210_i2s_startup()
backlight: lm3630a_bl: Put fwnode in error case during ->probe()
ASoC: rsnd: tidyup loop on rsnd_adg_clk_query()
Input: hil_kbd - fix error return code in hil_dev_connect()
perf scripting python: Fix tuple_set_u64()
mtd: partitions: redboot: seek fis-index-block in the right node
mtd: rawnand: arasan: Ensure proper configuration for the asserted target
staging: mmal-vchiq: Fix incorrect static vchiq_instance.
char: pcmcia: error out if 'num_bytes_read' is greater than 4 in set_protocol()
firmware: stratix10-svc: Fix a resource leak in an error handling path
tty: nozomi: Fix the error handling path of 'nozomi_card_init()'
leds: class: The -ENOTSUPP should never be seen by user space
leds: lm3532: select regmap I2C API
leds: lm36274: Put fwnode in error case during ->probe()
leds: lm3692x: Put fwnode in any case during ->probe()
leds: lm3697: Don't spam logs when probe is deferred
leds: lp50xx: Put fwnode in error case during ->probe()
scsi: FlashPoint: Rename si_flags field
scsi: iscsi: Flush block work before unblock
mfd: mp2629: Select MFD_CORE to fix build error
mfd: rn5t618: Fix IRQ trigger by changing it to level mode
fsi: core: Fix return of error values on failures
fsi: scom: Reset the FSI2PIB engine for any error
fsi: occ: Don't accept response from un-initialized OCC
fsi/sbefifo: Clean up correct FIFO when receiving reset request from SBE
fsi/sbefifo: Fix reset timeout
visorbus: fix error return code in visorchipset_init()
iommu/amd: Fix extended features logging
s390/irq: select HAVE_IRQ_EXIT_ON_IRQ_STACK
s390: enable HAVE_IOREMAP_PROT
s390: appldata depends on PROC_SYSCTL
selftests: splice: Adjust for handler fallback removal
iommu/dma: Fix IOVA reserve dma ranges
ASoC: max98373-sdw: use first_hw_init flag on resume
ASoC: rt1308-sdw: use first_hw_init flag on resume
ASoC: rt5682-sdw: use first_hw_init flag on resume
ASoC: rt700-sdw: use first_hw_init flag on resume
ASoC: rt711-sdw: use first_hw_init flag on resume
ASoC: rt715-sdw: use first_hw_init flag on resume
ASoC: rt5682: fix getting the wrong device id when the suspend_stress_test
ASoC: rt5682-sdw: set regcache_cache_only false before reading RT5682_DEVICE_ID
ASoC: mediatek: mtk-btcvsd: Fix an error handling path in 'mtk_btcvsd_snd_probe()'
usb: gadget: f_fs: Fix setting of device and driver data cross-references
usb: dwc2: Don't reset the core after setting turnaround time
eeprom: idt_89hpesx: Put fwnode in matching case during ->probe()
eeprom: idt_89hpesx: Restore printing the unsupported fwnode name
thunderbolt: Bond lanes only when dual_link_port != NULL in alloc_dev_default()
iio: adc: at91-sama5d2: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: hx711: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: mxs-lradc: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: ti-ads8688: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
iio: magn: rm3100: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
iio: light: vcnl4000: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
ASoC: fsl_spdif: Fix error handler with pm_runtime_enable
staging: gdm724x: check for buffer overflow in gdm_lte_multi_sdu_pkt()
staging: gdm724x: check for overflow in gdm_lte_netif_rx()
staging: rtl8712: fix error handling in r871xu_drv_init
staging: rtl8712: fix memory leak in rtl871x_load_fw_cb
coresight: core: Fix use of uninitialized pointer
staging: mt7621-dts: fix pci address for PCI memory range
serial: 8250: Actually allow UPF_MAGIC_MULTIPLIER baud rates
iio: light: vcnl4035: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: isl29501: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
ASoC: cs42l42: Correct definition of CS42L42_ADC_PDN_MASK
of: Fix truncation of memory sizes on 32-bit platforms
mtd: rawnand: marvell: add missing clk_disable_unprepare() on error in marvell_nfc_resume()
habanalabs: Fix an error handling path in 'hl_pci_probe()'
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
soundwire: stream: Fix test for DP prepare complete
phy: uniphier-pcie: Fix updating phy parameters
phy: ti: dm816x: Fix the error handling path in 'dm816x_usb_phy_probe()
extcon: sm5502: Drop invalid register write in sm5502_reg_data
extcon: max8997: Add missing modalias string
powerpc/powernv: Fix machine check reporting of async store errors
ASoC: atmel-i2s: Fix usage of capture and playback at the same time
configfs: fix memleak in configfs_release_bin_file
ASoC: Intel: sof_sdw: add SOF_RT715_DAI_ID_FIX for AlderLake
ASoC: fsl_spdif: Fix unexpected interrupt after suspend
leds: as3645a: Fix error return code in as3645a_parse_node()
leds: ktd2692: Fix an error handling path
selftests/ftrace: fix event-no-pid on 1-core machine
serial: 8250: 8250_omap: Disable RX interrupt after DMA enable
serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs
powerpc: Offline CPU in stop_this_cpu()
powerpc/papr_scm: Properly handle UUID types and API
powerpc/64s: Fix copy-paste data exposure into newly created tasks
powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable
ALSA: firewire-lib: Fix 'amdtp_domain_start()' when no AMDTP_OUT_STREAM stream is found
serial: mvebu-uart: do not allow changing baudrate when uartclk is not available
serial: mvebu-uart: correctly calculate minimal possible baudrate
arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
vfio/pci: Handle concurrent vma faults
mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled
mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
mm/huge_memory.c: don't discard hugepage if other processes are mapping it
mm/hugetlb: use helper huge_page_order and pages_per_huge_page
mm/hugetlb: remove redundant check in preparing and destroying gigantic page
hugetlb: remove prep_compound_huge_page cleanup
include/linux/huge_mm.h: remove extern keyword
mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page
lib/math/rational.c: fix divide by zero
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: refill shadow register after implicit kernel write
perf llvm: Return -ENOMEM when asprintf() fails
csky: fix syscache.c fallthrough warning
csky: syscache: Fixup duplicate cache flush
exfat: handle wrong stream entry size in exfat_readdir()
scsi: fc: Correct RHBA attributes length
scsi: target: cxgbit: Unmap DMA buffer before calling target_execute_cmd()
mailbox: qcom-ipcc: Fix IPCC mbox channel exhaustion
fscrypt: don't ignore minor_hash when hash is 0
fscrypt: fix derivation of SipHash keys on big endian CPUs
tpm: Replace WARN_ONCE() with dev_err_once() in tpm_tis_status()
erofs: fix error return code in erofs_read_superblock()
block: return the correct bvec when checking for gaps
io_uring: fix blocking inline submission
mmc: block: Disable CMDQ on the ioctl path
mmc: vub3000: fix control-request direction
media: exynos4-is: remove a now unused integer
scsi: core: Retry I/O for Notify (Enable Spinup) Required error
crypto: qce - fix error return code in qce_skcipher_async_req_handle()
s390: preempt: Fix preempt_count initialization
cred: add missing return error code when set_cred_ucounts() failed
iommu/dma: Fix compile warning in 32-bit builds
powerpc/preempt: Don't touch the idle task's preempt_count during hotplug
Linux 5.10.50
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iec4eab24ea8eb5a6d79739a1aec8432d93a8f82c
|
||
|
|
019d04f914 |
open: don't silently ignore unknown O-flags in openat2()
[ Upstream commit
|
||
|
|
a7a3b31d58 |
ANDROID: syscall_check: add vendor hook for open syscall
Through this vendor hook, we can get the timing to check current running task for the validation of its credential and open operation. Bug: 191291287 Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Change-Id: Ia644ceb02dbc230ee1d25cad3630c2c3f908e41a |
||
|
|
28b4b1588e |
FROMLIST: mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs
Transparent huge pages are supported for read-only non-shmem files,
but are only used for vmas with VM_DENYWRITE. This condition ensures that
file THPs are protected from writes while an application is running
(ETXTBSY). Any existing file THPs are then dropped from the page cache
when a file is opened for write in do_dentry_open(). Since sys_mmap
ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
produced by execve().
Systems that make heavy use of shared libraries (e.g. Android) are unable
to apply VM_DENYWRITE through the dynamic linker, preventing them from
benefiting from the resultant reduced contention on the TLB.
This patch reduces the constraint on file THPs allowing use with any
executable mapping from a file not opened for write (see
inode_is_open_for_write()). It also introduces additional conditions to
ensure that files opened for write will never be backed by file THPs.
Restricting the use of THPs to executable mappings eliminates the risk that
a read-only file later opened for write would encounter significant
latencies due to page cache truncation.
The ld linker flag '-z max-page-size=(hugepage size)' can be used to
produce executables with the necessary layout. The dynamic linker must
map these file's segments at a hugepage size aligned vma for the mapping to
be backed with THPs.
Comparison of the performance characteristics of 4KB and 2MB-backed
libraries follows; the Android dex2oat tool was used to AOT compile an
example application on a single ARM core.
4KB Pages:
==========
count event_name # count / runtime
598,995,035,942 cpu-cycles # 1.800861 GHz
81,195,620,851 raw-stall-frontend # 244.112 M/sec
347,754,466,597 iTLB-loads # 1.046 G/sec
2,970,248,900 iTLB-load-misses # 0.854122% miss rate
Total test time: 332.854998 seconds.
2MB Pages:
==========
count event_name # count / runtime
592,872,663,047 cpu-cycles # 1.800358 GHz
76,485,624,143 raw-stall-frontend # 232.261 M/sec
350,478,413,710 iTLB-loads # 1.064 G/sec
803,233,322 iTLB-load-misses # 0.229182% miss rate
Total test time: 329.826087 seconds
A check of /proc/$(pidof dex2oat64)/smaps shows THPs in use:
/apex/com.android.art/lib64/libart.so
FilePmdMapped: 4096 kB
/apex/com.android.art/lib64/libart-compiler.so
FilePmdMapped: 2048 kB
Bug: 158135888
Link: https://lore.kernel.org/patchwork/patch/1408266/
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
Change-Id: I75c693a4b4e7526d374ef2c010bde3094233eef2
|
||
|
|
cf8f7947f2 |
ANDROID: Add filp_open_block() for zram
We currently plan to disallow use of filp_open() from drivers in GKI, however the ZRAM driver still needs it. Add a new GKI-only variant of filp_open() which only permits a block device to be opened, which can be exported instead. This keeps ZRAM working but cuts down on drivers that attempt to open and write files in kernel mode. Bug: 179220339 Change-Id: Id696b4aaf204b0499ce0a1b6416648670236e570 Signed-off-by: Alistair Delva <adelva@google.com> |
||
|
|
aa606ebab1 |
openat2: reject RESOLVE_BENEATH|RESOLVE_IN_ROOT
commit |
||
|
|
633fb6ac39 |
exec: move S_ISREG() check earlier
The execve(2)/uselib(2) syscalls have always rejected non-regular files.
Recently, it was noticed that a deadlock was introduced when trying to
execute pipes, as the S_ISREG() test was happening too late. This was
fixed in commit
|
||
|
|
e1ec517e18 |
Merge branch 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull init and set_fs() cleanups from Al Viro: "Christoph's 'getting rid of ksys_...() uses under KERNEL_DS' series" * 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (50 commits) init: add an init_dup helper init: add an init_utimes helper init: add an init_stat helper init: add an init_mknod helper init: add an init_mkdir helper init: add an init_symlink helper init: add an init_link helper init: add an init_eaccess helper init: add an init_chmod helper init: add an init_chown helper init: add an init_chroot helper init: add an init_chdir helper init: add an init_rmdir helper init: add an init_unlink helper init: add an init_umount helper init: add an init_mount helper init: mark create_dev as __init init: mark console_on_rootfs as __init init: initialize ramdisk_execute_command at compile time devtmpfs: refactor devtmpfsd() ... |
||
|
|
eb9d7d390e |
init: add an init_eaccess helper
Add a simple helper to check if a file exists based on kernel space file name and switch the early init code over to it. Note that this theoretically changes behavior as it always is based on the effective permissions. But during early init that doesn't make a difference. Signed-off-by: Christoph Hellwig <hch@lst.de> |
||
|
|
1097742efc |
init: add an init_chmod helper
Add a simple helper to chmod with a kernel space file name and switch the early init code over to it. Signed-off-by: Christoph Hellwig <hch@lst.de> |
||
|
|
b873498f99 |
init: add an init_chown helper
Add a simple helper to chown with a kernel space file name and switch the early init code over to it. Signed-off-by: Christoph Hellwig <hch@lst.de> |
||
|
|
4b7ca5014c |
init: add an init_chroot helper
Add a simple helper to chroot with a kernel space file name and switch the early init code over to it. Remove the now unused ksys_chroot. Signed-off-by: Christoph Hellwig <hch@lst.de> |
||
|
|
db63f1e315 |
init: add an init_chdir helper
Add a simple helper to chdir with a kernel space file name and switch the early init code over to it. Remove the now unused ksys_chdir. Signed-off-by: Christoph Hellwig <hch@lst.de> |
||
|
|
b25ba7c3c9 |
fs: remove ksys_fchmod
Fold it into the only remaining caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
166e07c37c |
fs: remove ksys_open
Just open code it in the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9e96c8c0e9 |
fs: add a vfs_fchmod helper
Add a helper for struct file based chmode operations. To be used by the initramfs code soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c04011fe8c |
fs: add a vfs_fchown helper
Add a helper for struct file based chown operations. To be used by the initramfs code soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
60997c3d45
|
close_range: add CLOSE_RANGE_UNSHARE
One of the use-cases of close_range() is to drop file descriptors just before
execve(). This would usually be expressed in the sequence:
unshare(CLONE_FILES);
close_range(3, ~0U);
as pointed out by Linus it might be desirable to have this be a part of
close_range() itself under a new flag CLOSE_RANGE_UNSHARE.
This expands {dup,unshare)_fd() to take a max_fds argument that indicates the
maximum number of file descriptors to copy from the old struct files. When the
user requests that all file descriptors are supposed to be closed via
close_range(min, max) then we can cap via unshare_fd(min) and hence don't need
to do any of the heavy fput() work for everything above min.
The patch makes it so that if CLOSE_RANGE_UNSHARE is requested and we do in
fact currently share our file descriptor table we create a new private copy.
We then close all fds in the requested range and finally after we're done we
install the new fd table.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
||
|
|
278a5fbaed
|
open: add close_range()
This adds the close_range() syscall. It allows to efficiently close a range of file descriptors up to all file descriptors of a calling task. I was contacted by FreeBSD as they wanted to have the same close_range() syscall as we proposed here. We've coordinated this and in the meantime, Kyle was fast enough to merge close_range() into FreeBSD already in April: https://reviews.freebsd.org/D21627 https://svnweb.freebsd.org/base?view=revision&revision=359836 and the current plan is to backport close_range() to FreeBSD 12.2 (cf. [2]) once its merged in Linux too. Python is in the process of switching to close_range() on FreeBSD and they are waiting on us to merge this to switch on Linux as well: https://bugs.python.org/issue38061 The syscall came up in a recent discussion around the new mount API and making new file descriptor types cloexec by default. During this discussion, Al suggested the close_range() syscall (cf. [1]). Note, a syscall in this manner has been requested by various people over time. First, it helps to close all file descriptors of an exec()ing task. This can be done safely via (quoting Al's example from [1] verbatim): /* that exec is sensitive */ unshare(CLONE_FILES); /* we don't want anything past stderr here */ close_range(3, ~0U); execve(....); The code snippet above is one way of working around the problem that file descriptors are not cloexec by default. This is aggravated by the fact that we can't just switch them over without massively regressing userspace. For a whole class of programs having an in-kernel method of closing all file descriptors is very helpful (e.g. demons, service managers, programming language standard libraries, container managers etc.). (Please note, unshare(CLONE_FILES) should only be needed if the calling task is multi-threaded and shares the file descriptor table with another thread in which case two threads could race with one thread allocating file descriptors and the other one closing them via close_range(). For the general case close_range() before the execve() is sufficient.) Second, it allows userspace to avoid implementing closing all file descriptors by parsing through /proc/<pid>/fd/* and calling close() on each file descriptor. From looking at various large(ish) userspace code bases this or similar patterns are very common in: - service managers (cf. [4]) - libcs (cf. [6]) - container runtimes (cf. [5]) - programming language runtimes/standard libraries - Python (cf. [2]) - Rust (cf. [7], [8]) As Dmitry pointed out there's even a long-standing glibc bug about missing kernel support for this task (cf. [3]). In addition, the syscall will also work for tasks that do not have procfs mounted and on kernels that do not have procfs support compiled in. In such situations the only way to make sure that all file descriptors are closed is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery (cf. comment [8] on Rust). The performance is striking. For good measure, comparing the following simple close_all_fds() userspace implementation that is essentially just glibc's version in [6]: static int close_all_fds(void) { int dir_fd; DIR *dir; struct dirent *direntp; dir = opendir("/proc/self/fd"); if (!dir) return -1; dir_fd = dirfd(dir); while ((direntp = readdir(dir))) { int fd; if (strcmp(direntp->d_name, ".") == 0) continue; if (strcmp(direntp->d_name, "..") == 0) continue; fd = atoi(direntp->d_name); if (fd == dir_fd || fd == 0 || fd == 1 || fd == 2) continue; close(fd); } closedir(dir); return 0; } to close_range() yields: 1. closing 4 open files: - close_all_fds(): ~280 us - close_range(): ~24 us 2. closing 1000 open files: - close_all_fds(): ~5000 us - close_range(): ~800 us close_range() is designed to allow for some flexibility. Specifically, it does not simply always close all open file descriptors of a task. Instead, callers can specify an upper bound. This is e.g. useful for scenarios where specific file descriptors are created with well-known numbers that are supposed to be excluded from getting closed. For extra paranoia close_range() comes with a flags argument. This can e.g. be used to implement extension. Once can imagine userspace wanting to stop at the first error instead of ignoring errors under certain circumstances. There might be other valid ideas in the future. In any case, a flag argument doesn't hurt and keeps us on the safe side. From an implementation side this is kept rather dumb. It saw some input from David and Jann but all nonsense is obviously my own! - Errors to close file descriptors are currently ignored. (Could be changed by setting a flag in the future if needed.) - __close_range() is a rather simplistic wrapper around __close_fd(). My reasoning behind this is based on the nature of how __close_fd() needs to release an fd. But maybe I misunderstood specifics: We take the files_lock and rcu-dereference the fdtable of the calling task, we find the entry in the fdtable, get the file and need to release files_lock before calling filp_close(). In the meantime the fdtable might have been altered so we can't just retake the spinlock and keep the old rcu-reference of the fdtable around. Instead we need to grab a fresh reference to the fdtable. If my reasoning is correct then there's really no point in fancyfying __close_range(): We just need to rcu-dereference the fdtable of the calling task once to cap the max_fd value correctly and then go on calling __close_fd() in a loop. /* References */ [1]: https://lore.kernel.org/lkml/20190516165021.GD17978@ZenIV.linux.org.uk/ [2]: |
||
|
|
94709049fb |
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: "A few little subsystems and a start of a lot of MM patches. Subsystems affected by this patch series: squashfs, ocfs2, parisc, vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup, swap, memcg, pagemap, memory-failure, vmalloc, kasan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) kasan: move kasan_report() into report.c mm/mm_init.c: report kasan-tag information stored in page->flags ubsan: entirely disable alignment checks under UBSAN_TRAP kasan: fix clang compilation warning due to stack protector x86/mm: remove vmalloc faulting mm: remove vmalloc_sync_(un)mappings() x86/mm/32: implement arch_sync_kernel_mappings() x86/mm/64: implement arch_sync_kernel_mappings() mm/ioremap: track which page-table levels were modified mm/vmalloc: track which page-table levels were modified mm: add functions to track page directory modifications s390: use __vmalloc_node in stack_alloc powerpc: use __vmalloc_node in alloc_vm_stack arm64: use __vmalloc_node in arch_alloc_vmap_stack mm: remove vmalloc_user_node_flags mm: switch the test_vmalloc module to use __vmalloc_node mm: remove __vmalloc_node_flags_caller mm: remove both instances of __vmalloc_node_flags mm: remove the prot argument to __vmalloc_node mm: remove the pgprot argument to __vmalloc ... |
||
|
|
735e4ae5ba |
vfs: track per-sb writeback errors and report them to syncfs
Patch series "vfs: have syncfs() return error when there are writeback errors", v6. Currently, syncfs does not return errors when one of the inodes fails to be written back. It will return errors based on the legacy AS_EIO and AS_ENOSPC flags when syncing out the block device fails, but that's not particularly helpful for filesystems that aren't backed by a blockdev. It's also possible for a stray sync to lose those errors. The basic idea in this set is to track writeback errors at the superblock level, so that we can quickly and easily check whether something bad happened without having to fsync each file individually. syncfs is then changed to reliably report writeback errors after they occur, much in the same fashion as fsync does now. This patch (of 2): Usually we suggest that applications call fsync when they want to ensure that all data written to the file has made it to the backing store, but that can be inefficient when there are a lot of open files. Calling syncfs on the filesystem can be more efficient in some situations, but the error reporting doesn't currently work the way most people expect. If a single inode on a filesystem reports a writeback error, syncfs won't necessarily return an error. syncfs only returns an error if __sync_blockdev fails, and on some filesystems that's a no-op. It would be better if syncfs reported an error if there were any writeback failures. Then applications could call syncfs to see if there are any errors on any open files, and could then call fsync on all of the other descriptors to figure out which one failed. This patch adds a new errseq_t to struct super_block, and has mapping_set_error also record writeback errors there. To report those errors, we also need to keep an errseq_t in struct file to act as a cursor. This patch adds a dedicated field for that purpose, which slots nicely into 4 bytes of padding at the end of struct file on x86_64. An earlier version of this patch used an O_PATH file descriptor to cue the kernel that the open file should track the superblock error and not the inode's writeback error. I think that API is just too weird though. This is simpler and should make syncfs error reporting "just work" even if someone is multiplexing fsync and syncfs on the same fds. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Andres Freund <andres@anarazel.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Howells <dhowells@redhat.com> Link: http://lkml.kernel.org/r/20200428135155.19223-1-jlayton@kernel.org Link: http://lkml.kernel.org/r/20200428135155.19223-2-jlayton@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c8ffd8bcdd |
vfs: add faccessat2 syscall
POSIX defines faccessat() as having a fourth "flags" argument, while the linux syscall doesn't have it. Glibc tries to emulate AT_EACCESS and AT_SYMLINK_NOFOLLOW, but AT_EACCESS emulation is broken. Add a new faccessat(2) syscall with the added flags argument and implement both flags. The value of AT_EACCESS is defined in glibc headers to be the same as AT_REMOVEDIR. Use this value for the kernel interface as well, together with the explanatory comment. Also add AT_EMPTY_PATH support, which is not documented by POSIX, but can be useful and is trivial to implement. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> |
||
|
|
9470451505 |
vfs: split out access_override_creds()
Split out a helper that overrides the credentials in preparation for actually doing the access check. This prepares for the next patch that optionally disables the creds override. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> |
||
|
|
9c577491b9 |
Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pathwalk sanitizing from Al Viro:
"Massive pathwalk rewrite and cleanups.
Several iterations have been posted; hopefully this thing is getting
readable and understandable now. Pretty much all parts of pathname
resolutions are affected...
The branch is identical to what has sat in -next, except for commit
message in "lift all calls of step_into() out of follow_dotdot/
follow_dotdot_rcu", crediting Qian Cai for reporting the bug; only
commit message changed there."
* 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (69 commits)
lookup_open(): don't bother with fallbacks to lookup+create
atomic_open(): no need to pass struct open_flags anymore
open_last_lookups(): move complete_walk() into do_open()
open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open()
open_last_lookups(): don't abuse complete_walk() when all we want is unlazy
open_last_lookups(): consolidate fsnotify_create() calls
take post-lookup part of do_last() out of loop
link_path_walk(): sample parent's i_uid and i_mode for the last component
__nd_alloc_stack(): make it return bool
reserve_stack(): switch to __nd_alloc_stack()
pick_link(): take reserving space on stack into a new helper
pick_link(): more straightforward handling of allocation failures
fold path_to_nameidata() into its only remaining caller
pick_link(): pass it struct path already with normal refcounting rules
fs/namei.c: kill follow_mount()
non-RCU analogue of the previous commit
helper for mount rootwards traversal
follow_dotdot(): be lazy about changing nd->path
follow_dotdot_rcu(): be lazy about changing nd->path
follow_dotdot{,_rcu}(): massage loops
...
|
||
|
|
d9a9f4849f |
cifs_atomic_open(): fix double-put on late allocation failure
several iterations of ->atomic_open() calling conventions ago, we
used to need fput() if ->atomic_open() failed at some point after
successful finish_open(). Now (since 2016) it's not needed -
struct file carries enough state to make fput() work regardless
of the point in struct file lifecycle and discarding it on
failure exits in open() got unified. Unfortunately, I'd missed
the fact that we had an instance of ->atomic_open() (cifs one)
that used to need that fput(), as well as the stale comment in
finish_open() demanding such late failure handling. Trivially
fixed...
Fixes:
|
||
|
|
31d1726d72 |
make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW
O_CREAT | O_EXCL means "-EEXIST if we run into a trailing symlink". As it is, we might or might not have LOOKUP_FOLLOW in op->intent in that case - that depends upon having O_NOFOLLOW in open flags. It doesn't matter, since we won't be checking it in that case - do_last() bails out earlier. However, making sure it's not set (i.e. acting as if we had an explicit O_NOFOLLOW) makes the behaviour more explicit and allows to reorder the check for O_CREAT | O_EXCL in do_last() with the call of step_into() immediately following it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
35cb6d54c1 |
fs: make build_open_flags() available internally
This is a prep patch for supporting non-blocking open from io_uring. Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
fddb5d430a |
open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.
In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.
Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).
/* Syscall Prototype. */
/*
* open_how is an extensible structure (similar in interface to
* clone3(2) or sched_setattr(2)). The size parameter must be set to
* sizeof(struct open_how), to allow for future extensions. All future
* extensions will be appended to open_how, with their zero value
* acting as a no-op default.
*/
struct open_how { /* ... */ };
int openat2(int dfd, const char *pathname,
struct open_how *how, size_t size);
/* Description. */
The initial version of 'struct open_how' contains the following fields:
flags
Used to specify openat(2)-style flags. However, any unknown flag
bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
will result in -EINVAL. In addition, this field is 64-bits wide to
allow for more O_ flags than currently permitted with openat(2).
mode
The file mode for O_CREAT or O_TMPFILE.
Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
resolve
Restrict path resolution (in contrast to O_* flags they affect all
path components). The current set of flags are as follows (at the
moment, all of the RESOLVE_ flags are implemented as just passing
the corresponding LOOKUP_ flag).
RESOLVE_NO_XDEV => LOOKUP_NO_XDEV
RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS
RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
RESOLVE_BENEATH => LOOKUP_BENEATH
RESOLVE_IN_ROOT => LOOKUP_IN_ROOT
open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.
Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).
After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.
/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.
In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).
/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).
Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.
Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit
|
||
|
|
2be7d348fe |
Revert "vfs: properly and reliably lock f_pos in fdget_pos()"
This reverts commit
|
||
|
|
0be0ee7181 |
vfs: properly and reliably lock f_pos in fdget_pos()
fdget_pos() is used by file operations that will read and update f_pos:
things like "read()", "write()" and "lseek()" (but not, for example,
"pread()/pwrite" that get their file positions elsewhere).
However, it had two separate escape clauses for this, because not
everybody wants or needs serialization of the file position.
The first and most obvious case is the "file descriptor doesn't have a
position at all", ie a stream-like file. Except we didn't actually use
FMODE_STREAM, but instead used FMODE_ATOMIC_POS. The reason for that
was that FMODE_STREAM didn't exist back in the days, but also that we
didn't want to mark all the special cases, so we only marked the ones
that _required_ position atomicity according to POSIX - regular files
and directories.
The case one was intentionally lazy, but now that we _do_ have
FMODE_STREAM we could and should just use it. With the change to use
FMODE_STREAM, there are no remaining uses for FMODE_ATOMIC_POS, and all
the code to set it is deleted.
Any cases where we don't want the serialization because the driver (or
subsystem) doesn't use the file position should just be updated to do
"stream_open()". We've done that for all the obvious and common
situations, we may need a few more. Quoting Kirill Smelkov in the
original FMODE_STREAM thread (see link below for full email):
"And I appreciate if people could help at least somehow with "getting
rid of mixed case entirely" (i.e. always lock f_pos_lock on
!FMODE_STREAM), because this transition starts to diverge from my
particular use-case too far. To me it makes sense to do that
transition as follows:
- convert nonseekable_open -> stream_open via stream_open.cocci;
- audit other nonseekable_open calls and convert left users that
truly don't depend on position to stream_open;
- extend stream_open.cocci to analyze alloc_file_pseudo as well (this
will cover pipes and sockets), or maybe convert pipes and sockets
to FMODE_STREAM manually;
- extend stream_open.cocci to analyze file_operations that use
no_llseek or noop_llseek, but do not use nonseekable_open or
alloc_file_pseudo. This might find files that have stream semantic
but are opened differently;
- extend stream_open.cocci to analyze file_operations whose
.read/.write do not use ppos at all (independently of how file was
opened);
- ...
- after that remove FMODE_ATOMIC_POS and always take f_pos_lock if
!FMODE_STREAM;
- gather bug reports for deadlocked read/write and convert missed
cases to FMODE_STREAM, probably extending stream_open.cocci along
the road to catch similar cases
i.e. always take f_pos_lock unless a file is explicitly marked as
being stream, and try to find and cover all files that are streams"
We have not done the "extend stream_open.cocci to analyze
alloc_file_pseudo" as well, but the previous commit did manually handle
the case of pipes and sockets.
The other case where we can avoid locking f_pos is the "this file
descriptor only has a single user and it is us, and thus there is no
need to lock it".
The second test was correct, although a bit subtle and worth just
re-iterating here. There are two kinds of other sources of references
to the same file descriptor: file descriptors that have been explicitly
shared across fork() or with dup(), and file tables having elevated
reference counts due to threading (or explicit file sharing with
clone()).
The first case would have incremented the file count explicitly, and in
the second case the previous __fdget() would have incremented it for us
and set the FDPUT_FPUT flag.
But in both cases the file count would be greater than one, so the
"file_count(file) > 1" test catches both situations. Also note that if
file_count is 1, that also means that no other thread can have access to
the file table, so there also cannot be races with concurrent calls to
dup()/fork()/clone() that would increment the file count any other way.
Link: https://lore.kernel.org/linux-fsdevel/20190413184404.GA13490@deco.navytux.spb.ru
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Eic Dumazet <edumazet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Marco Elver <elver@google.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
7159d54418 |
fs: remove unlikely() from WARN_ON() condition
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely() internally. Link: http://lkml.kernel.org/r/20190829165025.15750-5-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
09d91cda0e |
mm,thp: avoid writes to file with THP in pagecache
In previous patch, an application could put part of its text section in THP via madvise(). These THPs will be protected from writes when the application is still running (TXTBSY). However, after the application exits, the file is available for writes. This patch avoids writes to file THP by dropping page cache for the file when the file is open for write. A new counter nr_thps is added to struct address_space. In do_dentry_open(), if the file is open for write and nr_thps is non-zero, we drop page cache for the whole file. Link: http://lkml.kernel.org/r/20190801184244.3169074-8-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
d7852fbd0f |
access: avoid the RCU grace period for the temporary subjective credentials
It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU work because it installs a temporary credential that gets allocated and freed for each system call. The allocation and freeing overhead is mostly benign, but because credentials can be accessed under the RCU read lock, the freeing involves a RCU grace period. Which is not a huge deal normally, but if you have a lot of access() calls, this causes a fair amount of seconday damage: instead of having a nice alloc/free patterns that hits in hot per-CPU slab caches, you have all those delayed free's, and on big machines with hundreds of cores, the RCU overhead can end up being enormous. But it turns out that all of this is entirely unnecessary. Exactly because access() only installs the credential as the thread-local subjective credential, the temporary cred pointer doesn't actually need to be RCU free'd at all. Once we're done using it, we can just free it synchronously and avoid all the RCU overhead. So add a 'non_rcu' flag to 'struct cred', which can be set by users that know they only use it in non-RCU context (there are other potential users for this). We can make it a union with the rcu freeing list head that we need for the RCU case, so this doesn't need any extra storage. Note that this also makes 'get_current_cred()' clear the new non_rcu flag, in case we have filesystems that take a long-term reference to the cred and then expect the RCU delayed freeing afterwards. It's not entirely clear that this is required, but it makes for clear semantics: the subjective cred remains non-RCU as long as you only access it synchronously using the thread-local accessors, but you _can_ use it as a generic cred if you want to. It is possible that we should just remove the whole RCU markings for ->cred entirely. Only ->real_cred is really supposed to be accessed through RCU, and the long-term cred copies that nfs uses might want to explicitly re-enable RCU freeing if required, rather than have get_current_cred() do it implicitly. But this is a "minimal semantic changes" change for the immediate problem. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Glauber <jglauber@marvell.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com> Cc: Greg KH <greg@kroah.com> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
457c899653 |
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
438ab720c6 |
vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files
This amends commit |
||
|
|
10dce8af34 |
fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
Commit |
||
|
|
73601ea5b7 |
fs/open.c: allow opening only regular files during execve()
syzbot is hitting lockdep warning [1] due to trying to open a fifo
during an execve() operation. But we don't need to open non regular
files during an execve() operation, for all files which we will need are
the executable file itself and the interpreter programs like /bin/sh and
ld-linux.so.2 .
Since the manpage for execve(2) says that execve() returns EACCES when
the file or a script interpreter is not a regular file, and the manpage
for uselib(2) says that uselib() can return EACCES, and we use
FMODE_EXEC when opening for execve()/uselib(), we can bail out if a non
regular file is requested with FMODE_EXEC set.
Since this deadlock followed by khungtaskd warnings is trivially
reproducible by a local unprivileged user, and syzbot's frequent crash
due to this deadlock defers finding other bugs, let's workaround this
deadlock until we get a chance to find a better solution.
[1] https://syzkaller.appspot.com/bug?id=b5095bfec44ec84213bac54742a82483aad578ce
Link: http://lkml.kernel.org/r/1552044017-7890-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Reported-by: syzbot <syzbot+e93a80c1bb7c5c56e522461c149f8bf55eab1b2b@syzkaller.appspotmail.com>
Fixes:
|
||
|
|
d9a185f8b4 |
overlayfs update for 4.19
This contains two new features:
1) Stack file operations: this allows removal of several hacks from the
VFS, proper interaction of read-only open files with copy-up,
possibility to implement fs modifying ioctls properly, and others.
2) Metadata only copy-up: when file is on lower layer and only metadata is
modified (except size) then only copy up the metadata and continue to
use the data from the lower file.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCW3srhAAKCRDh3BK/laaZ
PC6tAQCP+KklcN+TvNp502f+O/kATahSpgnun4NY1/p4I8JV+AEAzdlkTN3+MiAO
fn9brN6mBK7h59DO3hqedPLJy2vrgwg=
=QDXH
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
"This contains two new features:
- Stack file operations: this allows removal of several hacks from
the VFS, proper interaction of read-only open files with copy-up,
possibility to implement fs modifying ioctls properly, and others.
- Metadata only copy-up: when file is on lower layer and only
metadata is modified (except size) then only copy up the metadata
and continue to use the data from the lower file"
* tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
ovl: Enable metadata only feature
ovl: Do not do metacopy only for ioctl modifying file attr
ovl: Do not do metadata only copy-up for truncate operation
ovl: add helper to force data copy-up
ovl: Check redirect on index as well
ovl: Set redirect on upper inode when it is linked
ovl: Set redirect on metacopy files upon rename
ovl: Do not set dentry type ORIGIN for broken hardlinks
ovl: Add an inode flag OVL_CONST_INO
ovl: Treat metacopy dentries as type OVL_PATH_MERGE
ovl: Check redirects for metacopy files
ovl: Move some dir related ovl_lookup_single() code in else block
ovl: Do not expose metacopy only dentry from d_real()
ovl: Open file with data except for the case of fsync
ovl: Add helper ovl_inode_realdata()
ovl: Store lower data inode in ovl_inode
ovl: Fix ovl_getattr() to get number of blocks from lower
ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
ovl: Copy up meta inode data from lowest data inode
ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
...
|
||
|
|
8cf9ee5061 |
Revert "vfs: do get_write_access() on upper layer of overlayfs"
This reverts commit
|
||
|
|
4ab30319fd |
Revert "vfs: add flags to d_real()"
This reverts commit
|
||
|
|
6742cee043 |
Revert "ovl: don't allow writing ioctl on lower layer"
This reverts commit
|
||
|
|
a6518f73e6 |
vfs: don't open real
Let overlayfs do its thing when opening a file. This enables stacking and fixes the corner case when a file is opened for read, modified through a writable open, and data is read from the read-only file. After this patch the read-only open will not return stale data even in this case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
d3b1084dfd |
vfs: make open_with_fake_path() not contribute to nr_files
Stacking file operations in overlay will store an extra open file for each overlay file opened. The overhead is just that of "struct file" which is about 256bytes, because overlay already pins an extra dentry and inode when the file is open, which add up to a much larger overhead. For fear of breaking working setups, don't start accounting the extra file. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> |
||
|
|
2abc77af89 |
new helper: open_with_fake_path()
open a file by given inode, faking ->f_path. Use with shitloads of caution - at the very least you'd damn better make sure that some dentry alias of that inode is pinned down by the path in question. Again, this is no general-purpose interface and I hope it will eventually go away. Right now overlayfs wants something like that, but nothing else should. Any out-of-tree code with bright idea of using this one *will* eventually get hurt, with zero notice and great delight on my part. I refuse to use EXPORT_SYMBOL_GPL(), especially in situations when it's really EXPORT_SYMBOL_DONT_USE_IT(), but don't take that export as "you are welcome to use it". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
64e1ac4d46 |
->atomic_open(): return 0 in all success cases
FMODE_OPENED can be used to distingusish "successful open" from the "called finish_no_open(), do it yourself" cases. Since finish_no_open() has been adjusted, no changes in the instances were actually needed. The caller has been adjusted. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
be12af3ef5 |
getting rid of 'opened' argument of ->atomic_open() - part 1
'opened' argument of finish_open() is unused. Kill it. Signed-off-by Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
aad888f828 |
switch all remaining checks for FILE_OPENED to FMODE_OPENED
... and don't bother with setting FILE_OPENED at all. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
69527c554f |
now we can fold open_check_o_direct() into do_dentry_open()
These checks are better off in do_dentry_open(); the reason we couldn't put them there used to be that callers couldn't tell what kind of cleanup would do_dentry_open() failure call for. Now that we have FMODE_OPENED, cleanup is the same in all cases - it's simply fput(). So let's fold that into do_dentry_open(), as Christoph's patch tried to. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
4d27f3266f |
fold put_filp() into fput()
Just check FMODE_OPENED in __fput() and be done with that... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
f5d11409e6 |
introduce FMODE_OPENED
basically, "is that instance set up enough for regular fput(), or do we want put_filp() for that one". NOTE: the only alloc_file() caller that could be followed by put_filp() is in arch/ia64/kernel/perfmon.c, which is (Kconfig-level) broken. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |