Commit Graph

509 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
1b71a028a2 This is the 5.10.84 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmGwZtwACgkQONu9yGCS
 aT7dQhAAjnlFKkXb+omHKQNkSHbD0ynEkxwtQfNFt1kcWcpJy5Df9xyNXQBohnqr
 Y0KUowpVF8gkXOelbdMrK5P6k28SpT2k+UMnUtZLNR6qMNlOY371BDasfi/dqWWR
 1JdLtQe6JVvwxo+6INRqEO27Ocyc1PbLZSo7i3Ik2+7mIRjN7+k1apFG0HOLEHIP
 3oMWDgnyQp3gTBvTFG0Vrd4f9AwrHq4JoVrhruNLqIYajlQ8dPPjuJ9alTifRddD
 eWY10Z21jAFib4WHgy6wXBVv3L5Np19liYMzv02o5pzFV1nLJCnKDA79jV7a2i2H
 lVmVpcWG0Yagyu8MW0hmOewqPXpAJH/C8g75mXeja546vCnvccNx1OXNOR4ux5Es
 IpEFAV+DnjSYgu88Cw6kF8j/B9x1n90sgywCWbRwAMJ1zX9/tvvLWSe8HpfZ2jvo
 Iuw6XDTL84DDuHY4yiK2fofxZvXp+Hk+c0Betu6GoQvoGaDRD8IWIceDWgiqy+V7
 fOrLitl8lbk1yjD7bDZMpEIgzQaaxJu6d+YWzy+PibZxQzOKHPC5gqEmajJd7ZWm
 OJ48SrNxyfjRZP/3NBgXOxje3lz3WkCdiPQrSQOQxoe+kdW5ZFuXDapWSO4dZfSe
 6XPOD/d+KVLNDQby3WnVB2MMlufHFnCs4wPgb13jfyiEbxifp+A=
 =D40k
 -----END PGP SIGNATURE-----

Merge 5.10.84 into android12-5.10-lts

Changes in 5.10.84
	NFSv42: Fix pagecache invalidation after COPY/CLONE
	can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM
	ovl: simplify file splice
	ovl: fix deadlock in splice write
	gfs2: release iopen glock early in evict
	gfs2: Fix length of holes reported at end-of-file
	powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
	drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY
	mac80211: do not access the IV when it was stripped
	net/smc: Transfer remaining wait queue entries during fallback
	atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait
	net: return correct error code
	platform/x86: thinkpad_acpi: Add support for dual fan control
	platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep
	s390/setup: avoid using memblock_enforce_memory_limit
	btrfs: check-integrity: fix a warning on write caching disabled disk
	thermal: core: Reset previous low and high trip during thermal zone init
	scsi: iscsi: Unblock session then wake up error handler
	drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
	drm/amd/amdgpu: fix potential memleak
	ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile
	ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
	ipv6: check return value of ipv6_skip_exthdr
	net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
	net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
	perf inject: Fix ARM SPE handling
	perf hist: Fix memory leak of a perf_hpp_fmt
	perf report: Fix memory leaks around perf_tip()
	net/smc: Avoid warning of possible recursive locking
	ACPI: Add stubs for wakeup handler functions
	vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
	kprobes: Limit max data_size of the kretprobe instances
	rt2x00: do not mark device gone on EPROTO errors during start
	ipmi: Move remove_work to dedicated workqueue
	cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()
	s390/pci: move pseudo-MMIO to prevent MIO overlap
	fget: check that the fd still exists after getting a ref to it
	sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl
	sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl
	ipv6: fix memory leak in fib6_rule_suppress
	drm/amd/display: Allow DSC on supported MST branch devices
	KVM: Disallow user memslot with size that exceeds "unsigned long"
	KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST
	KVM: x86: Use a stable condition around all VT-d PI paths
	KVM: arm64: Avoid setting the upper 32 bits of TCR_EL2 and CPTR_EL2 to 1
	KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
	tracing/histograms: String compares should not care about signed values
	wireguard: selftests: increase default dmesg log size
	wireguard: allowedips: add missing __rcu annotation to satisfy sparse
	wireguard: selftests: actually test for routing loops
	wireguard: selftests: rename DEBUG_PI_LIST to DEBUG_PLIST
	wireguard: device: reset peer src endpoint when netns exits
	wireguard: receive: use ring buffer for incoming handshakes
	wireguard: receive: drop handshakes if queue lock is contended
	wireguard: ratelimiter: use kvcalloc() instead of kvzalloc()
	i2c: stm32f7: flush TX FIFO upon transfer errors
	i2c: stm32f7: recover the bus on access timeout
	i2c: stm32f7: stop dma transfer in case of NACK
	i2c: cbus-gpio: set atomic transfer callback
	natsemi: xtensa: fix section mismatch warnings
	tcp: fix page frag corruption on page fault
	net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
	net: mpls: Fix notifications when deleting a device
	siphash: use _unaligned version by default
	arm64: ftrace: add missing BTIs
	net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
	selftests: net: Correct case name
	mt76: mt7915: fix NULL pointer dereference in mt7915_get_phy_mode
	ASoC: tegra: Fix wrong value type in ADMAIF
	ASoC: tegra: Fix wrong value type in I2S
	ASoC: tegra: Fix wrong value type in DMIC
	ASoC: tegra: Fix wrong value type in DSPK
	ASoC: tegra: Fix kcontrol put callback in ADMAIF
	ASoC: tegra: Fix kcontrol put callback in I2S
	ASoC: tegra: Fix kcontrol put callback in DMIC
	ASoC: tegra: Fix kcontrol put callback in DSPK
	ASoC: tegra: Fix kcontrol put callback in AHUB
	rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle()
	rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer()
	ALSA: intel-dsp-config: add quirk for CML devices based on ES8336 codec
	net: usb: lan78xx: lan78xx_phy_init(): use PHY_POLL instead of "0" if no IRQ is available
	net: marvell: mvpp2: Fix the computation of shared CPUs
	dpaa2-eth: destroy workqueue at the end of remove function
	net: annotate data-races on txq->xmit_lock_owner
	ipv4: convert fib_num_tclassid_users to atomic_t
	net/smc: fix wrong list_del in smc_lgr_cleanup_early
	net/rds: correct socket tunable error in rds_tcp_tune()
	net/smc: Keep smc_close_final rc during active close
	drm/msm/a6xx: Allocate enough space for GMU registers
	drm/msm: Do hw_init() before capturing GPU state
	atlantic: Increase delay for fw transactions
	atlatnic: enable Nbase-t speeds with base-t
	atlantic: Fix to display FW bundle version instead of FW mac version.
	atlantic: Add missing DIDs and fix 115c.
	Remove Half duplex mode speed capabilities.
	atlantic: Fix statistics logic for production hardware
	atlantic: Remove warn trace message.
	KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
	KVM: VMX: Set failure code in prepare_vmcs02()
	x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
	x86/entry: Use the correct fence macro after swapgs in kernel CR3
	x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
	sched/uclamp: Fix rq->uclamp_max not set on first enqueue
	x86/pv: Switch SWAPGS to ALTERNATIVE
	x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
	parisc: Fix KBUILD_IMAGE for self-extracting kernel
	parisc: Fix "make install" on newer debian releases
	vgacon: Propagate console boot parameters before calling `vc_resize'
	xhci: Fix commad ring abort, write all 64 bits to CRCR register.
	USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
	usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
	x86/tsc: Add a timer to make sure TSC_adjust is always checked
	x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
	x86/64/mm: Map all kernel memory into trampoline_pgd
	tty: serial: msm_serial: Deactivate RX DMA for polling support
	serial: pl011: Add ACPI SBSA UART match id
	serial: tegra: Change lower tolerance baud rate limit for tegra20 and tegra30
	serial: core: fix transmit-buffer reset and memleak
	serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
	serial: 8250_pci: rewrite pericom_do_set_divisor()
	serial: 8250: Fix RTS modem control while in rs485 mode
	iwlwifi: mvm: retry init flow if failed
	parisc: Mark cr16 CPU clocksource unstable on all SMP machines
	net/tls: Fix authentication failure in CCM mode
	ipmi: msghandler: Make symbol 'remove_work_wq' static
	Linux 5.10.84

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iad592da28c6425dea7dca35b229d14c44edb412d
2021-12-08 09:41:05 +01:00
Eric Dumazet
9a40a1e0eb ipv4: convert fib_num_tclassid_users to atomic_t
commit 213f5f8f31 upstream.

Before commit faa041a40b ("ipv4: Create cleanup helper for fib_nh")
changes to net->ipv4.fib_num_tclassid_users were protected by RTNL.

After the change, this is no longer the case, as free_fib_info_rcu()
runs after rcu grace period, without rtnl being held.

Fixes: faa041a40b ("ipv4: Create cleanup helper for fib_nh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 09:03:26 +01:00
Greg Kroah-Hartman
c80c82c899 Revert "xfrm: Fix RCU vs hash_resize_mutex lock inversion"
This reverts commit f08b2d078c which is
commit 2580d3f400 upstream.

It breaks the abi and should not be needed for Android systems.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I13a6e3459cc2c7fb31b4cbc1185a9b463fc61c6e
2021-08-12 20:16:24 +02:00
Greg Kroah-Hartman
af3bdb4304 This is the 5.10.58 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEVBn4ACgkQONu9yGCS
 aT5CoQ//ZpZKkAu4NzUKmFfzJjpTwRWSg4YKGHx0BSt/ovuw8SXQa3xAx12r1U9w
 5hXxkCQncCPSul1eIRNmpwVIUavRvyS2Vll//XtT695MA0e9sK9nrXoYgqwUE/gf
 hqgytJAb7DHk5pcLlyfCncXNyw//9WIl8WCNSppZgSnRRk+smfThnEVE6pcgQok5
 4vIy4RXSK12/Xs9jG+q+pqcvQ3ILscCrzQyfxBu5AWePlm8kaM+6cxuL3b+S0COH
 Gg0YLPrTQjLdQ+oNnh93s2fFk0bqeqt/KDMI9xGt00hNwaTBHxgoxj04AFQEgkft
 APwvezabNBojmDZasMq9h5sxGoloxLe2vD+fAWGA3KVTJ6vlKYmhx7+X6jtAJP5Y
 HHNUiYg58VnkYzxT1Ok8CUHocbaFhlFjCqjDUg1oo93XuZJaTgO63q8yeI49pN7S
 xZyim8X/rdouW1WG+7pubOe1E+nZudZDooWOs5CxyJmO1jb3CpMjfl3rRdWG4PI4
 7gC8IG8xmgC9CeGen5egAPqARwBW0d4AFDHT5tY4yF8ovbkO437hYHKvJjZDY919
 v8fqeI95nRrsawXH+GdIMlwOf/PzkODiZc2EwSh/uPnbwPPeZY7a7YQwMwzyxRfb
 Q3dOrapi5k1aoC7583ReIY5RPqc82a4LAdmTN7YN60hiXHkm0bg=
 =+Sab
 -----END PGP SIGNATURE-----

Merge 5.10.58 into android12-5.10-lts

Changes in 5.10.58
	Revert "ACPICA: Fix memory leak caused by _CID repair function"
	ALSA: seq: Fix racy deletion of subscriber
	bus: ti-sysc: Fix gpt12 system timer issue with reserved status
	net: xfrm: fix memory leak in xfrm_user_rcv_msg
	arm64: dts: ls1028a: fix node name for the sysclk
	ARM: imx: add missing iounmap()
	ARM: imx: add missing clk_disable_unprepare()
	ARM: dts: imx6qdl-sr-som: Increase the PHY reset duration to 10ms
	arm64: dts: ls1028: sl28: fix networking for variant 2
	ARM: dts: colibri-imx6ull: limit SDIO clock to 25MHz
	ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init
	ARM: dts: imx: Swap M53Menlo pinctrl_power_button/pinctrl_power_out pins
	arm64: dts: armada-3720-turris-mox: fixed indices for the SDHC controllers
	arm64: dts: armada-3720-turris-mox: remove mrvl,i2c-fast-mode
	ALSA: usb-audio: fix incorrect clock source setting
	clk: stm32f4: fix post divisor setup for I2S/SAI PLLs
	ARM: dts: am437x-l4: fix typo in can@0 node
	omap5-board-common: remove not physically existing vdds_1v8_main fixed-regulator
	dmaengine: uniphier-xdmac: Use readl_poll_timeout_atomic() in atomic state
	clk: tegra: Implement disable_unused() of tegra_clk_sdmmc_mux_ops
	dmaengine: stm32-dma: Fix PM usage counter imbalance in stm32 dma ops
	dmaengine: stm32-dmamux: Fix PM usage counter unbalance in stm32 dmamux ops
	spi: imx: mx51-ecspi: Reinstate low-speed CONFIGREG delay
	spi: imx: mx51-ecspi: Fix low-speed CONFIGREG delay calculation
	scsi: sr: Return correct event when media event code is 3
	media: videobuf2-core: dequeue if start_streaming fails
	ARM: dts: stm32: Disable LAN8710 EDPD on DHCOM
	ARM: dts: stm32: Fix touchscreen IRQ line assignment on DHCOM
	dmaengine: imx-dma: configure the generic DMA type to make it work
	net, gro: Set inner transport header offset in tcp/udp GRO hook
	net: dsa: sja1105: overwrite dynamic FDB entries with static ones in .port_fdb_add
	net: dsa: sja1105: invalidate dynamic FDB entries learned concurrently with statically added ones
	net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too
	net: dsa: sja1105: match FDB entries regardless of inner/outer VLAN tag
	net: phy: micrel: Fix detection of ksz87xx switch
	net: natsemi: Fix missing pci_disable_device() in probe and remove
	gpio: tqmx86: really make IRQ optional
	RDMA/mlx5: Delay emptying a cache entry when a new MR is added to it recently
	sctp: move the active_key update after sh_keys is added
	nfp: update ethtool reporting of pauseframe control
	net: ipv6: fix returned variable type in ip6_skb_dst_mtu
	net: dsa: qca: ar9331: reorder MDIO write sequence
	net: sched: fix lockdep_set_class() typo error for sch->seqlock
	MIPS: check return value of pgtable_pmd_page_ctor
	mips: Fix non-POSIX regexp
	bnx2x: fix an error code in bnx2x_nic_load()
	net: pegasus: fix uninit-value in get_interrupt_interval
	net: fec: fix use-after-free in fec_drv_remove
	net: vxge: fix use-after-free in vxge_device_unregister
	blk-iolatency: error out if blk_get_queue() failed in iolatency_set_limit()
	Bluetooth: defer cleanup of resources in hci_unregister_dev()
	USB: usbtmc: Fix RCU stall warning
	USB: serial: option: add Telit FD980 composition 0x1056
	USB: serial: ch341: fix character loss at high transfer rates
	USB: serial: ftdi_sio: add device ID for Auto-M3 OP-COM v2
	firmware_loader: use -ETIMEDOUT instead of -EAGAIN in fw_load_sysfs_fallback
	firmware_loader: fix use-after-free in firmware_fallback_sysfs
	drm/amdgpu/display: fix DMUB firmware version info
	ALSA: pcm - fix mmap capability check for the snd-dummy driver
	ALSA: hda/realtek: add mic quirk for Acer SF314-42
	ALSA: hda/realtek: Fix headset mic for Acer SWIFT SF314-56 (ALC256)
	ALSA: usb-audio: Fix superfluous autosuspend recovery
	ALSA: usb-audio: Add registration quirk for JBL Quantum 600
	usb: dwc3: gadget: Avoid runtime resume if disabling pullup
	usb: gadget: remove leaked entry from udc driver list
	usb: cdns3: Fixed incorrect gadget state
	usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers
	usb: gadget: f_hid: fixed NULL pointer dereference
	usb: gadget: f_hid: idle uses the highest byte for duration
	usb: host: ohci-at91: suspend/resume ports after/before OHCI accesses
	usb: typec: tcpm: Keep other events when receiving FRS and Sourcing_vbus events
	usb: otg-fsm: Fix hrtimer list corruption
	clk: fix leak on devm_clk_bulk_get_all() unwind
	scripts/tracing: fix the bug that can't parse raw_trace_func
	tracing / histogram: Give calculation hist_fields a size
	tracing: Reject string operand in the histogram expression
	tracing: Fix NULL pointer dereference in start_creating
	tracepoint: static call: Compare data on transition from 2->1 callees
	tracepoint: Fix static call function vs data state mismatch
	arm64: stacktrace: avoid tracing arch_stack_walk()
	optee: Clear stale cache entries during initialization
	tee: add tee_shm_alloc_kernel_buf()
	optee: Fix memory leak when failing to register shm pages
	optee: Refuse to load the driver under the kdump kernel
	optee: fix tee out of memory failure seen during kexec reboot
	tpm_ftpm_tee: Free and unregister TEE shared memory during kexec
	staging: rtl8723bs: Fix a resource leak in sd_int_dpc
	staging: rtl8712: get rid of flush_scheduled_work
	staging: rtl8712: error handling refactoring
	drivers core: Fix oops when driver probe fails
	media: rtl28xxu: fix zero-length control request
	pipe: increase minimum default pipe size to 2 pages
	ext4: fix potential htree corruption when growing large_dir directories
	serial: tegra: Only print FIFO error message when an error occurs
	serial: 8250_mtk: fix uart corruption issue when rx power off
	serial: 8250: Mask out floating 16/32-bit bus bits
	MIPS: Malta: Do not byte-swap accesses to the CBUS UART
	serial: 8250_pci: Enumerate Elkhart Lake UARTs via dedicated driver
	serial: 8250_pci: Avoid irq sharing for MSI(-X) interrupts.
	fpga: dfl: fme: Fix cpu hotplug issue in performance reporting
	timers: Move clearing of base::timer_running under base:: Lock
	xfrm: Fix RCU vs hash_resize_mutex lock inversion
	net/xfrm/compat: Copy xfrm_spdattr_type_t atributes
	pcmcia: i82092: fix a null pointer dereference bug
	selinux: correct the return value when loads initial sids
	bus: ti-sysc: AM3: RNG is GP only
	Revert "gpio: mpc8xxx: change the gpio interrupt flags."
	ARM: omap2+: hwmod: fix potential NULL pointer access
	md/raid10: properly indicate failure when ending a failed write request
	KVM: x86: accept userspace interrupt only if no event is injected
	KVM: Do not leak memory for duplicate debugfs directories
	KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
	arm64: vdso: Avoid ISB after reading from cntvct_el0
	soc: ixp4xx: fix printing resources
	interconnect: Fix undersized devress_alloc allocation
	spi: meson-spicc: fix memory leak in meson_spicc_remove
	interconnect: Zero initial BW after sync-state
	interconnect: Always call pre_aggregate before aggregate
	interconnect: qcom: icc-rpmh: Ensure floor BW is enforced for all nodes
	drm/i915: Correct SFC_DONE register offset
	soc: ixp4xx/qmgr: fix invalid __iomem access
	perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guest
	sched/rt: Fix double enqueue caused by rt_effective_prio
	drm/i915: avoid uninitialised var in eb_parse()
	libata: fix ata_pio_sector for CONFIG_HIGHMEM
	reiserfs: add check for root_inode in reiserfs_fill_super
	reiserfs: check directory items on read from disk
	virt_wifi: fix error on connect
	net: qede: Fix end of loop tests for list_for_each_entry
	alpha: Send stop IPI to send to online CPUs
	net/qla3xxx: fix schedule while atomic in ql_wait_for_drvr_lock and ql_adapter_reset
	smb3: rc uninitialized in one fallocate path
	drm/amdgpu/display: only enable aux backlight control for OLED panels
	arm64: fix compat syscall return truncation
	Linux 5.10.58

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2533667974c9dff419a14d63e0e8febfb3de80f1
2021-08-12 14:58:34 +02:00
Frederic Weisbecker
f08b2d078c xfrm: Fix RCU vs hash_resize_mutex lock inversion
commit 2580d3f400 upstream.

xfrm_bydst_resize() calls synchronize_rcu() while holding
hash_resize_mutex. But then on PREEMPT_RT configurations,
xfrm_policy_lookup_bytype() may acquire that mutex while running in an
RCU read side critical section. This results in a deadlock.

In fact the scope of hash_resize_mutex is way beyond the purpose of
xfrm_policy_lookup_bytype() to just fetch a coherent and stable policy
for a given destination/direction, along with other details.

The lower level net->xfrm.xfrm_policy_lock, which among other things
protects per destination/direction references to policy entries, is
enough to serialize and benefit from priority inheritance against the
write side. As a bonus, it makes it officially a per network namespace
synchronization business where a policy table resize on namespace A
shouldn't block a policy lookup on namespace B.

Fixes: 77cc278f7b (xfrm: policy: Use sequence counters with associated lock)
Cc: stable@vger.kernel.org
Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Varad Gautam <varad.gautam@suse.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-12 13:22:16 +02:00
Greg Kroah-Hartman
37485a3025 ANDROID: add kabi padding for structures for the android12 release
There are a lot of different structures that need to have a "frozen" abi
for the next 5+ years.  Add padding to a lot of them in order to be able
to handle any future changes that might be needed due to LTS and
security fixes that might come up.

It's a best guess, based on what has happened in the past from the
5.4.0..5.4.129 release (1 1/2 years).  Yes, past changes do not mean
that future changes will also be needed in the same area, but that is a
hint that those areas are both well maintained and looked after, and
there have been previous problems found in them.

Also the list of structures that are being required based on OEM usage
in the android/ symbol lists were consulted as that's a larger list than
what has been changed in the past.

Hopefully we caught everything we need to worry about, only time will
tell...

Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I880bbcda0628a7459988eeb49d18655522697664
2021-07-14 20:51:51 -07:00
Greg Kroah-Hartman
a629454175 Revert "Revert "net: xfrm: Localize sequence counter per network namespace""
This reverts commit 1aff922933 as we are
free to update the ABI at this point in time.

Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iba293953586fd78503ef5f47db40ccdc03b098f5
2021-04-23 18:42:38 -07:00
Greg Kroah-Hartman
1aff922933 Revert "net: xfrm: Localize sequence counter per network namespace"
This reverts commit 0224432a8f as it
breaks the abi and we can't allow that at this point in time.

Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I21e6c33ac7011d8489e766fd9b5e32bd528dc456
2021-04-15 14:24:14 +02:00
Greg Kroah-Hartman
9a705f0463 Merge 5.10.30 into android12-5.10
Changes in 5.10.30
	xfrm/compat: Cleanup WARN()s that can be user-triggered
	ALSA: aloop: Fix initialization of controls
	ALSA: hda/realtek: Fix speaker amp setup on Acer Aspire E1
	ALSA: hda/conexant: Apply quirk for another HP ZBook G5 model
	ASoC: intel: atom: Stop advertising non working S24LE support
	nfc: fix refcount leak in llcp_sock_bind()
	nfc: fix refcount leak in llcp_sock_connect()
	nfc: fix memory leak in llcp_sock_connect()
	nfc: Avoid endless loops caused by repeated llcp_sock_connect()
	selinux: make nslot handling in avtab more robust
	selinux: fix cond_list corruption when changing booleans
	selinux: fix race between old and new sidtab
	xen/evtchn: Change irq_info lock to raw_spinlock_t
	net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
	net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock
	net: dsa: lantiq_gswip: Don't use PHY auto polling
	net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
	drm/i915: Fix invalid access to ACPI _DSM objects
	ACPI: processor: Fix build when CONFIG_ACPI_PROCESSOR=m
	IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS
	LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
	gcov: re-fix clang-11+ support
	ia64: fix user_stack_pointer() for ptrace()
	nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff
	ocfs2: fix deadlock between setattr and dio_end_io_write
	fs: direct-io: fix missing sdio->boundary
	ethtool: fix incorrect datatype in set_eee ops
	of: property: fw_devlink: do not link ".*,nr-gpios"
	parisc: parisc-agp requires SBA IOMMU driver
	parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers
	ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin
	batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field
	ice: Continue probe on link/PHY errors
	ice: Increase control queue timeout
	ice: prevent ice_open and ice_stop during reset
	ice: fix memory allocation call
	ice: remove DCBNL_DEVRESET bit from PF state
	ice: Fix for dereference of NULL pointer
	ice: Use port number instead of PF ID for WoL
	ice: Cleanup fltr list in case of allocation issues
	iwlwifi: pcie: properly set LTR workarounds on 22000 devices
	ice: fix memory leak of aRFS after resuming from suspend
	net: hso: fix null-ptr-deref during tty device unregistration
	libbpf: Fix bail out from 'ringbuf_process_ring()' on error
	bpf: Enforce that struct_ops programs be GPL-only
	bpf: link: Refuse non-O_RDWR flags in BPF_OBJ_GET
	ethernet/netronome/nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
	libbpf: Ensure umem pointer is non-NULL before dereferencing
	libbpf: Restore umem state after socket create failure
	libbpf: Only create rx and tx XDP rings when necessary
	bpf: Refcount task stack in bpf_get_task_stack
	bpf, sockmap: Fix sk->prot unhash op reset
	bpf, sockmap: Fix incorrect fwd_alloc accounting
	net: ensure mac header is set in virtio_net_hdr_to_skb()
	i40e: Fix sparse warning: missing error code 'err'
	i40e: Fix sparse error: 'vsi->netdev' could be null
	i40e: Fix sparse error: uninitialized symbol 'ring'
	i40e: Fix sparse errors in i40e_txrx.c
	vdpa/mlx5: Fix suspend/resume index restoration
	net: sched: sch_teql: fix null-pointer dereference
	net: sched: fix action overwrite reference counting
	nl80211: fix beacon head validation
	nl80211: fix potential leak of ACL params
	cfg80211: check S1G beacon compat element length
	mac80211: fix time-is-after bug in mlme
	mac80211: fix TXQ AC confusion
	net: hsr: Reset MAC header for Tx path
	net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()
	net: let skb_orphan_partial wake-up waiters.
	thunderbolt: Fix a leak in tb_retimer_add()
	thunderbolt: Fix off by one in tb_port_find_retimer()
	usbip: add sysfs_lock to synchronize sysfs code paths
	usbip: stub-dev synchronize sysfs code paths
	usbip: vudc synchronize sysfs code paths
	usbip: synchronize event handler with sysfs code paths
	driver core: Fix locking bug in deferred_probe_timeout_work_func()
	scsi: pm80xx: Fix chip initialization failure
	scsi: target: iscsi: Fix zero tag inside a trace event
	percpu: make pcpu_nr_empty_pop_pages per chunk type
	i2c: turn recovery error on init to debug
	KVM: x86/mmu: change TDP MMU yield function returns to match cond_resched
	KVM: x86/mmu: Merge flush and non-flush tdp_mmu_iter_cond_resched
	KVM: x86/mmu: Rename goal_gfn to next_last_level_gfn
	KVM: x86/mmu: Ensure forward progress when yielding in TDP MMU iter
	KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed
	KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
	KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
	KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
	KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp
	net: sched: fix err handler in tcf_action_init()
	ice: Refactor DCB related variables out of the ice_port_info struct
	ice: Recognize 860 as iSCSI port in CEE mode
	xfrm: interface: fix ipv4 pmtu check to honor ip header df
	xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume
	remoteproc: qcom: pil_info: avoid 64-bit division
	regulator: bd9571mwv: Fix AVS and DVFS voltage range
	ARM: OMAP4: Fix PMIC voltage domains for bionic
	ARM: OMAP4: PM: update ROM return address for OSWR and OFF
	net: xfrm: Localize sequence counter per network namespace
	esp: delete NETIF_F_SCTP_CRC bit from features for esp offload
	ASoC: SOF: Intel: HDA: fix core status verification
	ASoC: wm8960: Fix wrong bclk and lrclk with pll enabled for some chips
	xfrm: Fix NULL pointer dereference on policy lookup
	virtchnl: Fix layout of RSS structures
	i40e: Added Asym_Pause to supported link modes
	i40e: Fix kernel oops when i40e driver removes VF's
	hostfs: fix memory handling in follow_link()
	amd-xgbe: Update DMA coherency values
	vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
	geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply
	sch_red: fix off-by-one checks in red_check_params()
	drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit
	arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0
	xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets
	can: bcm/raw: fix msg_namelen values depending on CAN_REQUIRED_SIZE
	can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZE
	mlxsw: spectrum: Fix ECN marking in tunnel decapsulation
	ethernet: myri10ge: Fix a use after free in myri10ge_sw_tso
	gianfar: Handle error code at MAC address change
	net: dsa: Fix type was not set for devlink port
	cxgb4: avoid collecting SGE_QBASE regs during traffic
	net:tipc: Fix a double free in tipc_sk_mcast_rcv
	ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces
	net/ncsi: Avoid channel_monitor hrtimer deadlock
	net: qrtr: Fix memory leak on qrtr_tx_wait failure
	nfp: flower: ignore duplicate merge hints from FW
	net: phy: broadcom: Only advertise EEE for supported modes
	I2C: JZ4780: Fix bug for Ingenic X1000.
	ASoC: sunxi: sun4i-codec: fill ASoC card owner
	net/mlx5e: Fix mapping of ct_label zero
	net/mlx5e: Fix ethtool indication of connector type
	net/mlx5: Don't request more than supported EQs
	net/rds: Fix a use after free in rds_message_map_pages
	xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model
	soc/fsl: qbman: fix conflicting alignment attributes
	i40e: Fix display statistics for veb_tc
	RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files
	drm/msm: Set drvdata to NULL when msm_drm_init() fails
	net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);
	mptcp: forbit mcast-related sockopt on MPTCP sockets
	scsi: ufs: core: Fix task management request completion timeout
	scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs
	net: cls_api: Fix uninitialised struct field bo->unlocked_driver_cb
	net: macb: restore cmp registers on resume path
	clk: fix invalid usage of list cursor in register
	clk: fix invalid usage of list cursor in unregister
	workqueue: Move the position of debug_work_activate() in __queue_work()
	s390/cpcmd: fix inline assembly register clobbering
	perf inject: Fix repipe usage
	net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
	openvswitch: fix send of uninitialized stack memory in ct limit reply
	i2c: designware: Adjust bus_freq_hz when refuse high speed mode set
	iwlwifi: fix 11ax disabled bit in the regulatory capability flags
	can: mcp251x: fix support for half duplex SPI host controllers
	tipc: increment the tmp aead refcnt before attaching it
	net: hns3: clear VF down state bit before request link status
	net/mlx5: Fix placement of log_max_flow_counter
	net/mlx5: Fix PPLM register mapping
	net/mlx5: Fix PBMC register mapping
	RDMA/cxgb4: check for ipv6 address properly while destroying listener
	perf report: Fix wrong LBR block sorting
	RDMA/qedr: Fix kernel panic when trying to access recv_cq
	drm/vc4: crtc: Reduce PV fifo threshold on hvs4
	i40e: Fix parameters in aq_get_phy_register()
	RDMA/addr: Be strict with gid size
	vdpa/mlx5: should exclude header length and fcs from mtu
	vdpa/mlx5: Fix wrong use of bit numbers
	RAS/CEC: Correct ce_add_elem()'s returned values
	clk: socfpga: fix iomem pointer cast on 64-bit
	lockdep: Address clang -Wformat warning printing for %hd
	dt-bindings: net: ethernet-controller: fix typo in NVMEM
	net: sched: bump refcount for new action in ACT replace mode
	gpiolib: Read "gpio-line-names" from a firmware node
	cfg80211: remove WARN_ON() in cfg80211_sme_connect
	net: tun: set tun->dev->addr_len during TUNSETLINK processing
	drivers: net: fix memory leak in atusb_probe
	drivers: net: fix memory leak in peak_usb_create_dev
	net: mac802154: Fix general protection fault
	net: ieee802154: nl-mac: fix check on panid
	net: ieee802154: fix nl802154 del llsec key
	net: ieee802154: fix nl802154 del llsec dev
	net: ieee802154: fix nl802154 add llsec key
	net: ieee802154: fix nl802154 del llsec devkey
	net: ieee802154: forbid monitor for set llsec params
	net: ieee802154: forbid monitor for del llsec seclevel
	net: ieee802154: stop dump llsec params for monitors
	Revert "net: sched: bump refcount for new action in ACT replace mode"
	Linux 5.10.30

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie8754a2e4dfef03bf1f2b878843cde19a4adab21
2021-04-15 14:23:41 +02:00
Ahmed S. Darwish
0224432a8f net: xfrm: Localize sequence counter per network namespace
[ Upstream commit e88add19f6 ]

A sequence counter write section must be serialized or its internal
state can get corrupted. The "xfrm_state_hash_generation" seqcount is
global, but its write serialization lock (net->xfrm.xfrm_state_lock) is
instantiated per network namespace. The write protection is thus
insufficient.

To provide full protection, localize the sequence counter per network
namespace instead. This should be safe as both the seqcount read and
write sections access data exclusively within the network namespace. It
also lays the foundation for transforming "xfrm_state_hash_generation"
data type from seqcount_t to seqcount_LOCKNAME_t in further commits.

Fixes: b65e3d7be0 ("xfrm: state: add sequence count to detect hash resizes")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14 08:42:05 +02:00
Maciej Żenczykowski
8a4b8ea595 ANDROID: net: introduce ip_local_unbindable_ports sysctl
and associated inet_is_local_unbindable_port() helper function:
use it to make explicitly binding to an unbindable port return
-EPERM 'Operation not permitted'.

Autobind doesn't honour this new sysctl since:
  (a) you can simply set both if that's the behaviour you desire
  (b) there could be a use for preventing explicit while allowing auto
  (c) it's faster in the relatively critical path of doing port selection
      during connect() to only check one bitmap instead of both

Various ports may have special use cases which are not suitable for
use by general userspace applications. Currently, ports specified in
ip_local_reserved_ports sysctl will not be returned only in case of
automatic port assignment, but nothing prevents you from explicitly
binding to them - even from an entirely unprivileged process.

In certain cases it is desirable to prevent the host from assigning the
ports even in case of explicit binds, even from superuser processes.

Example use cases might be:
 - a port being stolen by the nic for remote serial console, remote
   power management or some other sort of debugging functionality
   (crash collection, gdb, direct access to some other microcontroller
   on the nic or motherboard, remote management of the nic itself).
 - a transparent proxy where packets are being redirected: in case
   a socket matches this connection, packets from this application
   would be incorrectly sent to one of the endpoints.

Initially I wanted to solve this problem via the simple one line:

static inline bool inet_port_requires_bind_service(struct net *net, unsigned short port) {
-       return port < net->ipv4.sysctl_ip_prot_sock;
+       return port < net->ipv4.sysctl_ip_prot_sock || inet_is_local_reserved_port(net, port);
}

However, this doesn't work for two reasons:
  (a) it changes userspace visible behaviour of the existing local
      reserved ports sysctl, and there appears to be enough documentation
      on the internet talking about setting it to make this a bad idea
  (b) it doesn't prevent privileged apps from using these ports,
      CAP_BIND_SERVICE is relatively likely to be available to, for example,
      a recursive DNS server so it can listed on port 53, which also needs
      to do src port randomization for outgoing queries due to security
      reasons (and it thus does manual port binding).

If we *know* that certain ports are simply unusable, then it's better
nothing even gets the opportunity to try to use them.  This way we at
least get a quick failure, instead of some sort of timeout (or possibly
even corruption of the data stream of the non-kernel based use case).

Test:
  vm:~# cat /proc/sys/net/ipv4/ip_local_unbindable_ports

  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0); s.bind(("::", 3967))'
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0); s.bind(("::", 3967))'
  vm:~# echo 3967 > /proc/sys/net/ipv4/ip_local_unbindable_ports
  vm:~# cat /proc/sys/net/ipv4/ip_local_unbindable_ports
  3967
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0); s.bind(("::", 3967))'
  socket.error: (1, 'Operation not permitted')
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0); s.bind(("::", 3967))'
  socket.error: (1, 'Operation not permitted')

Cc: Sean Tranchetti <stranche@codeaurora.org>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Linux SCTP <linux-sctp@vger.kernel.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Bug: 140404597
Change-Id: Ie96207bea90ae1345adf7b45724d0caf4d6e52c2
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
2021-02-09 15:49:37 +00:00
Oliver Hartkopp
f726f3d371 can: remove obsolete version strings
As pointed out by Jakub Kicinski here:
http://lore.kernel.org/r/20201009175751.5c54097f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com
this patch removes the obsolete version information of the different
CAN protocols and the AF_CAN core module.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201012074354.25839-2-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-10-12 10:06:39 +02:00
David S. Miller
3ab0a7a0c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Two minor conflicts:

1) net/ipv4/route.c, adding a new local variable while
   moving another local variable and removing it's
   initial assignment.

2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
   One pretty prints the port mode differently, whilst another
   changes the driver to try and obtain the port mode from
   the port node rather than the switch node.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 16:45:34 -07:00
Ido Schimmel
80690ec6b5 nexthop: Convert to blocking notification chain
Currently, the only listener of the nexthop notification chain is the
VXLAN driver. Subsequent patches will add more listeners (e.g., device
drivers such as netdevsim) that need to be able to block when processing
notifications.

Therefore, convert the notification chain to a blocking one. This is
safe as notifications are always emitted from process context.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15 16:31:17 -07:00
Wei Wang
ac8f1710c1 tcp: reflect tos value received in SYN to the socket
This commit adds a new TCP feature to reflect the tos value received in
SYN, and send it out on the SYN-ACK, and eventually set the tos value of
the established socket with this reflected tos value. This provides a
way to set the traffic class/QoS level for all traffic in the same
connection to be the same as the incoming SYN request. It could be
useful in data centers to provide equivalent QoS according to the
incoming request.
This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by
default turned off.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10 13:15:40 -07:00
Pablo Neira Ayuso
67cc570eda netfilter: nf_tables: coalesce multiple notifications into one skbuff
On x86_64, each notification results in one skbuff allocation which
consumes at least 768 bytes due to the skbuff overhead.

This patch coalesces several notifications into one single skbuff, so
each notification consumes at least ~211 bytes, that ~3.5 times less
memory consumption. As a result, this is reducing the chances to exhaust
the netlink socket receive buffer.

Rule of thumb is that each notification batch only contains netlink
messages whose report flag is the same, nfnetlink_send() requires this
to do appropriate delivery to userspace, either via unicast (echo
mode) or multicast (monitor mode).

The skbuff control buffer is used to annotate the report flag for later
handling at the new coalescing routine.

The batch skbuff notification size is NLMSG_GOODSIZE, using a larger
skbuff would allow for more socket receiver buffer savings (to amortize
the cost of the skbuff even more), however, going over that size might
break userspace applications, so let's be conservative and stick to
NLMSG_GOODSIZE.

Reported-by: Phil Sutter <phil@nwl.cc>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08 13:02:44 +02:00
Jakub Sitnicki
ab53cad90e bpf, netns: Keep a list of attached bpf_link's
To support multi-prog link-based attachments for new netns attach types, we
need to keep track of more than one bpf_link per attach type. Hence,
convert net->bpf.links into a list, that currently can be either empty or
have just one item.

Instead of reusing bpf_prog_list from bpf-cgroup, we link together
bpf_netns_link's themselves. This makes list management simpler as we don't
have to allocate, initialize, and later release list elements. We can do
this because multi-prog attachment will be available only for bpf_link, and
we don't need to build a list of programs attached directly and indirectly
via links.

No functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200625141357.910330-4-jakub@cloudflare.com
2020-06-30 10:45:08 -07:00
Jakub Sitnicki
695c12147a bpf, netns: Keep attached programs in bpf_prog_array
Prepare for having multi-prog attachments for new netns attach types by
storing programs to run in a bpf_prog_array, which is well suited for
iterating over programs and running them in sequence.

After this change bpf(PROG_QUERY) may block to allocate memory in
bpf_prog_array_copy_to_user() for collected program IDs. This forces a
change in how we protect access to the attached program in the query
callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
an RCU read lock to holding a mutex that serializes updaters.

Because we allow only one BPF flow_dissector program to be attached to
netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
always either detached (null) or one element long.

No functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200625141357.910330-3-jakub@cloudflare.com
2020-06-30 10:45:08 -07:00
Jakub Sitnicki
7f045a49fe bpf: Add link-based BPF program attachment to network namespace
Extend bpf() syscall subcommands that operate on bpf_link, that is
LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
network namespaces (only flow dissector at the moment).

Link-based and prog-based attachment can be used interchangeably, but only
one can exist at a time. Attempts to attach a link when a prog is already
attached directly, and the other way around, will be met with -EEXIST.
Attempts to detach a program when link exists result in -EINVAL.

Attachment of multiple links of same attach type to one netns is not
supported with the intention to lift the restriction when a use-case
presents itself. Because of that link create returns -E2BIG when trying to
create another netns link, when one already exists.

Link-based attachments to netns don't keep a netns alive by holding a ref
to it. Instead links get auto-detached from netns when the latter is being
destroyed, using a pernet pre_exit callback.

When auto-detached, link lives in defunct state as long there are open FDs
for it. -ENOLINK is returned if a user tries to update a defunct link.

Because bpf_link to netns doesn't hold a ref to struct net, special care is
taken when releasing, updating, or filling link info. The netns might be
getting torn down when any of these link operations are in progress. That
is why auto-detach and update/release/fill_info are synchronized by the
same mutex. Also, link ops have to always check if auto-detach has not
happened yet and if netns is still alive (refcnt > 0).

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-5-jakub@cloudflare.com
2020-06-01 15:21:03 -07:00
Jakub Sitnicki
a3fd7ceee0 net: Introduce netns_bpf for BPF programs attached to netns
In order to:

 (1) attach more than one BPF program type to netns, or
 (2) support attaching BPF programs to netns with bpf_link, or
 (3) support multi-prog attach points for netns

we will need to keep more state per netns than a single pointer like we
have now for BPF flow dissector program.

Prepare for the above by extracting netns_bpf that is part of struct net,
for storing all state related to BPF programs attached to netns.

Turn flow dissector callbacks for querying/attaching/detaching a program
into generic ones that operate on netns_bpf. Next patch will move the
generic callbacks into their own module.

This is similar to how it is organized for cgroup with cgroup_bpf.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-3-jakub@cloudflare.com
2020-06-01 15:21:02 -07:00
Roopa Prabhu
8590ceedb7 nexthop: add support for notifiers
This patch adds nexthop add/del notifiers. To be used by
vxlan driver in a later patch. Could possibly be used by
switchdev drivers in the future.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Eric Dumazet
a70437cc09 tcp: add hrtimer slack to sack compression
Add a sysctl to control hrtimer slack, default of 100 usec.

This gives the opportunity to reduce system overhead,
and help very short RTT flows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 13:24:01 -07:00
Roopa Prabhu
4f80116d3d net: ipv4: add sysctl for nexthop api compatibility mode
Current route nexthop API maintains user space compatibility
with old route API by default. Dumps and netlink notifications
support both new and old API format. In systems which have
moved to the new API, this compatibility mode cancels some
of the performance benefits provided by the new nexthop API.

This patch adds new sysctl nexthop_compat_mode which is on
by default but provides the ability to turn off compatibility
mode allowing systems to run entirely with the new routing
API. Old route API behaviour and support is not modified by this
sysctl.

Uses a single sysctl to cover both ipv4 and ipv6 following
other sysctls. Covers dumps and delete notifications as
suggested by David Ahern.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 12:50:37 -07:00
Florian Westphal
fc518953bc mptcp: add and use MIB counter infrastructure
Exported via same /proc file as the Linux TCP MIB counters, so "netstat -s"
or "nstat" will show them automatically.

The MPTCP MIB counters are allocated in a distinct pcpu area in order to
avoid bloating/wasting TCP pcpu memory.

Counters are allocated once the first MPTCP socket is created in a
network namespace and free'd on exit.

If no sockets have been allocated, all-zero mptcp counters are shown.

The MIB counter list is taken from the multipath-tcp.org kernel, but
only a few counters have been picked up so far.  The counter list can
be increased at any time later on.

v2 -> v3:
 - remove 'inline' in foo.c files (David S. Miller)

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Kuniyuki Iwashima
4b01a96742 tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.
Commit aacd9289af ("tcp: bind() use stronger
condition for bind_conflict") introduced a restriction to forbid to bind
SO_REUSEADDR enabled sockets to the same (addr, port) tuple in order to
assign ports dispersedly so that we can connect to the same remote host.

The change results in accelerating port depletion so that we fail to bind
sockets to the same local port even if we want to connect to the different
remote hosts.

You can reproduce this issue by following instructions below.

  1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768"
  2. set SO_REUSEADDR to two sockets.
  3. bind two sockets to (localhost, 0) and the latter fails.

Therefore, when ephemeral ports are exhausted, bind(0) should fallback to
the legacy behaviour to enable the SO_REUSEADDR option and make it possible
to connect to different remote (addr, port) tuples.

This patch allows us to bind SO_REUSEADDR enabled sockets to the same
(addr, port) only when net.ipv4.ip_autobind_reuse is set 1 and all
ephemeral ports are exhausted. This also allows connect() and listen() to
share ports in the following way and may break some applications. So the
ip_autobind_reuse is 0 by default and disables the feature.

  1. setsockopt(sk1, SO_REUSEADDR)
  2. setsockopt(sk2, SO_REUSEADDR)
  3. bind(sk1, saddr, 0)
  4. bind(sk2, saddr, 0)
  5. connect(sk1, daddr)
  6. listen(sk2)

If it is set 1, we can fully utilize the 4-tuples, but we should use
IP_BIND_ADDRESS_NO_PORT for bind()+connect() as possible.

The notable thing is that if all sockets bound to the same port have
both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an
ephemeral port and also do listen().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 12:08:09 -07:00
David S. Miller
4d8773b68e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor conflict in mlx5 because changes happened to code that has
moved meanwhile.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-26 10:40:21 +01:00
Pablo Neira Ayuso
eb014de4fd netfilter: nf_tables: autoload modules from the abort path
This patch introduces a list of pending module requests. This new module
list is composed of nft_module_request objects that contain the module
name and one status field that tells if the module has been already
loaded (the 'done' field).

In the first pass, from the preparation phase, the netlink command finds
that a module is missing on this list. Then, a module request is
allocated and added to this list and nft_request_module() returns
-EAGAIN. This triggers the abort path with the autoload parameter set on
from nfnetlink, request_module() is called and the module request enters
the 'done' state. Since the mutex is released when loading modules from
the abort phase, the module list is zapped so this is iteration occurs
over a local list. Therefore, the request_module() calls happen when
object lists are in consistent state (after fulling aborting the
transaction) and the commit list is empty.

On the second pass, the netlink command will find that it already tried
to load the module, so it does not request it again and
nft_request_module() returns 0. Then, there is a look up to find the
object that the command was missing. If the module was successfully
loaded, the command proceeds normally since it finds the missing object
in place, otherwise -ENOENT is reported to userspace.

This patch also updates nfnetlink to include the reason to enter the
abort phase, which is required for this new autoload module rationale.

Fixes: ec7470b834 ("netfilter: nf_tables: store transaction list locally while requesting module")
Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-24 20:54:29 +01:00
Kevin(Yudong) Yang
65e6d90168 net-tcp: Disable TCP ssthresh metrics cache by default
This patch introduces a sysctl knob "net.ipv4.tcp_no_ssthresh_metrics_save"
that disables TCP ssthresh metrics cache by default. Other parts of TCP
metrics cache, e.g. rtt, cwnd, remain unchanged.

As modern networks becoming more and more dynamic, TCP metrics cache
today often causes more harm than benefits. For example, the same IP
address is often shared by different subscribers behind NAT in residential
networks. Even if the IP address is not shared by different users,
caching the slow-start threshold of a previous short flow using loss-based
congestion control (e.g. cubic) often causes the future longer flows of
the same network path to exit slow-start prematurely with abysmal
throughput.

Caching ssthresh is very risky and can lead to terrible performance.
Therefore it makes sense to make disabling ssthresh caching by
default and opt-in for specific networks by the administrators.
This practice also has worked well for several years of deployment with
CUBIC congestion control at Google.

Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09 20:17:48 -08:00
Paolo Abeni
b9b33e7c24 ipv6: keep track of routes using src
Use a per namespace counter, increment it on successful creation
of any route using the source address, decrement it on deletion
of such routes.

This allows us to check easily if the routing decision in the
current namespace depends on the packet source. Will be used
by the next patch.

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21 14:45:55 -08:00
Xin Long
34515e94c9 sctp: add support for Primary Path Switchover
This is a new feature defined in section 5 of rfc7829: "Primary Path
Switchover". By introducing a new tunable parameter:

  Primary.Switchover.Max.Retrans (PSMR)

The primary path will be changed to another active path when the path
error counter on the old primary path exceeds PSMR, so that "the SCTP
sender is allowed to continue data transmission on a new working path
even when the old primary destination address becomes active again".

This patch is to add this tunable parameter, 'ps_retrans' per netns,
sock, asoc and transport. It also allows a user to change ps_retrans
per netns by sysctl, and ps_retrans per sock/asoc/transport will be
initialized with it.

The check will be done in sctp_do_8_2_transport_strike() when this
feature is enabled.

Note this feature is disabled by initializing 'ps_retrans' per netns
as 0xffff by default, and its value can't be less than 'pf_retrans'
when changing by sysctl.

v3->v4:
  - add define SCTP_PS_RETRANS_MAX 0xffff, and use it on extra2 of
    sysctl 'ps_retrans'.
  - add a new entry for ps_retrans on ip-sysctl.txt.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-08 14:18:32 -08:00
Xin Long
aef587be42 sctp: add pf_expose per netns and sock and asoc
As said in rfc7829, section 3, point 12:

  The SCTP stack SHOULD expose the PF state of its destination
  addresses to the ULP as well as provide the means to notify the
  ULP of state transitions of its destination addresses from
  active to PF, and vice versa.  However, it is recommended that
  an SCTP stack implementing SCTP-PF also allows for the ULP to be
  kept ignorant of the PF state of its destinations and the
  associated state transitions, thus allowing for retention of the
  simpler state transition model of [RFC4960] in the ULP.

Not only does it allow to expose the PF state to ULP, but also
allow to ignore sctp-pf to ULP.

So this patch is to add pf_expose per netns, sock and asoc. And in
sctp_assoc_control_transport(), ulp_notify will be set to false if
asoc->expose is not 'enabled' in next patch.

It also allows a user to change pf_expose per netns by sysctl, and
pf_expose per sock and asoc will be initialized with it.

Note that pf_expose also works for SCTP_GET_PEER_ADDR_INFO sockopt,
to not allow a user to query the state of a sctp-pf peer address
when pf_expose is 'disabled', as said in section 7.3.

v1->v2:
  - Fix a build warning noticed by Nathan Chancellor.
v2->v3:
  - set pf_expose to UNUSED by default to keep compatible with old
    applications.
v3->v4:
  - add a new entry for pf_expose on ip-sysctl.txt, as Marcelo suggested.
  - change this patch to 1/5, and move sctp_assoc_control_transport
    change into 2/5, as Marcelo suggested.
  - use SCTP_PF_EXPOSE_UNSET instead of SCTP_PF_EXPOSE_UNUSED, and
    set SCTP_PF_EXPOSE_UNSET to 0 in enum, as Marcelo suggested.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-08 14:18:32 -08:00
Jakub Kicinski
d26b698dd3 net/tls: add skeleton of MIB statistics
Add a skeleton structure for adding TLS statistics.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-05 16:29:00 -07:00
Marc Kleine-Budde
564577dfee can: netns: remove "can_" prefix from members struct netns_can
This patch improves the code reability by removing the redundant "can_"
prefix from the members of struct netns_can (as the struct netns_can itself
is the member "can" of the struct net.)

The conversion is done with:

	sed -i \
		-e "s/struct can_dev_rcv_lists \*can_rx_alldev_list;/struct can_dev_rcv_lists *rx_alldev_list;/" \
		-e "s/spinlock_t can_rcvlists_lock;/spinlock_t rcvlists_lock;/" \
		-e "s/struct timer_list can_stattimer;/struct timer_list stattimer; /" \
		-e "s/can\.can_rx_alldev_list/can.rx_alldev_list/g" \
		-e "s/can\.can_rcvlists_lock/can.rcvlists_lock/g" \
		-e "s/can\.can_stattimer/can.stattimer/g" \
		include/net/netns/can.h \
		net/can/*.[ch]

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-09-04 13:29:14 +02:00
Marc Kleine-Budde
2341086df4 can: netns: give members of struct netns_can holding the statistics a sensible name
This patch gives the members of the struct netns_can that are holding
the statistics a sensible name, by renaming struct netns_can::can_stats
into struct netns_can::pkg_stats and struct netns_can::can_pstats into
struct netns_can::rcv_lists_stats.

The conversion is done with:

	sed -i \
		-e "s:\(struct[^*]*\*\)can_stats;.*:\1pkg_stats;:" \
		-e "s:\(struct[^*]*\*\)can_pstats;.*:\1rcv_lists_stats;:" \
		-e "s/can\.can_stats/can.pkg_stats/g" \
		-e "s/can\.can_pstats/can.rcv_lists_stats/g" \
		net/can/*.[ch] \
		include/net/netns/can.h

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-09-04 13:29:13 +02:00
Marc Kleine-Budde
6c43bb3a41 can: netns: give structs holding the CAN statistics a sensible name
This patch renames both "struct s_stats" and "struct s_pstats", to
"struct can_pkg_stats" and "struct can_rcv_lists_stats" to better
reflect their meaning and improve code readability.

The conversion is done with:

	sed -i \
		-e "s/struct s_stats/struct can_pkg_stats/g" \
		-e "s/struct s_pstats/struct can_rcv_lists_stats/g" \
		net/can/*.[ch] \
		include/net/netns/can.h

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-09-04 13:29:13 +02:00
Xin Long
1b0b8114b9 sctp: make ecn flag per netns and endpoint
This patch is to add ecn flag for both netns_sctp and sctp_endpoint,
net->sctp.ecn_enable is set 1 by default, and ep->ecn_enable will
be initialized with net->sctp.ecn_enable.

asoc->peer.ecn_capable will be set during negotiation only when
ep->ecn_enable is set on both sides.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 20:54:14 -07:00
Josh Hunt
c04b79b6cf tcp: add new tcp_mtu_probe_floor sysctl
The current implementation of TCP MTU probing can considerably
underestimate the MTU on lossy connections allowing the MSS to get down to
48. We have found that in almost all of these cases on our networks these
paths can handle much larger MTUs meaning the connections are being
artificially limited. Even though TCP MTU probing can raise the MSS back up
we have seen this not to be the case causing connections to be "stuck" with
an MSS of 48 when heavy loss is present.

Prior to pushing out this change we could not keep TCP MTU probing enabled
b/c of the above reasons. Now with a reasonble floor set we've had it
enabled for the past 6 months.

The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
administrators the ability to control the floor of MSS probing.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 13:03:30 -07:00
David S. Miller
13091aa305 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 20:20:36 -07:00
Eric Dumazet
5f3e2bf008 tcp: add tcp_min_snd_mss sysctl
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.

This forces the stack to send packets with a very high network/cpu
overhead.

Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.

In some cases, it can be useful to increase the minimal value
to a saner value.

We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.

Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f1 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.

We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.

CVE-2019-11479 -- tcp mss hardcoded to 48

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15 18:47:31 -07:00
David Ahern
ab84be7e54 net: Initial nexthop code
Barebones start point for nexthops. Implementation for RTM commands,
notifications, management of rbtree for holding nexthops by id, and
kernel side data structures for nexthops and nexthop config.

Nexthops are maintained in an rbtree sorted by id. Similar to routes,
nexthops are configured per namespace using netns_nexthop struct added
to struct net.

Nexthop notifications are sent when a nexthop is added or deleted,
but NOT if the delete is due to a device event or network namespace
teardown (which also involves device events). Applications are
expected to use the device down event to flush nexthops and any
routes used by the nexthops.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28 21:37:30 -07:00
Eric Dumazet
4907abc605 net: dynamically allocate fqdir structures
Following patch will add rcu grace period before fqdir
rhashtable destruction, so we need to dynamically allocate
fqdir structures to not force expensive synchronize_rcu() calls
in netns dismantle path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26 14:08:05 -07:00
Eric Dumazet
803fdd9968 net: rename struct fqdir fields
Rename the @frags fields from structs netns_ipv4, netns_ipv6,
netns_nf_frag and netns_ieee802154_lowpan to @fqdir

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26 14:08:05 -07:00
Eric Dumazet
6ce3b4dcee inet: rename netns_frags to fqdir
1) struct netns_frags is renamed to struct fqdir
  This structure is really holding many frag queues in a hash table.

2) (struct inet_frag_queue)->net field is renamed to fqdir
  since net is generally associated to a 'struct net' pointer
  in networking stack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26 14:08:04 -07:00
Tonghao Zhang
8f14c99c7e netfilter: conntrack: limit sysctl setting for boolean options
We use the zero and one to limit the boolean options setting.
After this patch we only set 0 or 1 to boolean options for nf
conntrack sysctl.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-30 14:18:56 +02:00
Stephen Suryaputra
0bc1998544 ipv6: Add rate limit mask for ICMPv6 messages
To make ICMPv6 closer to ICMPv4, add ratemask parameter. Since the ICMP
message types use larger numeric values, a simple bitmask doesn't fit.
I use large bitmap. The input and output are the in form of list of
ranges. Set the default to rate limit all error messages but Packet Too
Big. For Packet Too Big, use ratemask instead of hard-coded.

There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow()
aren't called. This patch only adds them to icmpv6_echo_reply().

Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says
that it is also acceptable to rate limit informational messages. Thus,
I removed the current hard-coded behavior of icmpv6_mask_allow() that
doesn't rate limit informational messages.

v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL
    isn't defined, expand the description in ip-sysctl.txt and remove
    unnecessary conditional before kfree().
v3: Inline the bitmap instead of dynamically allocated. Still is a
    pointer to it is needed because of the way proc_do_large_bitmap work.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18 16:58:37 -07:00
David S. Miller
f83f715195 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment merge conflict in mlx5.

Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05 14:14:19 -07:00
Eric Dumazet
355b985537 netns: provide pure entropy for net_hash_mix()
net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)

I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.

Also provide entropy regardless of CONFIG_NET_NS.

Fixes: 0b4419162a ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:00:45 -07:00
Eric Dumazet
df453700e8 inet: switch IP ID generator to siphash
According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.

Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
having 64bit key and Jenkins hash is risky.

It is time to switch to siphash and its 128bit keys.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-27 14:29:26 -07:00
Stephen Suryaputra
0b03a5ca8b ipv6: Add icmp_echo_ignore_anycast for ICMPv6
In addition to icmp_echo_ignore_multicast, there is a need to also
prevent responding to pings to anycast addresses for security.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 16:29:37 -07:00
Stephen Suryaputra
03f1eccc7a ipv6: Add icmp_echo_ignore_multicast support for ICMPv6
IPv4 has icmp_echo_ignore_broadcast to prevent responding to broadcast pings.
IPv6 needs a similar mechanism.

v1->v2:
- Remove NET_IPV6_ICMP_ECHO_IGNORE_MULTICAST.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-19 14:29:51 -07:00