linux/drivers
Samuel Holland e19ca3fe6c clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
commit c950ca8c35 upstream.

The Allwinner A64 SoC is known[1] to have an unstable architectural
timer, which manifests itself most obviously in the time jumping forward
a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
timer frequency of 24 MHz, implying that the time went slightly backward
(and this was interpreted by the kernel as it jumping forward and
wrapping around past the epoch).

Investigation revealed instability in the low bits of CNTVCT at the
point a high bit rolls over. This leads to power-of-two cycle forward
and backward jumps. (Testing shows that forward jumps are about twice as
likely as backward jumps.) Since the counter value returns to normal
after an indeterminate read, each "jump" really consists of both a
forward and backward jump from the software perspective.

Unless the kernel is trapping CNTVCT reads, a userspace program is able
to read the register in a loop faster than it changes. A test program
running on all 4 CPU cores that reported jumps larger than 100 ms was
run for 13.6 hours and reported the following:

 Count | Event
-------+---------------------------
  9940 | jumped backward      699ms
   268 | jumped backward     1398ms
     1 | jumped backward     2097ms
 16020 | jumped forward       175ms
  6443 | jumped forward       699ms
  2976 | jumped forward      1398ms
     9 | jumped forward    356516ms
     9 | jumped forward    357215ms
     4 | jumped forward    714430ms
     1 | jumped forward   3578440ms

This works out to a jump larger than 100 ms about every 5.5 seconds on
each CPU core.

The largest jump (almost an hour!) was the following sequence of reads:
    0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000

Note that the middle bits don't necessarily all read as all zeroes or
all ones during the anomalous behavior; however the low 10 bits checked
by the function in this patch have never been observed with any other
value.

Also note that smaller jumps are much more common, with backward jumps
of 2048 (2^11) cycles observed over 400 times per second on each core.
(Of course, this is partially explained by lower bits rolling over more
frequently.) Any one of these could have caused the 95 year time skip.

Similar anomalies were observed while reading CNTPCT (after patching the
kernel to allow reads from userspace). However, the CNTPCT jumps are
much less frequent, and only small jumps were observed. The same program
as before (except now reading CNTPCT) observed after 72 hours:

 Count | Event
-------+---------------------------
    17 | jumped backward      699ms
    52 | jumped forward       175ms
  2831 | jumped forward       699ms
     5 | jumped forward      1398ms

Further investigation showed that the instability in CNTPCT/CNTVCT also
affected the respective timer's TVAL register. The following values were
observed immediately after writing CNVT_TVAL to 0x10000000:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
 0x000000d4a2d8bfff | 0x10003fff | 0x000000d4b2d8bfff | +0x00004000
 0x000000d4a2d94000 | 0x0fffffff | 0x000000d4b2d97fff | -0x00004000
 0x000000d4a2d97fff | 0x10003fff | 0x000000d4b2d97fff | +0x00004000
 0x000000d4a2d9c000 | 0x0fffffff | 0x000000d4b2d9ffff | -0x00004000

The pattern of errors in CNTV_TVAL seemed to depend on exactly which
value was written to it. For example, after writing 0x10101010:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
 0x000001ac3effffff | 0x1110100f | 0x000001ac4f10100f | +0x1000000
 0x000001ac40000000 | 0x1010100f | 0x000001ac5110100f | -0x1000000
 0x000001ac58ffffff | 0x1110100f | 0x000001ac6910100f | +0x1000000
 0x000001ac66000000 | 0x1010100f | 0x000001ac7710100f | -0x1000000
 0x000001ac6affffff | 0x1110100f | 0x000001ac7b10100f | +0x1000000
 0x000001ac6e000000 | 0x1010100f | 0x000001ac7f10100f | -0x1000000

I was also twice able to reproduce the issue covered by Allwinner's
workaround[4], that writing to TVAL sometimes fails, and both CVAL and
TVAL are left with entirely bogus values. One was the following values:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL
--------------------+------------+--------------------------------------
 0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past)
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

========================================================================

Because the CPU can read the CNTPCT/CNTVCT registers faster than they
change, performing two reads of the register and comparing the high bits
(like other workarounds) is not a workable solution. And because the
timer can jump both forward and backward, no pair of reads can
distinguish a good value from a bad one. The only way to guarantee a
good value from consecutive reads would be to read _three_ times, and
take the middle value only if the three values are 1) each unique and
2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
if an anomaly is detected.

However, since there is a distinct pattern to the bad values, we can
optimize the common case (1022/1024 of the time) to a single read by
simply ignoring values that match the error pattern. This still takes no
more than 3 cycles in the worst case, and requires much less code. As an
additional safety check, we still limit the loop iteration to the number
of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.

For the TVAL registers, the simple solution is to not use them. Instead,
read or write the CVAL and calculate the TVAL value in software.

Although the manufacturer is aware of at least part of the erratum[4],
there is no official name for it. For now, use the kernel-internal name
"UNKNOWN1".

[1]: https://github.com/armbian/build/commit/a08cd6fe7ae9
[2]: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/
[3]: https://irclog.whitequark.org/linux-sunxi/2018-01-26
[4]: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272

Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 20:09:58 +01:00
..
accessibility
acpi ACPI / device_sysfs: Avoid OF modalias creation for removed device 2019-03-23 20:09:57 +01:00
amba
android binder: fix race that allows malicious free of live buffer 2018-12-05 19:32:11 +01:00
ata libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD 2019-02-15 08:10:10 +01:00
atm atm: he: fix sign-extension overflow on large shift 2019-02-27 10:08:57 +01:00
auxdisplay auxdisplay: ht16k33: fix potential user-after-free on module unload 2019-03-23 20:09:47 +01:00
base driver core: Postpone DMA tear-down until after devres release 2019-03-13 14:02:41 -07:00
bcma
block floppy: check_events callback should not return a negative number 2019-03-23 20:09:45 +01:00
bluetooth Bluetooth: btrtl: Restore old logic to assume firmware is already loaded 2019-03-10 07:17:21 +01:00
bus
cdrom gdrom: fix a memory leak bug 2019-02-12 19:47:18 +01:00
char applicom: Fix potential Spectre v1 vulnerabilities 2019-03-10 07:17:20 +01:00
clk clk: sunxi: A31: Fix wrong AHB gate number 2019-03-23 20:09:47 +01:00
clocksource clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability 2019-03-23 20:09:58 +01:00
connector connector: fix unsafe usage of ->real_parent 2019-03-19 13:12:38 +01:00
cpufreq cpufreq: Use struct kobj_attribute instead of struct global_attr 2019-03-10 07:17:15 +01:00
cpuidle cpuidle: big.LITTLE: fix refcount leak 2019-02-12 19:47:08 +01:00
crypto crypto: rockchip - update new iv to device in multiple operations 2019-03-23 20:09:41 +01:00
dax mm, devm_memremap_pages: fix shutdown handling 2019-01-13 09:51:04 +01:00
dca
devfreq
dio
dma dmaengine: dmatest: Abort test in case of mapping error 2019-03-13 14:02:36 -07:00
dma-buf
edac EDAC, skx_edac: Fix logical channel intermediate decoding 2018-11-13 11:08:44 -08:00
eisa
extcon
firewire
firmware iscsi_ibft: Fix missing break in switch statement 2019-03-13 14:02:39 -07:00
fmc
fpga fpga: altera-cvp: fix 'bad IO access' on x86_64 2019-02-12 19:46:59 +01:00
fsi fsi: master-ast-cf: select GENERIC_ALLOCATOR 2018-12-17 09:24:35 +01:00
gnss gnss: sirf: fix premature wakeup interrupt enable 2019-03-10 07:17:21 +01:00
gpio gpio: vf610: Mask all GPIO interrupts 2019-03-13 14:02:29 -07:00
gpu gpu: ipu-v3: Fix CSI offsets for imx53 2019-03-23 20:09:41 +01:00
hid HID: debug: fix the ring buffer implementation 2019-02-12 19:47:24 +01:00
hsi
hv Drivers: hv: vmbus: Check for ring when getting debug info 2019-01-31 08:14:36 +01:00
hwmon hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table 2019-02-27 10:08:57 +01:00
hwspinlock
hwtracing stm class: Prevent division by zero 2019-03-23 20:09:52 +01:00
i2c i2c: bcm2835: Clear current buffer pointers and counts after a transfer 2019-03-23 20:09:47 +01:00
ide ide: fix a typo in the settings proc file name 2019-01-31 08:14:42 +01:00
idle
iio iio: adc: exynos-adc: Fix NULL pointer exception on unbind 2019-03-23 20:09:39 +01:00
infiniband IB/ipoib: Fix for use-after-free in ipoib_cm_tx_start 2019-03-13 14:02:28 -07:00
input Input: st-keyscan - fix potential zalloc NULL dereference 2019-03-23 20:09:47 +01:00
iommu iommu/amd: Fix IOMMU page flush when detach device from a domain 2019-03-13 14:02:27 -07:00
ipack
irqchip irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable 2019-03-13 14:02:35 -07:00
isdn isdn: avm: Fix string plus integer warning from Clang 2019-02-27 10:08:58 +01:00
leds leds: lp5523: fix a missing check of return value of lp55xx_read 2019-02-27 10:08:57 +01:00
lightnvm lightnvm: pblk: add lock protection to list operations 2019-02-12 19:47:08 +01:00
macintosh
mailbox mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush timeout issue 2019-03-23 20:09:49 +01:00
mcb
md It's wrong to add len to sector_nr in raid10 reshape twice 2019-03-19 13:12:42 +01:00
media media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused() 2019-03-23 20:09:38 +01:00
memory memory: ti-aemif: fix a potential NULL-pointer dereference 2018-09-06 10:04:07 -07:00
memstick memstick: Prevent memstick host from getting runtime suspended during card detection 2019-02-12 19:47:10 +01:00
message
mfd mfd: mc13xxx: Fix a missing check of a register-read failure 2019-02-27 10:08:52 +01:00
misc mei: bus: move hw module get/put to probe/release 2019-03-23 20:09:39 +01:00
mmc mmc:fix a bug when max_discard is 0 2019-03-23 20:09:57 +01:00
mtd mtd: rawnand: gpmi: fix MX28 bus master lockup problem 2019-02-15 08:10:10 +01:00
mux mux: adgs1408: use the correct MODULE_LICENSE 2018-10-12 17:36:39 +02:00
net net: set static variable an initial value in atl2_probe() 2019-03-23 20:09:52 +01:00
nfc NFC: nfcmrvl_uart: fix OF child-node lookup 2018-11-13 11:08:48 -08:00
ntb
nubus
nvdimm libnvdimm: Fix altmap reservation size calculation 2019-03-23 20:09:53 +01:00
nvme nvme-pci: add missing unlock for reset error 2019-03-13 14:02:38 -07:00
nvmem nvmem: check the return value of nvmem_add_cells() 2018-11-13 11:08:35 -08:00
of of: overlay: do not duplicate properties from overlay for new nodes 2019-02-06 17:30:16 +01:00
opp OPP: Use opp_table->regulators to verify no regulator case 2019-02-12 19:47:08 +01:00
oprofile
parisc
parport
pci PCI: Fix __initdata issue with "pci=disable_acs_redir" parameter 2019-02-23 09:07:26 +01:00
pcmcia pcmcia: Implement CLKRUN protocol disabling for Ricoh bridges 2018-11-13 11:08:17 -08:00
perf perf: arm_spe: handle devm_kasprintf() failure 2019-02-12 19:47:03 +01:00
phy phy: ath79-usb: Fix the main reset name to match the DT binding 2019-03-05 17:58:48 +01:00
pinctrl pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins 2019-03-23 20:09:49 +01:00
platform platform/x86: Fix unmet dependency warning for SAMSUNG_Q10 2019-03-13 14:02:31 -07:00
pnp
power power: supply: olpc_battery: correct the temperature units 2019-01-13 09:51:10 +01:00
powercap
pps
ps3
ptp ptp: Fix pass zero to ERR_PTR() in ptp_clock_register 2019-02-12 19:47:01 +01:00
pwm
rapidio
ras
regulator regulator: s2mpa01: Fix step values for some LDOs 2019-03-23 20:09:58 +01:00
remoteproc remoteproc: qcom: q6v5: Propagate EPROBE_DEFER 2018-11-13 11:08:52 -08:00
reset
rpmsg rpmsg: smd: fix memory leak on channel create 2018-11-13 11:08:55 -08:00
rtc rtc: m41t80: Correct alarm month range with RTC reads 2019-01-09 17:38:48 +01:00
s390 s390/dasd: fix using offset into zero size array error 2019-03-23 20:09:42 +01:00
sbus drivers/sbus/char: add of_node_put() 2018-12-21 14:15:17 +01:00
scsi scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task 2019-03-23 20:09:48 +01:00
sfi
sh
siox
slimbus slimbus: ngd: mark PM functions as __maybe_unused 2018-12-19 19:19:49 +01:00
sn
soc soc: fsl: qbman: avoid race in clearing QMan interrupt 2019-03-13 14:02:33 -07:00
soundwire
spi spi: pxa2xx: Setup maximum supported DMA transfer length 2019-03-23 20:09:57 +01:00
spmi
ssb
staging staging: erofs: fix race when the managed cache is enabled 2019-03-19 13:12:42 +01:00
target scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes 2019-02-27 10:08:55 +01:00
tc TC: Set DMA masks for devices 2018-11-13 11:08:51 -08:00
tee tee: optee: avoid possible double list_del() 2019-02-12 19:47:08 +01:00
thermal drivers: thermal: int340x_thermal: Fix sysfs race condition 2019-03-05 17:58:48 +01:00
thunderbolt thunderbolt: Prevent root port runtime suspend during NVM upgrade 2018-12-17 09:24:36 +01:00
tty serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling 2019-03-05 17:58:49 +01:00
uio uio: Fix an Oops on load 2018-11-27 16:13:09 +01:00
usb usb: phy: fix link errors 2019-03-13 14:02:34 -07:00
uwb
vfio vfio/type1: Fix unmap overflow off-by-one 2019-01-16 22:04:34 +01:00
vhost vhost/vsock: fix vhost vsock cid hashing inconsistent 2019-03-19 13:12:42 +01:00
video udlfb: handle unplug properly 2019-02-27 10:09:03 +01:00
virt vbox: fix link error with 'gcc -Og' 2019-02-12 19:46:59 +01:00
virtio
visorbus
vlynq
vme
w1 w1: omap-hdq: fix missing bus unregister at removal 2018-11-13 11:08:48 -08:00
watchdog watchdog: mt7621_wdt/rt2880_wdt: Fix compilation problem 2019-02-27 10:08:52 +01:00
xen pvcalls-front: fix potential null dereference 2019-02-27 10:08:56 +01:00
zorro
Kconfig
Makefile