linux/include
Nicolas Pitre 00a31dd3ac
asm-generic/div64: optimize/simplify __div64_const32()
Several years later I just realized that this code could be greatly
simplified.

First, let's formalize the need for overflow handling in
__arch_xprod64(). Assuming n = UINT64_MAX, there are 2 cases where
an overflow may occur:

1) If a bias must be added, we have m_lo * n_lo + m or
   m_lo * 0xffffffff + ((m_hi << 32) + m_lo) or
   ((m_lo << 32) - m_lo) + ((m_hi << 32) + m_lo) or
   (m_lo + m_hi) << 32 which must be < (1 << 64). So the criteria for no
   overflow is m_lo + m_hi < (1 << 32).

2) The cross product m_lo * n_hi + m_hi * n_lo or
   m_lo * 0xffffffff + m_hi * 0xffffffff or
   ((m_lo << 32) - m_lo) + ((m_hi << 32) - m_hi). Assuming the top
   result from the previous step (m_lo + m_hi) that must be added to
   this, we get (m_lo + m_hi) << 32 again.

So let's have a straight and simpler version when this is true.
Otherwise some reordering allows for taking care of possible overflows
without any actual conditionals. And prevent from generating both code
variants by making sure this is considered only if m is perceived as
constant by the compiler.

This, in turn, allows for greatly simplifying __div64_const32(). The
"special case" may go as well as the regular case works just fine
without needing a bias. Then reduction should be applied all the time as
minimizing m is the key.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28 21:44:28 +00:00
..
acpi Power management updates for 6.12-rc1 2024-09-16 07:47:50 +02:00
asm-generic asm-generic/div64: optimize/simplify __div64_const32() 2024-10-28 21:44:28 +00:00
clocksource
crypto move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
cxl cxl: Move mailbox related bits to the same context 2024-09-12 08:38:01 -07:00
drm Short summary of fixes pull: 2024-10-01 08:15:55 +10:00
dt-bindings soc: convert ep93xx to devicetree 2024-09-26 12:00:25 -07:00
keys KEYS: Remove unused declarations 2024-09-20 18:28:26 +03:00
kunit The core clk framework is left largely untouched this time around except for 2024-09-23 15:01:48 -07:00
kvm
linux tty: serial: handle HAS_IOPORT dependencies 2024-10-28 21:44:28 +00:00
math-emu
media media: cec: move cec_get/put_device to header 2024-09-05 20:12:15 +02:00
memory
misc
net move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
pcmcia
ras
rdma move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
rv
scsi move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
soc soc: driver updates for 6.12 2024-09-17 10:48:09 +02:00
sound ALSA: hda: fix trigger_tstamp_latched 2024-10-02 12:50:24 +02:00
target move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
trace for-6.12-rc1-tag 2024-10-04 10:05:13 -07:00
uapi UAPI/ioctl: Improve parameter name of ioctl request definition helpers 2024-10-28 21:44:22 +00:00
ufs Many singleton patches - please see the various changelogs for details. 2024-09-21 08:20:50 -07:00
vdso random: vDSO: add a __vdso_getrandom prototype for all architectures 2024-09-13 17:28:35 +02:00
video
xen xen: sync elfnote.h from xen tree 2024-09-25 14:15:04 +02:00