linux/arch
Daniel Borkmann 6d2eea6fb0 bpf, arm64: optimize 32/64 immediate emission
Improve the JIT to emit 64 and 32 bit immediates, the current
algorithm is not optimal and we often emit more instructions
than actually needed. arm64 has movz, movn, movk variants but
for the current 64 bit immediates we only use movz with a
series of movk when needed.

For example loading ffffffffffffabab emits the following 4
instructions in the JIT today:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: ffff, shift: 16, result: 00000000ffffabab
  * movk: ffff, shift: 32, result: 0000ffffffffabab
  * movk: ffff, shift: 48, result: ffffffffffffabab

Whereas after the patch the same load only needs a single
instruction:

  * movn: 5454, shift:  0, result: ffffffffffffabab

Another example where two extra instructions can be saved:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: 1f2f, shift: 16, result: 000000001f2fabab
  * movk: ffff, shift: 32, result: 0000ffff1f2fabab
  * movk: ffff, shift: 48, result: ffffffff1f2fabab

After the patch:

  * movn: e0d0, shift: 16, result: ffffffff1f2fffff
  * movk: abab, shift:  0, result: ffffffff1f2fabab

Another example with movz, before:

  * movz: 0000, shift:  0, result: 0000000000000000
  * movk: fea0, shift: 32, result: 0000fea000000000

After:

  * movz: fea0, shift: 32, result: 0000fea000000000

Moreover, reuse emit_a64_mov_i() for 32 bit immediates that
are loaded via emit_a64_mov_i64() which is a similar optimization
as done in 6fe8b9c1f4 ("bpf, x64: save several bytes by using
mov over movabsq when possible"). On arm64, the latter allows to
use a single instruction with movn due to zero extension where
otherwise two would be needed. And last but not least add a
missing optimization in emit_a64_mov_i() where movn is used but
the subsequent movk not needed. With some of the Cilium programs
in use, this shrinks the needed instructions by about three
percent. Tested on Cavium ThunderX CN8890.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14 19:11:45 -07:00
..
alpha mm: introduce MAP_FIXED_NOREPLACE 2018-04-11 10:28:38 -07:00
arc kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers 2018-04-07 19:04:02 +09:00
arm bpf, arm32: save 4 bytes of unneeded stack space 2018-05-14 19:11:45 -07:00
arm64 bpf, arm64: optimize 32/64 immediate emission 2018-05-14 19:11:45 -07:00
c6x c6x: pass endianness info to sparse 2018-04-10 09:58:58 -04:00
h8300 h8300: remove extraneous __BIG_ENDIAN definition 2018-03-22 17:07:01 -07:00
hexagon hexagon: export csum_partial_copy_nocheck 2018-05-01 15:49:50 -05:00
ia64 pci-v4.17-changes 2018-04-06 18:31:06 -07:00
m68k Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2018-04-09 09:15:46 -07:00
microblaze Microblaze patches for 4.17-rc1 2018-04-12 10:18:02 -07:00
mips bpf, mips: remove unused function 2018-05-14 19:11:45 -07:00
nds32 page cache: use xa_lock 2018-04-11 10:28:39 -07:00
nios2 nios2 update for v4.17-rc1 2018-04-11 16:02:18 -07:00
openrisc OpenRISC updates for v4.17 2018-04-15 12:27:58 -07:00
parisc parisc: Fix section mismatches 2018-05-02 21:47:35 +02:00
powerpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-05-07 23:35:08 -04:00
riscv RISC-V: build vdso-dummy.o with -no-pie 2018-04-24 10:54:46 -07:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-05-07 23:35:08 -04:00
sh Merge branch 'akpm' (patches from Andrew) 2018-04-14 08:50:50 -07:00
sparc bpf, sparc: remove unused variable 2018-05-14 19:11:45 -07:00
um Merge git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2018-04-11 16:36:47 -07:00
unicore32 unicore32: turn flush_dcache_mmap_lock into a no-op 2018-04-11 10:28:39 -07:00
x86 bpf, x64: clean up retpoline emission slightly 2018-05-14 19:11:45 -07:00
xtensa mm: introduce MAP_FIXED_NOREPLACE 2018-04-11 10:28:38 -07:00
.gitignore
Kconfig kbuild: remove incremental linking option 2018-03-26 02:01:19 +09:00