Commit Graph

3088 Commits

Author SHA1 Message Date
Lukasz Luba
3f7ced7ac9 drivers/thermal/cpufreq_cooling : Refactor thermal_power_cpu_get_power tracing
Simplify the thermal_power_cpu_get_power trace event by removing
complicated cpumask and variable length array. Now the tools parsing trace
output don't have to hassle to get this power data. The simplified format
version uses 'policy->cpu'. Remove also the 'load' information completely
since there is very little value of it in this trace event. To get the
CPUs' load (or utilization) there are other dedicated trace hooks in the
kernel. This patch also simplifies and speeds-up the main cooling code
when that trace event is enabled.

Rename the trace event to avoid confusion of tools which parse the trace
file.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20220613124327.30766-3-lukasz.luba@arm.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-07-28 17:29:42 +02:00
Linus Torvalds
2319be1356 Locking changes in this cycle were:
- rwsem cleanups & optimizations/fixes:
     - Conditionally wake waiters in reader/writer slowpaths
     - Always try to wake waiters in out_nolock path
 
  - Add try_cmpxchg64() implementation, with arch optimizations - and use it to
    micro-optimize sched_clock_{local,remote}()
 
  - Various force-inlining fixes to address objdump instrumentation-check warnings
 
  - Add lock contention tracepoints:
 
     lock:contention_begin
     lock:contention_end
 
  - Misc smaller fixes & cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLsrERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1js3g//cPR9PYlvZv87T2hI8VWKfNzapgSmwCsH
 1P+nk27Pef+jfxHr/N7YScvSD06+z2wIroLE3npPNETmNd1X8obBDThmeD4VI899
 J6h4sE0cFOpTG/mHeECFxqnDuzhdHiRHWS52RxOwTjZTpdbeKWZYueC0Mvqn+tIp
 UM2D2yTseIHs67ikxYtayU/iJgSZ+PYrMPv9nSVUjIFILmg7gMIz38OZYQzR84++
 auL3m8sAq/i2pjzDBbXMpfYeu177/tPHpPJr2rOErLEXWqK2K6op8+CbX4z3yv3z
 EBBhGiUNqDmFaFuIgg7Mx94SvPh8MBGexUnT0XA2aXPwyP9oAaenCk2CZ1j9u15m
 /Xp1A4KNvg1WY8jHu5ZM4VIEXQ7d6Gwtbej7IeovUxBD6y7Trb3+rxb7PVdZX941
 uVGjss1Lgk70wUQqBqBPmBm08O6NUF3vekHlona5CZTQgEF84zD7+7D++QPaAZo7
 kiuNUptdgfq6X0xqgP88GX1KU85gJYoF5Q13vb7lAcv19QhRG5JBJeWMYiXEmg12
 Ktl97Sru0zCpCY1NCvwsBll09SLVO9kX3Lq+QFD8bFMZ0obsGIBotHq1qH6U7cH8
 RY6esVBF/1/+qdrxOKs8qowlJ4EUp/3bX0R/MKYHJJbulj/aaE916HvMsoN/QR4Y
 oW7GcxMQGLE=
 =gaS5
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - rwsem cleanups & optimizations/fixes:
    - Conditionally wake waiters in reader/writer slowpaths
    - Always try to wake waiters in out_nolock path

 - Add try_cmpxchg64() implementation, with arch optimizations - and use
   it to micro-optimize sched_clock_{local,remote}()

 - Various force-inlining fixes to address objdump instrumentation-check
   warnings

 - Add lock contention tracepoints:

    lock:contention_begin
    lock:contention_end

 - Misc smaller fixes & cleanups

* tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}
  locking/atomic/x86: Introduce arch_try_cmpxchg64
  locking/atomic: Add generic try_cmpxchg64 support
  futex: Remove a PREEMPT_RT_FULL reference.
  locking/qrwlock: Change "queue rwlock" to "queued rwlock"
  lockdep: Delete local_irq_enable_in_hardirq()
  locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
  locking: Apply contention tracepoints in the slow path
  locking: Add lock contention tracepoints
  locking/rwsem: Always try to wake waiters in out_nolock path
  locking/rwsem: Conditionally wake waiters in reader/writer slowpaths
  locking/rwsem: No need to check for handoff bit if wait queue empty
  lockdep: Fix -Wunused-parameter for _THIS_IP_
  x86/mm: Force-inline __phys_addr_nodebug()
  x86/kvm/svm: Force-inline GHCB accessors
  task_stack, x86/cea: Force-inline stack helpers
2022-05-24 10:18:23 -07:00
Linus Torvalds
8443516da6 platform-drivers-x86 for v5.19-1
Highlights:
  -  New drivers:
     -  Intel "In Field Scan" (IFS) support
     -  Winmate FM07/FM07P buttons
     -  Mellanox SN2201 support
  -  AMD PMC driver enhancements
  -  Lots of various other small fixes and hardware-id additions
 
 The following is an automated git shortlog grouped by driver:
 
 Documentation:
  -  In-Field Scan
 
 Documentation/ABI:
  -  Add new attributes for mlxreg-io sysfs interfaces
  -  sysfs-class-firmware-attributes: Misc. cleanups
  -  sysfs-class-firmware-attributes: Fix Sphinx errors
  -  sysfs-driver-intel_sdsi: Fix sphinx warnings
 
 acerhdf:
  -  Cleanup str_starts_with()
 
 amd-pmc:
  -  Fix build error unused-function
  -  Shuffle location of amd_pmc_get_smu_version()
  -  Avoid reading SMU version at probe time
  -  Move FCH init to first use
  -  Move SMU logging setup out of init
  -  Fix compilation without CONFIG_SUSPEND
 
 amd_hsmp:
  -  Add HSMP protocol version 5 messages
 
 asus-nb-wmi:
  -  Add keymap for MyASUS key
 
 asus-wmi:
  -  Update unknown code message
  -  Use kobj_to_dev()
  -  Fix driver not binding when fan curve control probe fails
  -  Potential buffer overflow in asus_wmi_evaluate_method_buf()
 
 barco-p50-gpio:
  -  Fix duplicate included linux/io.h
 
 dell-laptop:
  -  Add quirk entry for Latitude 7520
 
 gigabyte-wmi:
  -  Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
  -  added support for B660 GAMING X DDR4 motherboard
 
 hp-wmi:
  -  Correct code style related issues
 
 intel-hid:
  -  fix _DSM function index handling
 
 intel-uncore-freq:
  -  Prevent driver loading in guests
 
 intel_cht_int33fe:
  -  Set driver data
 
 platform/mellanox:
  -  Add support for new SN2201 system
 
 platform/surface:
  -  aggregator: Fix initialization order when compiling as builtin module
  -  gpe: Add support for Surface Pro 8
 
 platform/x86/dell:
  -  add buffer allocation/free functions for SMI calls
 
 platform/x86/intel:
  -  Fix 'rmmod pmt_telemetry' panic
  -  pmc/core: Use kobj_to_dev()
  -  pmc/core: change pmc_lpm_modes to static
 
 platform/x86/intel/ifs:
  -  Add CPU_SUP_INTEL dependency
  -  add ABI documentation for IFS
  -  Add IFS sysfs interface
  -  Add scan test support
  -  Authenticate and copy to secured memory
  -  Check IFS Image sanity
  -  Read IFS firmware image
  -  Add stub driver for In-Field Scan
 
 platform/x86/intel/sdsi:
  -  Fix bug in multi packet reads
  -  Poll on ready bit for writes
  -  Handle leaky bucket
 
 platform_data/mlxreg:
  -  Add field for notification callback
 
 pmc_atom:
  -  dont export pmc_atom_read - no modular users
  -  remove unused pmc_atom_write()
 
 samsung-laptop:
  -  use kobj_to_dev()
  -  Fix an unsigned comparison which can never be negative
 
 stop_machine:
  -  Add stop_core_cpuslocked() for per-core operations
 
 think-lmi:
  -  certificate support clean ups
 
 thinkpad_acpi:
  -  Correct dual fan probe
  -  Add a s2idle resume quirk for a number of laptops
  -  Convert btusb DMI list to quirks
 
 tools/power/x86/intel-speed-select:
  -  Fix warning for perf_cap.cpu
  -  Display error on turbo mode disabled
  -  fix build failure when using -Wl,--as-needed
 
 toshiba_acpi:
  -  use kobj_to_dev()
 
 trace:
  -  platform/x86/intel/ifs: Add trace point to track Intel IFS operations
 
 winmate-fm07-keys:
  -  Winmate FM07/FM07P buttons
 
 wmi:
  -  replace usage of found with dedicated list iterator variable
 
 x86/microcode/intel:
  -  Expose collect_cpu_info_early() for IFS
 
 x86/msr-index:
  -  Define INTEGRITY_CAPABILITIES MSR
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmKKlA0UHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9w0Iwf+PYoq7qtU6j6N2f8gL2s65JpKiSPP
 CkgnCzTP+khvNnTWMQS8RW9VE6YrHXmN/+d3UAvRrHsOYm3nyZT5aPju9xJ6Xyfn
 5ZdMVvYxz7cm3lC6ay8AQt0Cmy6im/+lzP5vA5K68IYh0fPX/dvuOU57pNvXYFfk
 Yz5/Gm0t0C4CKVqkcdU/zkNawHP+2+SyQe+Ua2srz7S3DAqUci0lqLr/w9Xk2Yij
 nCgEWFB1Qjd2NoyRRe44ksLQ0dXpD4ADDzED+KPp6VTGnw61Eznf9319Z5ONNa/O
 VAaSCcDNKps8d3ZpfCpLb3Rs4ztBCkRnkLFczJBgPsBiuDmyTT2/yeEtNg==
 =HdEG
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver updates from Hans de Goede:
 "This includes some small changes to kernel/stop_machine.c and arch/x86
  which are deps of the new Intel IFS support.

  Highlights:

   - New drivers:
       - Intel "In Field Scan" (IFS) support
       - Winmate FM07/FM07P buttons
       - Mellanox SN2201 support

   -  AMD PMC driver enhancements

   -  Lots of various other small fixes and hardware-id additions"

* tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits)
  platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency
  platform/x86: intel_cht_int33fe: Set driver data
  platform/x86: intel-hid: fix _DSM function index handling
  platform/x86: toshiba_acpi: use kobj_to_dev()
  platform/x86: samsung-laptop: use kobj_to_dev()
  platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
  tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
  tools/power/x86/intel-speed-select: Display error on turbo mode disabled
  Documentation: In-Field Scan
  platform/x86/intel/ifs: add ABI documentation for IFS
  trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
  platform/x86/intel/ifs: Add IFS sysfs interface
  platform/x86/intel/ifs: Add scan test support
  platform/x86/intel/ifs: Authenticate and copy to secured memory
  platform/x86/intel/ifs: Check IFS Image sanity
  platform/x86/intel/ifs: Read IFS firmware image
  platform/x86/intel/ifs: Add stub driver for In-Field Scan
  stop_machine: Add stop_core_cpuslocked() for per-core operations
  x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
  x86/microcode/intel: Expose collect_cpu_info_early() for IFS
  ...
2022-05-23 20:38:39 -07:00
Linus Torvalds
6e01f86fb2 Updates for timers and timekeeping core code:
- Expose CLOCK_TAI to instrumentation to aid with TSN debugging.
 
   - Ensure that the clockevent is stopped when there is no timer armed to
     avoid pointless wakeups.
 
   - Make the sched clock frequency handling and rounding consistent.
 
   - Provide a better debugobject hint for delayed works. The timer callback
     is always the same, which makes it difficult to identify the underlying
     work. Use the work function as a hint instead.
 
   - Move the timer specific sysctl code into the timer subsystem.
 
   - The usual set of improvements and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKLPHMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZBoEACIURtS8w9PFZ6q/2mFq0pTYi/uI/HQ
 vqbB6gCbrjfL6QwInd7jxDc/UoqEOllG9pTaGdWx/0Gi9syDosEbeop7cvvt2xi+
 pReoEN1kVI3JAVrQFIAuGw4EMuzYB8PfuZkm1PdozcCP9qkgDmtippVxe05sFQ+/
 RPdA29vE3g63eXkSFBhEID23pQR8yKLbqVq6KcH87OipZedL+2fry3yB+/9sLuuU
 /PFLbI6B9f43S2sfo6szzpFkpd6tJlBlu02IrB6gh4IxKrslmZb5onpvcf6iT+19
 rFh5A15GFWoZUC8EjH1sBpATq3wA/jfGEOPWgy07N5SmobtJvWSM5yvT+gC3qXqm
 C/bjyjqXzLKftG7KIXo/hWewtsjdovMbdfcMBsGiatytNBZfI1GR/4Pq60/qpTHZ
 qJo35trOUcP6o1njphwONy3lisq78S7xaozpWO1hIMTcAqGgBkm/lOieGMM4hGnE
 Ps0Im3ZsOXNGllulN+3h+UHstM5/y6f/vzBsw7pfIG66i6KqebAiNjbMfHCr22sX
 7UavNCoFggUQgZVgUYX/AscdW4/Dwx6R5YUqj1EBqztknd70Ac4TqjaIz4Xa6ZER
 z+eQSSt5XqqV2eKWA4FsQYmCIc+BvQ4apSA6+whz9vmsvCYtB7zzSfeh+xkgcl1/
 Cc0N6G5+L9v0Gw==
 =De28
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer and timekeeping updates from Thomas Gleixner:

 - Expose CLOCK_TAI to instrumentation to aid with TSN debugging.

 - Ensure that the clockevent is stopped when there is no timer armed to
   avoid pointless wakeups.

 - Make the sched clock frequency handling and rounding consistent.

 - Provide a better debugobject hint for delayed works. The timer
   callback is always the same, which makes it difficult to identify the
   underlying work. Use the work function as a hint instead.

 - Move the timer specific sysctl code into the timer subsystem.

 - The usual set of improvements and cleanups

* tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Provide a better debugobjects hint for delayed works
  time/sched_clock: Fix formatting of frequency reporting code
  time/sched_clock: Use Hz as the unit for clock rate reporting below 4kHz
  time/sched_clock: Round the frequency reported to nearest rather than down
  timekeeping: Consolidate fast timekeeper
  timekeeping: Annotate ktime_get_boot_fast_ns() with data_race()
  timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
  timekeeping: Introduce fast accessor to clock tai
  tracing/timer: Add missing argument documentation of trace points
  clocksource: Replace cpumask_weight() with cpumask_empty()
  timers: Move timer sysctl into the timer code
  clockevents: Use dedicated list iterator variable
  timers: Simplify calc_index()
  timers: Initialize base::next_expiry_recalc in timers_prepare_cpu()
2022-05-23 17:05:55 -07:00
Linus Torvalds
9836e93c0a for-5.19/io_uring-passthrough-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKovAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpv9oD/4qCs7k3bPZZWZ6xoWb4EObyyWOUifi26lp
 vpsJHFUbA67S/i4++LV9H18SazWJ7h08ac4bjgZ+NQz40/1WkTN8/Fa76jo+BnNK
 7T10Wp4Ak6uwWVrKaA81pnT+G9+xmHlJ3X27aKxzLuT7BEPpShZ6ouFVjTkx9CzN
 LrLjuCDTOBBN+ZoaroWYfdLwTQX2VCAl9B15lOtQIlFvuuU8VlrvLboY+80K8TvY
 1wvTA2HTjnXoYx+/cTTMIFZIwQH3r1hsbwEDD8/YJj1+ouhSRQ1b0p/nk2pA+3ws
 HF5r/YS/rLBjlPF094IzeOBaUyA433AN1VhZqnII8ek7ViT3W3x+BRrgE9O6ZkWT
 0AjX1BXReI5rdFmxBmwsSdBnrSoGaJOf2GdsCCdubXBIi+F/RvyajrPf7PTB5zbW
 9WEK/uy3xvZsRVkUGAzOb9QGdvjcllgMzwPJsDegDCw5PdcPdT3mzy6KGIWipFLp
 j8R+br7hRMpOJv/YpihJDMzSDkQ/r1/SCwR4fpLid/QdSHG/eRTQK6c4Su5bNYEy
 QDy2F6kQdBVtEJCQHcEOsbhXzSTNBcdB+ujUUM5653FkaHe6y4JbomLrsNx407Id
 i/4ROwA5K1dioJx503Eap+OhbI5rV+PFytJTwxvLrNyVGccwbH2YOVq80fsVBP2e
 cZbn6EX4Vg==
 =/peE
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block

Pull io_uring NVMe command passthrough from Jens Axboe:
 "On top of everything else, this adds support for passthrough for
  io_uring.

  The initial feature for this is NVMe passthrough support, which allows
  non-filesystem based IO commands and admin commands.

  To support this, io_uring grows support for SQE and CQE members that
  are twice as big, allowing to pass in a full NVMe command without
  having to copy data around. And to complete with more than just a
  single 32-bit value as the output"

* tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits)
  io_uring: cleanup handling of the two task_work lists
  nvme: enable uring-passthrough for admin commands
  nvme: helper for uring-passthrough checks
  blk-mq: fix passthrough plugging
  nvme: add vectored-io support for uring-cmd
  nvme: wire-up uring-cmd support for io-passthru on char-device.
  nvme: refactor nvme_submit_user_cmd()
  block: wire-up support for passthrough plugging
  fs,io_uring: add infrastructure for uring-cmd
  io_uring: support CQE32 for nop operation
  io_uring: enable CQE32
  io_uring: support CQE32 in /proc info
  io_uring: add tracing for additional CQE32 fields
  io_uring: overflow processing for CQE32
  io_uring: flush completions for CQE32
  io_uring: modify io_get_cqe for CQE32
  io_uring: add CQE32 completion processing
  io_uring: add CQE32 setup processing
  io_uring: change ring size calculation for CQE32
  io_uring: store add. return values for CQE32
  ...
2022-05-23 13:06:15 -07:00
Linus Torvalds
368da430d0 for-5.19/io_uring-socket-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKorgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpm0eEACdTzhm7h5cXn9KjIvWLkdocAb/NOL8GYPn
 Q1mY1SqKQFZvs/fyKHkkZEiIBPxhvN6snVFXMpb4LDmPYeeH4GTUlNomrGTIjvf/
 j6SnZN4lCs9A2NlE+iDVWnFQOPQFALza2Y9BhC5xzay326qnKlO+0fQv3C1vXXrc
 /PNLqxQr7+GmO0a0PJnS6mGWGj6qF7nLqilB9apnKsTK6BKbJEec6ciKreqxU6ME
 WHaux11uIAbcf8rc6C/2myEK0k6jCOAue3vZ0lizygf+8klUCl2vMqV5BLwCBlXG
 /e7hBsUUrGr0CG0fryqhQQTUxsZLshioBbQH1vttSeZCli46mmWWAhPNy3/jb1ZU
 72bazA84Fe9ney9uVZvZoMoBsG+6t6UOatqND13MeRFAXnkRr0jZRuau2iBxgqAr
 OINJW+IVPU7IrCD+S4lV1/LCdhLhYcob8/zfKmIrdHMQnWG/gLonVpYJIBCyLDAv
 2jvHFIPJuSMUSGVjRKCb16LLNV6u7YG6VOWbKuippxfJxDdwA3TOtOhvTJIpYq0u
 TotPgpZ7bfcr4xDsGgD9mZS8E7jwsL/G0/MwsnixELykEXuhd++sgoTbr+RyUYdV
 45Hm6DsxlytjzOb/5uQrqhwrso05eVt14K74XApPa3fWKL8aWCh1jGSdo3CSbIyW
 iHwss919Ag==
 =nb5i
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block

Pull io_uring socket() support from Jens Axboe:
 "This adds support for socket(2) for io_uring. This is handy when using
  direct / registered file descriptors with io_uring.

  Outside of those two patches, a small series from Dylan on top that
  improves the tracing by providing a text representation of the opcode
  rather than needing to decode this by reading the header file every
  time.

  That sits in this branch as it was the last opcode added (until it
  wasn't...)"

* tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()
2022-05-23 12:42:33 -07:00
Linus Torvalds
09beaff75e for-5.19/io_uring-xattr-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKopkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgakEACktFtUBQLrYOXbM/mVxMpR//ht4e29E8k0
 j/DkqK0yDKn9VkvDryALguH+ixNSI9Z4N7xSELLb/meNQsbJ7YdprL3xJn3BoUgs
 3zx44janE8J3Q5TsXvD2z2jPIMaT892t5+5aLFYZqP1g+KDXI8T1WpHsETMkKfRG
 ZPeerUrd0fhtnDpViaaYbRutIEt8V8tsPhh0XG/4GojWjUW0FTsRKBSGuQ0sQnUr
 aJDfF5VylOjOBzRGimGZ23vJIgtZ8UEpX0T2MxR5V6ffj4cI8bCFQOrphh7yHxF5
 f09xte80zX6pow5AivIpultZShR6IoQG5DIvF59woNP16uXy5yUyVTQvdnt8RlyY
 RjLd8ro9Gt4wBQGqckJLyY/o1FGhaQ8S99wOixUlpb9qKAOGmQZI97FQKFENqx/1
 Xe+bP6QmTt9uCXsYPIFBtZaaEv2u0yjHOyERFUSzKJQUuPTa5Rmen0EXYXRhe5/E
 p+sR3Qbk1wzlW7UHuCT2gcaI67SAFG+yDv1U6BAaVdcS71i0WCA+Q2a6AuB+NJzg
 ER4+JRoeOnjEXSP2UPvIUBL1Komdj4R2hnrOK4S80R3yQ3NaadrWywhBn5HNcniM
 wE2P6J0erzRFqyfBw9tyNLsZwR1iS7JqSD9/NuBLoWwb42O0l+WgqqwDTSxMsde4
 egKBaidRqg==
 =CfhD
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block

Pull io_uring xattr support from Jens Axboe:
 "Support for the xattr variants"

* tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: cleanup error-handling around io_req_complete
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr
2022-05-23 12:30:30 -07:00
Linus Torvalds
3a166bdbf3 for-5.19/io_uring-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKol0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpn+sEACbdEQqG6OoCOhJ0ZuxTdQqNMGxCImKBxjP
 8Bqf+0hYNgwfG+80/UQvmc7olb+KxvZ6KtrgViC/ujhvMQmX0Xf/881kiiKG/iHJ
 XKoL9PdqIkenIGnlyEp1uRmnUbooYF+s4iT6Gj/pjnn29GbcKjsPzKV1CUNkt3GC
 R+wpdKczHQDaSwzDY5Ntyjf68QUQOyUznkHW+6JOcBeih3ET7NfapR/zsFS93RlL
 B9pQ9NiBBQfzCAUycVyQMC+p/rJbKWgidAiFk4fXKRm8/7iNwT4dB0+oUymlECxt
 xvalRVK6ER1s4RSdQcUTZoQA+SrzzOnK1DYja9cvcLT3wH+aojana6S0rOMDi8wp
 hoWT5jdMaZN09Vcm7J4sBN15i50m9aDITp21PKOVDZXSMVsebltCL9phaN5+9x/j
 AfF6Vki1WTB4gYaDHR8v6UkW+HcF1WOmMdq8GB9UMfnTya6EJqAooYT9lhQBP/rv
 jxkdj9Fu98O87dOfy1Av9AxH1UB8d7ypCJKkSEMAUPoWf0rC9HjYr0cRq/yppAj8
 pI/0PwXaXRfQuoHPqZyETrPel77VQdBw+Hg+6TS0KlTd3WlVEJMZJPtXK466IFLp
 pYSRVnSI9PuhiClOpxriTCw0cppfRIv11IerCxRziqH9S1zijk0VBCN40//XDs1o
 JfvoA6htKQ==
 =S+Uf
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:
 "Here are the main io_uring changes for 5.19. This contains:

   - Fixes for sparse type warnings (Christoph, Vasily)

   - Support for multi-shot accept (Hao)

   - Support for io_uring managed fixed files, rather than always
     needing the applicationt o manage the indices (me)

   - Fix for a spurious poll wakeup (Dylan)

   - CQE overflow fixes (Dylan)

   - Support more types of cancelations (me)

   - Support for co-operative task_work signaling, rather than always
     forcing an IPI (me)

   - Support for doing poll first when appropriate, rather than always
     attempting a transfer first (me)

   - Provided buffer cleanups and support for mapped buffers (me)

   - Improve how io_uring handles inflight SCM files (Pavel)

   - Speedups for registered files (Pavel, me)

   - Organize the completion data in a struct in io_kiocb rather than
     keep it in separate spots (Pavel)

   - task_work improvements (Pavel)

   - Cleanup and optimize the submission path, in general and for
     handling links (Pavel)

   - Speedups for registered resource handling (Pavel)

   - Support sparse buffers and file maps (Pavel, me)

   - Various fixes and cleanups (Almog, Pavel, me)"

* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
  io_uring: fix incorrect __kernel_rwf_t cast
  io_uring: disallow mixed provided buffer group registrations
  io_uring: initialize io_buffer_list head when shared ring is unregistered
  io_uring: add fully sparse buffer registration
  io_uring: use rcu_dereference in io_close
  io_uring: consistently use the EPOLL* defines
  io_uring: make apoll_events a __poll_t
  io_uring: drop a spurious inline on a forward declaration
  io_uring: don't use ERR_PTR for user pointers
  io_uring: use a rwf_t for io_rw.flags
  io_uring: add support for ring mapped supplied buffers
  io_uring: add io_pin_pages() helper
  io_uring: add buffer selection support to IORING_OP_NOP
  io_uring: fix locking state for empty buffer group
  io_uring: implement multishot mode for accept
  io_uring: let fast poll support multishot
  io_uring: add REQ_F_APOLL_MULTISHOT for requests
  io_uring: add IORING_ACCEPT_MULTISHOT for accept
  io_uring: only wake when the correct events are set
  io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
  ...
2022-05-23 12:22:49 -07:00
Vasily Averin
0e7579ca73 io_uring: fix incorrect __kernel_rwf_t cast
Currently 'make C=1 fs/io_uring.o' generates sparse warning:

  CHECK   fs/io_uring.c
fs/io_uring.c: note: in included file (through
include/trace/trace_events.h, include/trace/define_trace.h, i
nclude/trace/events/io_uring.h):
./include/trace/events/io_uring.h:488:1:
 warning: incorrect type in assignment (different base types)
    expected unsigned int [usertype] op_flags
    got restricted __kernel_rwf_t const [usertype] rw_flags

This happen on cast of sqe->rw_flags which is defined as __kernel_rwf_t,
this type is bitwise and requires __force attribute for any casts.
However rw_flags is a member of the union, and its access can be safely
replaced by using of its neighbours, so let's use poll32_events to fix
the sparse warning.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Link: https://lore.kernel.org/r/6f009241-a63f-ae43-a04b-62841aaef293@openvz.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-19 12:27:59 -06:00
Linus Torvalds
01464a73a6 io_uring-5.18-2022-05-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKFeygQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsXxD/9dnVf1LCfVR3wQpV/O7o1u5zRpjTSFetmS
 qGw4kMDUtfE63oCLzGW/WpmQLrHduoTa32Ms/v0vU1+1MixUHatzRzN5OF2Nv7Wv
 Ukc6JtH2PjcYFNouAFJ8r/H1uanrr9DOK7R/9n3t9CwOIoand+PUmZ0rIHVN9u0j
 KtpousBih66c7tajADqSxKb1uPlDrgTkzfAYVj5O4OwBAiIwY2CLJfvbLFTdoHP9
 bdCrsvo9R08QRH9PedPqctdQURBxfzd0kXm3N8+d4r4GEQvbaGI+6bN88VYW2aEa
 VfBCZD4A8/VupcGWWhAnKu0rOwxgrh2jDEd5g+0g/f3SRTY2LBUJAlEMM7N043vX
 MtAuO05E3/Svr5doAwAJsp44h6SvS2s5ogvPIYjEUvY4JK2UDdWOcYrb7HqXrNBi
 0jNsgUu4+N89i6jiewf0dPw6BKK7Cf9322fGy4X1dQ77gIkDWt4Z55iV46aSX3Y8
 dQt2Jaoj0c/wRkOgZtrDi7FmdM0xcMzGMB/oOyI07d3ERjbyZuDTFowWh/kqd1xA
 6vsrkyCQaGilmDu9xTywPSfarKoxIrX+TgZj5rnnJpv2tENJSx/o8yKWcu9wnO1L
 gnanzpwYW4qIOjkWKvIBLBHXXA6Qp8aXacif8rODUhIuABQpe85b+btpHeqO9lWL
 a3wS9Gc2nQ==
 =8Xxv
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two small changes fixing issues from the 5.18 merge window:

   - Fix wrong ordering of a tracepoint (Dylan)

   - Fix MSG_RING on IOPOLL rings (me)"

* tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block:
  io_uring: don't attempt to IOPOLL for MSG_RING requests
  io_uring: fix ordering of args in io_uring_queue_async_work
2022-05-18 14:21:30 -10:00
Tony Luck
51af802fc0 trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
Add tracing support which may be useful for debugging systems that fail to complete
In Field Scan tests.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220506225410.1652287-11-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Dylan Yudaken
2d2d5cb6ca io_uring: fix ordering of args in io_uring_queue_async_work
Fix arg ordering in TP_ARGS macro, which fixes the output.

Fixes: 502c87d655 ("io-uring: Make tracepoints consistent.")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220512091834.728610-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12 06:18:58 -06:00
Delyan Kratunov
9c2136be08 sched/tracing: Append prev_state to tp args instead
Commit fa2c3254d7 (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.

This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.

If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.

Fixes: fa2c3254d7 (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
2022-05-12 00:37:11 +02:00
Stefan Roesch
c4bb964fa0 io_uring: add tracing for additional CQE32 fields
This adds tracing for the extra1 and extra2 fields.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09 06:35:34 -06:00
Dylan Yudaken
033b87d24f io_uring: use the text representation of ops in trace
It is annoying to translate opcodes to textwhen tracing io_uring. Use the
io_uring_get_opcode function instead to use the text representation.

A downside here might have been that if the opcode is invalid it will not
be obvious, however the opcode is already overridden in these cases to
0 (NOP) in io_init_req(). Therefore this is a non issue.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-5-dylany@fb.com
[axboe: don't include register, those are not req opcodes]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-28 17:06:03 -06:00
Dylan Yudaken
1460af7de6 io_uring: rename op -> opcode
do this for consistency with the other trace messages

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-26 06:50:42 -06:00
Jens Axboe
0200ce6a57 io_uring: fix trace for reduced sqe padding
__pad2 is only 1 u64 now, the other one is addr3. Adjust the trace so
that it matches up.

Fixes: a56834e0fa ("io_uring: add fgetxattr and getxattr support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:46 -06:00
Dylan Yudaken
47894438e9 io_uring: add trace support for CQE overflow
Add trace function for overflowing CQ ring.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Thomas Gleixner
ce8abf340e Provide the NMI safe accessor to clock TAI for the tracing tree.
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJYLboTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoW94D/wPA3Pf3Dk5MBILSSd9RFwHQK9z+X2l
 Np60Cuh0aetlwNILGafJ6VG34zwjR4Z5eutAM1zy14ehRXmQSEoE5OG5ixLOg83o
 8IPZRKwdG7C3WnLn+s0OdTEpgGaDTxHPOrJOXsgG9G5NwVzUEad+srePSxhDSnLn
 LCKWaLnGWGC2ymbz/0TTZkhzkVyOEElY7SubF6qn8J4T9XPjdYUsZ/r1cNGBSfFD
 uViQYZpjUpNFmalwNldpgIZidDBHvTnlaE610jZHQKEczs9mg2EpmAPyb/e5MKcs
 tMmFcFQoNX3o0MAdRmRQD46fDAD0RKb9mP2LF+52/QqI6V8Pvk5OV/RXl/WGR/1B
 KxZ6qQFXasoXujfaELAnRSRve+k2xrsyEY6yikg3G/JmrMgCSVgvfSr0CTFbbdnG
 pAXC4eO94SqyCYdVU9DJZO7fhwREGF8P7qI01PYGF1B4GseQ8gaNuXCTBrkYsio2
 60Nv7GFiajRjUOdPNvhnWAMvnLRpnItgV3yB7nXAt15io9AkdCXsqPq3x69cxYYU
 X+CQZPG0l+tG/BERYdjAjr7/Ij0NCqteKz7UqHIF0747DLQHagms7dmk/UWsKRfU
 bFyCKvLttAFYRU5lPEqjTIDfPJcqoePreqB1xRRmeerV4id26If59a9yMkFcUuF/
 ZQAkAZGCc82YJA==
 =mALw
 -----END PGP SIGNATURE-----

Merge tag 'tai-for-tracing' into timers/core

Pull in the NMI safe TAI accessor which was provided for the tracing tree
to prepare for further changes in this area.
2022-04-14 16:55:47 +02:00
Anna-Maria Behnsen
fde33ca4cb tracing/timer: Add missing argument documentation of trace points
Documentation of trace points timer_start, timer_expire_entry and
hrtimer_start lack always the last argument. Add it to keep implementation
and documentation in sync.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220411140115.24185-1-anna-maria@linutronix.de
2022-04-14 16:14:49 +02:00
Linus Torvalds
c1488c9751 NFSD bug fixes for 5.18-rc:
- Fix a write performance regression
 - Fix crashes during request deferral on RDMA transports
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmJVnIgACgkQM2qzM29m
 f5cdQA//SS2pbiAcjePL6zuVrd/eeALWRJIdCj8fP9eQ4dfi7OM0xQE3qHyLDC3n
 2fjciUSnoE+lIYiRXD7Ii+GqYThSrrxlR7aCT+vrIAL/ksnzTeUiUSuEihPOOYjn
 ZEz9vkP4wlCTFh4SdeBoETEjksvRU3ql37l4S04nxxgidpY/NkF5wZH5iUlV8Z14
 Q8OxJJk1tFJARPz6RqGrRkMws24NhYzeuXQktVA+AOoKjUeQSp4bN1ZSCJv3eX/E
 EjAPDFMYVdenp8OY9RJoP3Xpxb6e1mv5flXYQa7YpJUVH3AccMJ8aKuFadjGfGNS
 2tEzoipJR0fmxkxpDX1g0Bzm8fAIc467l4tskVzUxLnqNS/FjXv2P85h8p9U/R06
 BWbF5CxRu89h1tZ/ZIh4lfw/Ro94XvYaYCDeu9V1TndP+QroNDDY9Ypl13+uy6uS
 zLwEEo5nMUbc7FVmT7UidHqeEukwbNzmXEXYrgQD5hRaT9L+85R0L5/Y+gi+ODDK
 7SKu1Bomi3WN7WKzjvZspHMivVT6JNy/ngHlKSYkWYl/dJzMy5I4Z4sNRVCHKoKX
 17SpKHfKkZAt5oS54dZ40O1PhICZTMclAB7Vb8bs/yggrHTsqwqf21v8FTJW79K9
 MjnYigEWgwi62GC5WdZr5LqAkqRHi2K2rLnFwVSLtZvpb2sxPdM=
 =3AeS
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a write performance regression

 - Fix crashes during request deferral on RDMA transports

* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix the svc_deferred_event trace class
  SUNRPC: Fix NFSD's request deferral on RDMA transports
  nfsd: Clean up nfsd_file_put()
  nfsd: Fix a write performance regression
  SUNRPC: Return true/false (not 1/0) from bool functions
2022-04-12 14:23:19 -10:00
Linus Torvalds
1a3b1bba7c NFS client bugfixes for Linux 5.18
Highlights include:
 
 Stable fixes:
 - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
 
 Bugfixes:
 - Fix an Oopsable condition due to SLAB_ACCOUNT setting in the NFSv4.2
   xattr code.
 - Fix for open() using an file open mode of '3' in NFSv4
 - Replace readdir's use of xxhash() with hash_64()
 - Several patches to handle malloc() failure in SUNRPC
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJQb5IACgkQZwvnipYK
 APLVeRAAheQX422gIrjrWsQrZxJWmMDBBj9qwiV14XMs6g5KF/DtNbIJm1BGjA9Y
 aLgNVknSDcGHMmvdh5MCVF7R61sqkCNbQWP3Bx6OBP6ei3FEgQBPyujcFXAaZ7UK
 T4fBzfPoEYaxVkEG/L5BsAUnh6TvyCfNm1oqZZv2bv9M0U6UlXnRLWZN9I6cAtcw
 GI9HgnufWOxJWIqPd9dY45aF44Ru+aJ953eecPh0G83Reoc99EAU+PvJJD18Wl0H
 BZqM6ar5pNush50yVIjnPFXg+sM97dKGlLvYL11Uy7f8valSaBXPZLZQ/jG4Vx/a
 m/l8FVwBEV89BG7z6jKNju7ERbO+xbPgXP8lSwkj69fXOpuvzo/G6VAxS6ZJww12
 p6TJnhCMSEF7qrQc5mejA803dT4MiJWo4i2th482Ws/tRN5H+y9pDYLsvBPM8iHo
 sVkJBco04tBHEH3qKgDTu8EY7/shH+GVO3Wmtcjz47T2rbmQB7gJtfipHcLmV2qd
 Jy1tiz1T7rhGs7KNtWpu8210MY9iFhIt3rAvdGdeVmgnNjrmRex0Sxqx8vgKUAFE
 aQjXkkpwHA3rHWnuGPMKu5i48BPFQ8m4QgdVC9F6ylNT6RIbImqplQiwIYNIxYC+
 cBohUG1yYZRjFEMkXBdfK5cImkTLW7RkLCatRh5v8OU8xqpRVL8=
 =edfs
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Stable fixes:

   - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()

  Bugfixes:

   - Fix an Oopsable condition due to SLAB_ACCOUNT setting in the
     NFSv4.2 xattr code.

   - Fix for open() using an file open mode of '3' in NFSv4

   - Replace readdir's use of xxhash() with hash_64()

   - Several patches to handle malloc() failure in SUNRPC"

* tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()
  SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
  SUNRPC: Handle allocation failure in rpc_new_task()
  NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename()
  NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget
  SUNRPC: Handle low memory situations in call_status()
  SUNRPC: Handle ENOMEM in call_transmit_status()
  NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation
  SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
  NFS: Replace readdir's use of xxhash() with hash_64()
  SUNRPC: handle malloc failure in ->request_prepare
  NFSv4: fix open failure with O_ACCMODE flag
  Revert "NFSv4: Handle the special Linux file open access mode"
2022-04-08 07:39:17 -10:00
Trond Myklebust
f00432063d SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.

Reported-by: Felix Fu <foyjog@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a73881c96d ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect
Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable@vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07 16:19:47 -04:00
Chuck Lever
4d5004451a SUNRPC: Fix the svc_deferred_event trace class
Fix a NULL deref crash that occurs when an svc_rqst is deferred
while the sunrpc tracing subsystem is enabled. svc_revisit() sets
dr->xprt to NULL, so it can't be relied upon in the tracepoint to
provide the remote's address.

Unfortunately we can't revert the "svc_deferred_class" hunk in
commit ece200ddd5 ("sunrpc: Save remote presentation address in
svc_xprt for trace events") because there is now a specific check
of event format specifiers for unsafe dereferences. The warning
that check emits is:

  event svc_defer_recv has unsafe dereference of argument 1

A "%pISpc" format specifier with a "struct sockaddr *" is indeed
flagged by this check.

Instead, take the brute-force approach used by the svcrdma_qp_error
tracepoint. Convert the dr::addr field into a presentation address
in the TP_fast_assign() arm of the trace event, and store that as
a string. This fix can be backported to -stable kernels.

In the meantime, commit c6ced22997 ("tracing: Update print fmt
check to handle new __get_sockaddr() macro") is now in v5.18, so
this wonky fix can be replaced with __sockaddr() and friends
properly during the v5.19 merge window.

Fixes: ece200ddd5 ("sunrpc: Save remote presentation address in svc_xprt for trace events")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-04-07 10:22:51 -04:00
Peter Zijlstra
dc1f7893a7 locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
Have the trace_contention_*() tracepoints consistently include
adaptive spinning. In order to differentiate between the spinning and
non-spinning states add LCB_F_MUTEX and combine with LCB_F_SPIN.

The consequence is that a mutex contention can now triggler multiple
_begin() tracepoints before triggering an _end().

Additionally, this fixes one path where mutex would trigger _end()
without ever seeing a _begin().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-04-05 10:24:36 +02:00
Namhyung Kim
16edd9b511 locking: Add lock contention tracepoints
This adds two new lock contention tracepoints like below:

 * lock:contention_begin
 * lock:contention_end

The lock:contention_begin takes a flags argument to classify locks.  I
found it useful to identify what kind of locks it's tracing like if
it's spinning or sleeping, reader-writer lock, real-time, and per-cpu.

Move tracepoint definitions into mutex.c so that we can use them
without lockdep.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Link: https://lkml.kernel.org/r/20220322185709.141236-2-namhyung@kernel.org
2022-04-05 10:24:35 +02:00
Linus Torvalds
09bb8856d4 Updates to Tracing:
- Rename the staging files to give them some meaning.
   Just stage1,stag2,etc, does not show what they are for
 
 - Check for NULL from allocation in bootconfig
 
 - Hold event mutex for dyn_event call in user events
 
 - Mark user events to broken (to work on the API)
 
 - Remove eBPF updates from user events
 
 - Remove user events from uapi header to keep it from being installed.
 
 - Move ftrace_graph_is_dead() into inline as it is called from hot paths
   and also convert it into a static branch.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYkmyIBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qutfAQD90gbUgFMFe2akF5sKhonF5T6mm0+w
 BsWqNlBEKBxmfwD+Krfpxql/PKp/gCufcIUUkYC4E6Wl9akf3eO1qQel1Ao=
 =ZTn1
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:

 - Rename the staging files to give them some meaning. Just
   stage1,stag2,etc, does not show what they are for

 - Check for NULL from allocation in bootconfig

 - Hold event mutex for dyn_event call in user events

 - Mark user events to broken (to work on the API)

 - Remove eBPF updates from user events

 - Remove user events from uapi header to keep it from being installed.

 - Move ftrace_graph_is_dead() into inline as it is called from hot
   paths and also convert it into a static branch.

* tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Move user_events.h temporarily out of include/uapi
  ftrace: Make ftrace_graph_is_dead() a static branch
  tracing: Set user_events to BROKEN
  tracing/user_events: Remove eBPF interfaces
  tracing/user_events: Hold event_mutex during dyn_event_add
  proc: bootconfig: Add null pointer check
  tracing: Rename the staging files for trace_events
2022-04-03 12:26:01 -07:00
Steven Rostedt (Google)
84055411d8 tracing: Rename the staging files for trace_events
When looking for implementation of different phases of the creation of the
TRACE_EVENT() macro, it is pretty useless when all helper macro
redefinitions are in files labeled "stageX_defines.h". Rename them to
state which phase the files are for. For instance, when looking for the
defines that are used to create the event fields, seeing
"stage4_event_fields.h" gives the developer a good idea that the defines
are in that file.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02 08:40:04 -04:00
Linus Torvalds
f008b1d6e1 Netfs prep for write helpers
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmI1HOwACgkQ+7dXa6fL
 C2u9mA/+LUdXHqlvET/PAtFTg75bUPeOFGLnuDnYl1Ng2FCKMSodAohpbVtENxsK
 E/gTVS7uiVZFQgC+YmNA00z6eIQkAaDVyvKyEcUbKREBbUgONfJ/HLeaK/NvVKxx
 TY5gx/POdG6yHRQXL6JGBqSJUB8bZrGKwnJm8ebzeKOji9n7GSJBYiMlYBA7EAhs
 Aut/P7Y39ISHLw3y+y5czBeRoubljmTyznbP20xUZEzrRwhTpNwpJVzBGUZU635T
 93Sqcp//0U5LIdn6Pg6DUGHBMBTNDNJChb21ZoBusF/HHswXsOOnf/mcRUBSJUTI
 M1WSpNLk8PRBgajMdIymQpGU1sCZZzJ3krrSA3RcXdN6GPHwZg8kKjoroHsLDL6l
 igPbDSMJ5wfiwA2A2gXbY1CkAl3ik5ccb7ZqhTwS0WBk0vOnHmAsE9cs/bBo7Xii
 GTiWXEFOgtJiXANPMS2P9DiOS3ZQNf+wxotCYdkGPOXuX9wnIo1Kmy8XfujQ1bXf
 pJsEZKfeyROKrzyKWgqLI64/Kg5xNueoFQZfDpOlZYzF1uDstynADPUt0eQD706q
 jcuKaXLN3rn5gSPun5mWOYbRtXVgOLdFL/7zptMVJwFKBFguQENhjG4UMNZcjkVA
 3Mr0kGocsgoCSk1oDBkFlrw1wIsXxWbkRBL1Pww6kovivuGUwoo=
 =j0yx
 -----END PGP SIGNATURE-----

Merge tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull netfs updates from David Howells:
 "Netfs prep for write helpers.

  Having had a go at implementing write helpers and content encryption
  support in netfslib, it seems that the netfs_read_{,sub}request
  structs and the equivalent write request structs were almost the same
  and so should be merged, thereby requiring only one set of
  alloc/get/put functions and a common set of tracepoints.

  Merging the structs also has the advantage that if a bounce buffer is
  added to the request struct, a read operation can be performed to fill
  the bounce buffer, the contents of the buffer can be modified and then
  a write operation can be performed on it to send the data wherever it
  needs to go using the same request structure all the way through. The
  I/O handlers would then transparently perform any required crypto.
  This should make it easier to perform RMW cycles if needed.

  The potentially common functions and structs, however, by their names
  all proclaim themselves to be associated with the read side of things.

  The bulk of these changes alter this in the following ways:

   - Rename struct netfs_read_{,sub}request to netfs_io_{,sub}request.

   - Rename some enums, members and flags to make them more appropriate.

   - Adjust some comments to match.

   - Drop "read"/"rreq" from the names of common functions. For
     instance, netfs_get_read_request() becomes netfs_get_request().

   - The ->init_rreq() and ->issue_op() methods become ->init_request()
     and ->issue_read(). I've kept the latter as a read-specific
     function and in another branch added an ->issue_write() method.

  The driver source is then reorganised into a number of files:

        fs/netfs/buffered_read.c        Create read reqs to the pagecache
        fs/netfs/io.c                   Dispatchers for read and write reqs
        fs/netfs/main.c                 Some general miscellaneous bits
        fs/netfs/objects.c              Alloc, get and put functions
        fs/netfs/stats.c                Optional procfs statistics.

  and future development can be fitted into this scheme, e.g.:

        fs/netfs/buffered_write.c       Modify the pagecache
        fs/netfs/buffered_flush.c       Writeback from the pagecache
        fs/netfs/direct_read.c          DIO read support
        fs/netfs/direct_write.c         DIO write support
        fs/netfs/unbuffered_write.c     Write modifications directly back

  Beyond the above changes, there are also some changes that affect how
  things work:

   - Make fscache_end_operation() generally available.

   - In the netfs tracing header, generate enums from the symbol ->
     string mapping tables rather than manually coding them.

   - Add a struct for filesystems that uses netfslib to put into their
     inode wrapper structs to hold extra state that netfslib is
     interested in, such as the fscache cookie. This allows netfslib
     functions to be set in filesystem operation tables and jumped to
     directly without having to have a filesystem wrapper.

   - Add a member to the struct added above to track the remote inode
     length as that may differ if local modifications are buffered. We
     may need to supply an appropriate EOF pointer when storing data (in
     AFS for example).

   - Pass extra information to netfs_alloc_request() so that the
     ->init_request() hook can access it and retain information to
     indicate the origin of the operation.

   - Make the ->init_request() hook return an error, thereby allowing a
     filesystem that isn't allowed to cache an inode (ceph or cifs, for
     example) to skip readahead.

   - Switch to using refcount_t for subrequests and add tracepoints to
     log refcount changes for the request and subrequest structs.

   - Add a function to consolidate dispatching a read request. Similar
     code is used in three places and another couple are likely to be
     added in the future"

Link: https://lore.kernel.org/all/2639515.1648483225@warthog.procyon.org.uk/

* tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Maintain netfs_i_context::remote_i_size
  netfs: Keep track of the actual remote file size
  netfs: Split some core bits out into their own file
  netfs: Split fs/netfs/read_helper.c
  netfs: Rename read_helper.c to io.c
  netfs: Prepare to split read_helper.c
  netfs: Add a function to consolidate beginning a read
  netfs: Add a netfs inode context
  ceph: Make ceph_init_request() check caps on readahead
  netfs: Change ->init_request() to return an error code
  netfs: Refactor arguments for netfs_alloc_read_request
  netfs: Adjust the netfs_failure tracepoint to indicate non-subreq lines
  netfs: Trace refcounting on the netfs_io_subrequest struct
  netfs: Trace refcounting on the netfs_io_request struct
  netfs: Adjust the netfs_rreq tracepoint slightly
  netfs: Split netfs_io_* object handling out
  netfs: Finish off rename of netfs_read_request to netfs_io_request
  netfs: Rename netfs_read_*request to netfs_io_*request
  netfs: Generate enums from trace symbol mapping lists
  fscache: export fscache_end_operation()
2022-03-31 15:49:36 -07:00
Linus Torvalds
2975dbdc39 Networking fixes for 5.18-rc1 and rethook patches.
Features:
 
  - kprobes: rethook: x86: replace kretprobe trampoline with rethook
 
 Current release - regressions:
 
  - sfc: avoid null-deref on systems without NUMA awareness
    in the new queue sizing code
 
 Current release - new code bugs:
 
  - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices
 
  - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl
    when interface is down
 
 Previous releases - always broken:
 
  - openvswitch: correct neighbor discovery target mask field
    in the flow dump
 
  - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak
 
  - rxrpc: fix call timer start racing with call destruction
 
  - rxrpc: fix null-deref when security type is rxrpc_no_security
 
  - can: fix UAF bugs around echo skbs in multiple drivers
 
 Misc:
 
  - docs: move netdev-FAQ to the "process" section of the documentation
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmJF3S0ACgkQMUZtbf5S
 IruIvA/+NZx+c+fBBrbjOh63avRL7kYIqIDREf+v6lh4ZXmbrp22xalcjIdxgWeK
 vAiYfYmzZblWAGkilcvPG3blCBc+9b+YE+pPJXFe60Huv3eYpjKfgTKwQOg/lIeM
 8MfPP7eBwcJ/ltSTRtySRl9LYgyVcouP9rAVJavFVYrvuDYunwhfChswVfGCYon8
 42O4nRwrtkTE1MjHD8HS3YxvwGlo+iIyhsxgG/gWx8F2xeIG22H6adzjDXcCQph8
 air/awrJ4enYkVMRokGNfNppK9Z3vjJDX5xha3CREpvXNPe0F24cAE/L8XqyH7+r
 /bXP5y9VC9mmEO7x4Le3VmDhOJGbCOtR89gTlevftDRdSIrbNHffZhbPW48tR7o8
 NJFlhiSJb4HEMN0q7BmxnWaKlbZUlvLEXLuU5ytZE/G7i+nETULlunfZrCD4eNYH
 gBGYhiob2I/XotJA9QzG/RDyaFwDaC/VARsyv37PSeBAl/yrEGAeP7DsKkKX/ayg
 LM9ItveqHXK30J0xr3QJA8s49EkIYejjYR3l0hQ9esf9QvGK99dE/fo44Apf3C3A
 Lz6XpnRc5Xd7tZ9Aopwb3FqOH6WR9Hq9Qlbk0qifsL/2sRbatpuZbbDK6L3CR3Ir
 WFNcOoNbbqv85kCKFXFjj0jdpoNa9Yej8XFkMkVSkM3sHImYmYQ=
 =5Bvy
 -----END PGP SIGNATURE-----

Merge tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull more networking updates from Jakub Kicinski:
 "Networking fixes and rethook patches.

  Features:

   - kprobes: rethook: x86: replace kretprobe trampoline with rethook

  Current release - regressions:

   - sfc: avoid null-deref on systems without NUMA awareness in the new
     queue sizing code

  Current release - new code bugs:

   - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices

   - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when
     interface is down

  Previous releases - always broken:

   - openvswitch: correct neighbor discovery target mask field in the
     flow dump

   - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak

   - rxrpc: fix call timer start racing with call destruction

   - rxrpc: fix null-deref when security type is rxrpc_no_security

   - can: fix UAF bugs around echo skbs in multiple drivers

  Misc:

   - docs: move netdev-FAQ to the 'process' section of the
     documentation"

* tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
  openvswitch: Add recirc_id to recirc warning
  rxrpc: fix some null-ptr-deref bugs in server_key.c
  rxrpc: Fix call timer start racing with call destruction
  net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
  net: hns3: fix the concurrency between functions reading debugfs
  docs: netdev: move the netdev-FAQ to the process pages
  docs: netdev: broaden the new vs old code formatting guidelines
  docs: netdev: call out the merge window in tag checking
  docs: netdev: add missing back ticks
  docs: netdev: make the testing requirement more stringent
  docs: netdev: add a question about re-posting frequency
  docs: netdev: rephrase the 'should I update patchwork' question
  docs: netdev: rephrase the 'Under review' question
  docs: netdev: shorten the name and mention msgid for patch status
  docs: netdev: note that RFC postings are allowed any time
  docs: netdev: turn the net-next closed into a Warning
  docs: netdev: move the patch marking section up
  docs: netdev: minor reword
  docs: netdev: replace references to old archives
  ...
2022-03-31 11:23:31 -07:00
David Howells
4a7f62f919 rxrpc: Fix call timer start racing with call destruction
The rxrpc_call struct has a timer used to handle various timed events
relating to a call.  This timer can get started from the packet input
routines that are run in softirq mode with just the RCU read lock held.
Unfortunately, because only the RCU read lock is held - and neither ref or
other lock is taken - the call can start getting destroyed at the same time
a packet comes in addressed to that call.  This causes the timer - which
was already stopped - to get restarted.  Later, the timer dispatch code may
then oops if the timer got deallocated first.

Fix this by trying to take a ref on the rxrpc_call struct and, if
successful, passing that ref along to the timer.  If the timer was already
running, the ref is discarded.

The timer completion routine can then pass the ref along to the call's work
item when it queues it.  If the timer or work item where already
queued/running, the extra ref is discarded.

Fixes: a158bdd324 ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html
Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-31 12:25:25 +02:00
Linus Torvalds
965181d7ef NFS client updates for Linux 5.18
Highlights include:
 
 Features:
 - Switch NFS to use readahead instead of the obsolete readpages.
 - Readdir fixes to improve cacheability of large directories when there
   are multiple readers and writers.
 - Readdir performance improvements when doing a seekdir() immediately
   after opening the directory (common when re-exporting NFS).
 - NFS swap improvements from Neil Brown.
 - Loosen up memory allocation to permit direct reclaim and write back
   in cases where there is no danger of deadlocking the writeback code or
   NFS swap.
 - Avoid sillyrename when the NFSv4 server claims to support the
   necessary features to recover the unlinked but open file after reboot.
 
 Bugfixes:
 - Patch from Olga to add a mount option to control NFSv4.1 session
   trunking discovery, and default it to being off.
 - Fix a lockup in nfs_do_recoalesce().
 - Two fixes for list iterator variables being used when pointing to the
   list head.
 - Fix a kernel memory scribble when reading from a non-socket transport
   in /sys/kernel/sunrpc.
 - Fix a race where reconnecting to a server could leave the TCP socket
   stuck forever in the connecting state.
 - Patch from Neil to fix a shutdown race which can leave the SUNRPC
   transport timer primed after we free the struct xprt itself.
 - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2 copy
   offload.
 - Sunrpc patch from Olga to avoid resending a task on an offlined
   transport.
 
 Cleanups:
 - Patches from Dave Wysochanski to clean up the fscache code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJDLLUACgkQZwvnipYK
 APJZ5Q/9FH5S38RcUOxpFS+XaVdmxltJTMQv9Bz5mcyljq6PwRjTn73Qs7MfBQfD
 k90O9Pn/EXgQpFWNW4o/saQGLAO1MbVaesdNNF92Le9NziMz7lr95fsFXOqpg9EQ
 rEVOImWinv/N0+P0yv5tgjx6Fheu70L4dnwsxepnmtv2R1VjXOVchK7WjsV+9qey
 Xvn15OzhtVCGhRvvKfmGuGW+2I0D29tRNn2X0Pi+LjxAEQ0We/rcsJeGDMGYtbeC
 2U4ADk7KB0kJq04WYCH8k8tTrljL3PUONzO6QP1lltmAZ0FOwXf/AjyaD3SML2Xu
 A1yc2bEn3/S6wCQ3BiMxoE7UCSewg/ZyQD7B9GjZ/nWYNXREhLKvtVZt+ULTCA6o
 UKAI2vlEHljzQI2YfbVVZFyL8KCZqi+BFcJ/H/dtFKNazIkiefkaF+703Pflazis
 HjhA5G6IT0Pk6qZ6OBp+Z/4kCDYZYXXi9ysiNuMPWCiYEfZzt7ocUjm44uRAi+vR
 Qpf0V01Rg4iplTj3YFXqb9TwxrudzC7AcQ7hbLJia0kD/8djG+aGZs3SJWeO+VuE
 Xa41oxAx9jqOSRvSKPvj7nbexrwn8PtQI2NtDRqjPXlg6+Jbk23u8lIEVhsCH877
 xoMUHtTKLyqECskzRN4nh7qoBD3QfiLuI74aDcPtGEuXzU1+Ox8=
 =jeGt
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - Switch NFS to use readahead instead of the obsolete readpages.

   - Readdir fixes to improve cacheability of large directories when
     there are multiple readers and writers.

   - Readdir performance improvements when doing a seekdir() immediately
     after opening the directory (common when re-exporting NFS).

   - NFS swap improvements from Neil Brown.

   - Loosen up memory allocation to permit direct reclaim and write back
     in cases where there is no danger of deadlocking the writeback code
     or NFS swap.

   - Avoid sillyrename when the NFSv4 server claims to support the
     necessary features to recover the unlinked but open file after
     reboot.

  Bugfixes:

   - Patch from Olga to add a mount option to control NFSv4.1 session
     trunking discovery, and default it to being off.

   - Fix a lockup in nfs_do_recoalesce().

   - Two fixes for list iterator variables being used when pointing to
     the list head.

   - Fix a kernel memory scribble when reading from a non-socket
     transport in /sys/kernel/sunrpc.

   - Fix a race where reconnecting to a server could leave the TCP
     socket stuck forever in the connecting state.

   - Patch from Neil to fix a shutdown race which can leave the SUNRPC
     transport timer primed after we free the struct xprt itself.

   - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2
     copy offload.

   - Sunrpc patch from Olga to avoid resending a task on an offlined
     transport.

  Cleanups:

   - Patches from Dave Wysochanski to clean up the fscache code"

* tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (91 commits)
  NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
  NFS: Don't loop forever in nfs_do_recoalesce()
  SUNRPC: Don't return error values in sysfs read of closed files
  SUNRPC: Do not dereference non-socket transports in sysfs
  NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
  SUNRPC don't resend a task on an offlined transport
  NFS: replace usage of found with dedicated list iterator variable
  SUNRPC: avoid race between mod_timer() and del_timer_sync()
  pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod
  pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod
  NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
  NFS: Avoid writeback threads getting stuck in mempool_alloc()
  NFS: nfsiod should not block forever in mempool_alloc()
  SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
  SUNRPC: Fix unx_lookup_cred() allocation
  NFS: Fix memory allocation in rpc_alloc_task()
  NFS: Fix memory allocation in rpc_malloc()
  SUNRPC: Improve accuracy of socket ENOBUFS determination
  SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE
  SUNRPC: Fix socket waits for write buffer space
  ...
2022-03-29 18:55:37 -07:00
Linus Torvalds
02e2af20f4 Char/Misc and other driver updates for 5.18-rc1
Here is the big set of char/misc and other small driver subsystem
 updates for 5.18-rc1.
 
 Included in here are merges from driver subsystems which contain:
 	- iio driver updates and new drivers
 	- fsi driver updates
 	- fpga driver updates
 	- habanalabs driver updates and support for new hardware
 	- soundwire driver updates and new drivers
 	- phy driver updates and new drivers
 	- coresight driver updates
 	- icc driver updates
 
 Individual changes include:
 	- mei driver updates
 	- interconnect driver updates
 	- new PECI driver subsystem added
 	- vmci driver updates
 	- lots of tiny misc/char driver updates
 
 There will be two merge conflicts with your tree, one in MAINTAINERS
 which is obvious to fix up, and one in drivers/phy/freescale/Kconfig
 which also should be easy to resolve.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkG3fQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykNEgCfaRG8CRxewDXOO4+GSeA3NGK+AIoAnR89donC
 R4bgCjfg8BWIBcVVXg3/
 =WWXC
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc and other driver updates from Greg KH:
 "Here is the big set of char/misc and other small driver subsystem
  updates for 5.18-rc1.

  Included in here are merges from driver subsystems which contain:

   - iio driver updates and new drivers

   - fsi driver updates

   - fpga driver updates

   - habanalabs driver updates and support for new hardware

   - soundwire driver updates and new drivers

   - phy driver updates and new drivers

   - coresight driver updates

   - icc driver updates

  Individual changes include:

   - mei driver updates

   - interconnect driver updates

   - new PECI driver subsystem added

   - vmci driver updates

   - lots of tiny misc/char driver updates

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (556 commits)
  firmware: google: Properly state IOMEM dependency
  kgdbts: fix return value of __setup handler
  firmware: sysfb: fix platform-device leak in error path
  firmware: stratix10-svc: add missing callback parameter on RSU
  arm64: dts: qcom: add non-secure domain property to fastrpc nodes
  misc: fastrpc: Add dma handle implementation
  misc: fastrpc: Add fdlist implementation
  misc: fastrpc: Add helper function to get list and page
  misc: fastrpc: Add support to secure memory map
  dt-bindings: misc: add fastrpc domain vmid property
  misc: fastrpc: check before loading process to the DSP
  misc: fastrpc: add secure domain support
  dt-bindings: misc: add property to support non-secure DSP
  misc: fastrpc: Add support to get DSP capabilities
  misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP
  misc: fastrpc: separate fastrpc device from channel context
  dt-bindings: nvmem: brcm,nvram: add basic NVMEM cells
  dt-bindings: nvmem: make "reg" property optional
  nvmem: brcm_nvram: parse NVRAM content into NVMEM cells
  nvmem: dt-bindings: Fix the error of dt-bindings check
  ...
2022-03-28 12:27:35 -07:00
Linus Torvalds
5627ecb837 Merge branch 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:

 - tracepoints when Linux acts as an I2C client

 - added support for AMD PSP

 - whole subsystem now uses generic_handle_irq_safe()

 - piix4 driver gained MMIO access enabling so far missed controllers
   with AMD chipsets

 - a bulk of device driver updates, refactorization, and fixes.

* 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (61 commits)
  i2c: mux: demux-pinctrl: do not deactivate a master that is not active
  i2c: meson: Fix wrong speed use from probe
  i2c: add tracepoints for I2C slave events
  i2c: designware: Remove code duplication
  i2c: cros-ec-tunnel: Fix syntax errors in comments
  MAINTAINERS: adjust XLP9XX I2C DRIVER after removing the devicetree binding
  i2c: designware: Mark dw_i2c_plat_{suspend,resume}() as __maybe_unused
  i2c: mediatek: Add i2c compatible for Mediatek MT8168
  dt-bindings: i2c: update bindings for MT8168 SoC
  i2c: mt65xx: Simplify with clk-bulk
  i2c: i801: Drop two outdated comments
  i2c: xiic: Make bus names unique
  i2c: i801: Add support for the Process Call command
  i2c: i801: Drop useless masking in i801_access
  i2c: tegra: Add SMBus block read function
  i2c: designware: Use the i2c_mark_adapter_suspended/resumed() helpers
  i2c: designware: Lock the adapter while setting the suspended flag
  i2c: mediatek: remove redundant null check
  i2c: mediatek: modify bus speed calculation formula
  i2c: designware: Fix improper usage of readl
  ...
2022-03-26 12:46:08 -07:00
Linus Torvalds
561593a048 for-5.18/write-streams-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI1AHwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplPjEACVJzKg5NkxpdkDThvq5tejws9KxB/4mHJg
 NoDMcv1TF+Orsd/HNW6XrgYnbU0ObHom3568/xb8BNegRVFe7V4ME/4IYNRyGOmV
 qbfciu04L1UkJhI52CIidkOioFABL3r1zgLCIz5vk0Cv9X7Le9x0UabHxJf7u9S+
 Z3lNdyxezN0SGx8VT86l/7lSoHtG3VHO9IsQCuNGF02SB+6uGpXBlptbEoQ4nTxd
 T7/H9FNOe2Wf7eKvcOOds8UlvZYAfYcY0GcRrIOXdHIy25mKFWwn5cDgFTMOH5ID
 xXpm+JFkDkrfSW1o4FFPxbN9Z6RbVXbGCsrXlIragLO2MJQdXiIUxS1OPT5oAado
 H9MlX6QtkwziLW9zUWa/N/jmRjc2vzHAxD6JFg/wXxNdtY0kd8TQpaxwTB8mVDPN
 VCGutt7lJS1CQInQ+ppzbdqzzuLHC1RHAyWSmfUE9rb8cbjxtJBnSIorYRLUesMT
 GRwqVTXW0osxSgCb1iDiBCJANrX1yPZcemv4Wh1gzbT6IE9sWxWXsE5sy9KvswNc
 M+E4nu/TYYTfkynItJjLgmDLOoi+V0FBY6ba0mRPBjkriSP4AVlwsZLGVsAHQzuA
 o5paW1GjRCCwhIQ6+AzZIoOz6wqvprBlUgUkUneyYAQ2ZKC3pZi8zPnpoVdFucVa
 VaTzP71C1Q==
 =efaq
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block

Pull NVMe write streams removal from Jens Axboe:
 "This removes the write streams support in NVMe. No vendor ever really
  shipped working support for this, and they are not interested in
  supporting it.

  With the NVMe support gone, we have nothing in the tree that supports
  this. Remove passing around of the hints.

  The only discussion point in this patchset imho is the fact that the
  file specific write hint setting/getting fcntl helpers will now return
  -1/EINVAL like they did before we supported write hints. No known
  applications use these functions, I only know of one prototype that I
  help do for RocksDB, and that's not used. That said, with a change
  like this, it's always a bit controversial. Alternatively, we could
  just make them return 0 and pretend it worked. It's placement based
  hints after all"

* tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block:
  fs: remove fs.f_write_hint
  fs: remove kiocb.ki_hint
  block: remove the per-bio/request write hint
  nvme: remove support or stream based temperature hint
2022-03-26 11:51:46 -07:00
David Hildenbrand
363106c4ce mm/khugepaged: remove reuse_swap_page() usage
reuse_swap_page() currently indicates if we can write to an anon page
without COW.  A COW is required if the page is shared by multiple
processes (either already mapped or via swap entries) or if there is
concurrent writeback that cannot tolerate concurrent page modifications.

However, in the context of khugepaged we're not actually going to write to
a read-only mapped page, we'll copy the page content to our newly
allocated THP and map that THP writable.  All we have to make sure is that
the read-only mapped page we're about to copy won't get reused by another
process sharing the page, otherwise, page content would get modified.  But
that is already guaranteed via multiple mechanisms (e.g., holding a
reference, holding the page lock, removing the rmap after copying the
page).

The swapcache handling was introduced in commit 10359213d0 ("mm:
incorporate read-only pages into transparent huge pages") and it sounds
like it merely wanted to mimic what do_swap_page() would do when trying to
map a page obtained via the swapcache writable.

As that logic is unnecessary, let's just remove it, removing the last user
of reuse_swap_page().

Link: https://lkml.kernel.org/r/20220131162940.210846-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Liang Zhang <zhangliang5@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:51 -07:00
Andrey Konovalov
9353ffa6e9 kasan, page_alloc: allow skipping memory init for HW_TAGS
Add a new GFP flag __GFP_SKIP_ZERO that allows to skip memory
initialization.  The flag is only effective with HW_TAGS KASAN.

This flag will be used by vmalloc code for page_alloc allocations backing
vmalloc() mappings in a following patch.  The reason to skip memory
initialization for these pages in page_alloc is because vmalloc code will
be initializing them instead.

With the current implementation, when __GFP_SKIP_ZERO is provided,
__GFP_ZEROTAGS is ignored.  This doesn't matter, as these two flags are
never provided at the same time.  However, if this is changed in the
future, this particular implementation detail can be changed as well.

Link: https://lkml.kernel.org/r/0d53efeff345de7d708e0baa0d8829167772521e.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Andrey Konovalov
53ae233c30 kasan, page_alloc: allow skipping unpoisoning for HW_TAGS
Add a new GFP flag __GFP_SKIP_KASAN_UNPOISON that allows skipping KASAN
poisoning for page_alloc allocations.  The flag is only effective with
HW_TAGS KASAN.

This flag will be used by vmalloc code for page_alloc allocations backing
vmalloc() mappings in a following patch.  The reason to skip KASAN
poisoning for these pages in page_alloc is because vmalloc code will be
poisoning them instead.

Also reword the comment for __GFP_SKIP_KASAN_POISON.

Link: https://lkml.kernel.org/r/35c97d77a704f6ff971dd3bfe4be95855744108e.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Andrey Konovalov
f49d9c5bb1 kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS
Only define the ___GFP_SKIP_KASAN_POISON flag when CONFIG_KASAN_HW_TAGS is
enabled.

This patch it not useful by itself, but it prepares the code for additions
of new KASAN-specific GFP patches.

Link: https://lkml.kernel.org/r/44e5738a584c11801b2b8f1231898918efc8634a.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Anshuman Khandual
4cc79b3303 mm/migration: add trace events for base page and HugeTLB migrations
This adds two trace events for base page and HugeTLB page migrations.
These events, closely follow the implementation details like setting and
removing of PTE migration entries, which are essential operations for
migration.  The new CREATE_TRACE_POINTS in <mm/rmap.c> covers both
<events/migration.h> and <events/tlb.h> based trace events.  Hence drop
redundant CREATE_TRACE_POINTS from other places which could have otherwise
conflicted during build.

Link: https://lkml.kernel.org/r/1643368182-9588-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:45 -07:00
Anshuman Khandual
283fd6fe05 mm/migration: add trace events for THP migrations
Patch series "mm/migration: Add trace events", v3.

This adds trace events for all migration scenarios including base page,
THP and HugeTLB.

This patch (of 3):

This adds two trace events for PMD based THP migration without split.
These events closely follow the implementation details like setting and
removing of PMD migration entries, which are essential operations for THP
migration.  This moves CREATE_TRACE_POINTS into generic THP from powerpc
for these new trace events to be available on other platforms as well.

Link: https://lkml.kernel.org/r/1643368182-9588-1-git-send-email-anshuman.khandual@arm.com
Link: https://lkml.kernel.org/r/1643368182-9588-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:45 -07:00
Linus Torvalds
169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Linus Torvalds
b4bc93bd76 ARM driver updates for 5.18
There are a few separately maintained driver subsystems that we merge through
 the SoC tree, notable changes are:
 
  - Memory controller updates, mainly for Tegra and Mediatek SoCs,
    and clarifications for the memory controller DT bindings
 
  - SCMI firmware interface updates, in particular a new transport based
    on OPTEE and support for atomic operations.
 
  - Cleanups to the TEE subsystem, refactoring its memory management
 
 For SoC specific drivers without a separate subsystem, changes include
 
  - Smaller updates and fixes for TI, AT91/SAMA5, Qualcomm and NXP
    Layerscape SoCs.
 
  - Driver support for Microchip SAMA5D29, Tesla FSD, Renesas RZ/G2L,
    and Qualcomm SM8450.
 
  - Better power management on Mediatek MT81xx, NXP i.MX8MQ
    and older NVIDIA Tegra chips
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI4nOUACgkQmmx57+YA
 GNlNNhAApPQw+FKQ6yVj2EZYcaAgik8PJAJoNQWYED52iQfm5uXgjt3aQewvrPNW
 nkKx5Mx+fPUfaKx5mkVOFMhME5Bw9tYbXHm2/RpRp+n8jOdUlQpAhzIPOyWPHOJS
 QX6qu4t+agrQzjbOCGouAJXgyxhTJFUMviM2EgVHbQHXPtdF8i2kyanfCP7Rw8cx
 sVtLwpvhbLm849+deYRXuv2Xw9I3M1Np7018s5QciimI2eLLEb+lJ/C5XWz5pMYn
 M1nZ7uwCLKPCewpMETTuhKOv0ioOXyY9C1ghyiGZFhHQfoCYTu94Hrx9t8x5gQmL
 qWDinXWXVk8LBegyrs8Bp4wcjtmvMMLnfWtsGSfT5uq24JOGg22OmtUNhNJbS9+p
 VjEvBgkXYD7UEl5npI9v9/KQWr3/UDir0zvkuV40gJyeBWNEZ/PB8olXAxgL7wZv
 cXRYSaUYYt3DKQf1k5I4GUyQtkP/4RaBy6AqvH5Sx0lCwuY6G6ISK+kCPaaSRKnX
 WR+nFw84dKCu7miehmW9qSzMQ4kiSCKIDqk7ilHcwv0J2oXDrlqVPKGGGTzZjUc8
 +feqM/eSoYvDDEDemuXNSnl3hc1Zlvm7Apd5AN6kdTaNgoACDYdyvGuJ3CvzcA+K
 1gBHUBvGS/ODA25KnYabr7wCMgxYqf7dXfkyKIBwFHwxOnRHtgs=
 =Cfbk
 -----END PGP SIGNATURE-----

Merge tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM driver updates from Arnd Bergmann:
 "There are a few separately maintained driver subsystems that we merge
  through the SoC tree, notable changes are:

   - Memory controller updates, mainly for Tegra and Mediatek SoCs, and
     clarifications for the memory controller DT bindings

   - SCMI firmware interface updates, in particular a new transport
     based on OPTEE and support for atomic operations.

   - Cleanups to the TEE subsystem, refactoring its memory management

  For SoC specific drivers without a separate subsystem, changes include

   - Smaller updates and fixes for TI, AT91/SAMA5, Qualcomm and NXP
     Layerscape SoCs.

   - Driver support for Microchip SAMA5D29, Tesla FSD, Renesas RZ/G2L,
     and Qualcomm SM8450.

   - Better power management on Mediatek MT81xx, NXP i.MX8MQ and older
     NVIDIA Tegra chips"

* tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (154 commits)
  ARM: spear: fix typos in comments
  soc/microchip: fix invalid free in mpfs_sys_controller_delete
  soc: s4: Add support for power domains controller
  dt-bindings: power: add Amlogic s4 power domains bindings
  ARM: at91: add support in soc driver for new SAMA5D29
  soc: mediatek: mmsys: add sw0_rst_offset in mmsys driver data
  dt-bindings: memory: renesas,rpc-if: Document RZ/V2L SoC
  memory: emif: check the pointer temp in get_device_details()
  memory: emif: Add check for setup_interrupts
  dt-bindings: arm: mediatek: mmsys: add support for MT8186
  dt-bindings: mediatek: add compatible for MT8186 pwrap
  soc: mediatek: pwrap: add pwrap driver for MT8186 SoC
  soc: mediatek: mmsys: add mmsys reset control for MT8186
  soc: mediatek: mtk-infracfg: Disable ACP on MT8192
  soc: ti: k3-socinfo: Add AM62x JTAG ID
  soc: mediatek: add MTK mutex support for MT8186
  soc: mediatek: mmsys: add mt8186 mmsys routing table
  soc: mediatek: pm-domains: Add support for mt8186
  dt-bindings: power: Add MT8186 power domains
  soc: mediatek: pm-domains: Add support for mt8195
  ...
2022-03-23 18:23:13 -07:00
Linus Torvalds
1bc191051d Tracing updates for 5.18:
- New user_events interface. User space can register an event with the kernel
   describing the format of the event. Then it will receive a byte in a page
   mapping that it can check against. A privileged task can then enable that
   event like any other event, which will change the mapped byte to true,
   telling the user space application to start writing the event to the
   tracing buffer.
 
 - Add new "ftrace_boot_snapshot" kernel command line parameter. When set,
   the tracing buffer will be saved in the snapshot buffer at boot up when
   the kernel hands things over to user space. This will keep the traces that
   happened at boot up available even if user space boot up has tracing as
   well.
 
 - Have TRACE_EVENT_ENUM() also update trace event field type descriptions.
   Thus if a static array defines its size with an enum, the user space trace
   event parsers can still know how to parse that array.
 
 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This will
   make one tracepoint be able to trace different content and not be stuck at
   only what the original TRACE_EVENT() macro exports.
 
 - Fixes to tracing error logging.
 
 - Better saving of cmdlines to PIDs when tracing (use the wakeup events for
   mapping).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYjiO3RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhQzAQDtek5p80p/zkMGymm14wSH6qq0NdgN
 Kv7fTBwEewUa0gD/UCOVLw4Oj+JtHQhCa3sCGZopmRv0BT1+4UQANqosKQY=
 =Au08
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - New user_events interface. User space can register an event with the
   kernel describing the format of the event. Then it will receive a
   byte in a page mapping that it can check against. A privileged task
   can then enable that event like any other event, which will change
   the mapped byte to true, telling the user space application to start
   writing the event to the tracing buffer.

 - Add new "ftrace_boot_snapshot" kernel command line parameter. When
   set, the tracing buffer will be saved in the snapshot buffer at boot
   up when the kernel hands things over to user space. This will keep
   the traces that happened at boot up available even if user space boot
   up has tracing as well.

 - Have TRACE_EVENT_ENUM() also update trace event field type
   descriptions. Thus if a static array defines its size with an enum,
   the user space trace event parsers can still know how to parse that
   array.

 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This
   will make one tracepoint be able to trace different content and not
   be stuck at only what the original TRACE_EVENT() macro exports.

 - Fixes to tracing error logging.

 - Better saving of cmdlines to PIDs when tracing (use the wakeup events
   for mapping).

* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
  tracing: Have type enum modifications copy the strings
  user_events: Add trace event call as root for low permission cases
  tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
  tracing: Add snapshot at end of kernel boot up
  tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
  tracing: Fix strncpy warning in trace_events_synth.c
  user_events: Prevent dyn_event delete racing with ioctl add/delete
  tracing: Add TRACE_CUSTOM_EVENT() macro
  tracing: Move the defines to create TRACE_EVENTS into their own files
  tracing: Add sample code for custom trace events
  tracing: Allow custom events to be added to the tracefs directory
  tracing: Fix last_cmd_set() string management in histogram code
  user_events: Fix potential uninitialized pointer while parsing field
  tracing: Fix allocation of last_cmd in last_cmd_set()
  user_events: Add documentation file
  user_events: Add sample code for typical usage
  user_events: Add self-test for validator boundaries
  user_events: Add self-test for perf_event integration
  user_events: Add self-test for dynamic_events integration
  user_events: Add self-test for ftrace integration
  ...
2022-03-23 11:40:25 -07:00
Linus Torvalds
6b1f86f8e9 Filesystem folio changes for 5.18
Primarily this series converts some of the address_space operations
 to take a folio instead of a page.
 
 ->is_partially_uptodate() takes a folio instead of a page and changes the
 type of the 'from' and 'count' arguments to make it obvious they're bytes.
 ->invalidatepage() becomes ->invalidate_folio() and has a similar type change.
 ->launder_page() becomes ->launder_folio()
 ->set_page_dirty() becomes ->dirty_folio() and adds the address_space as
 an argument.
 
 There are a couple of other misc changes up front that weren't worth
 separating into their own pull request.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4hqMACgkQDpNsjXcp
 gj7r7Af/fVJ7m8kKqjP/IayX3HiJRuIDQw+vM++BlRNXdjz+IyED6whdmFGxJeOY
 BMyT+8ApOAz7ErS4G+7fAv4ScJK/aEgFUsnSeAiCp0PliiEJ5NNJzElp6sVmQ7H5
 SX7+Ek444FZUGsQuy0qL7/ELpR3ditnD7x+5U2g0p5TeaHGUQn84crRyfR4xuhNG
 EBD9D71BOb7OxUcOHe93pTkK51QsQ0aCrcIsB1tkK5KR0BAthn1HqF7ehL90Rvrr
 omx5M7aDWGY4oj7IKrhlAs+55Ah2WaOzrZBp0FXNbr4UENDBKWKyUxErwa4xPkf6
 Gm1iQG/CspOHnxN3YWsd5WjtlL3A+A==
 =cOiq
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache

Pull filesystem folio updates from Matthew Wilcox:
 "Primarily this series converts some of the address_space operations to
  take a folio instead of a page.

  Notably:

   - a_ops->is_partially_uptodate() takes a folio instead of a page and
     changes the type of the 'from' and 'count' arguments to make it
     obvious they're bytes.

   - a_ops->invalidatepage() becomes ->invalidate_folio() and has a
     similar type change.

   - a_ops->launder_page() becomes ->launder_folio()

   - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the
     address_space as an argument.

  There are a couple of other misc changes up front that weren't worth
  separating into their own pull request"

* tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits)
  fs: Remove aops ->set_page_dirty
  fb_defio: Use noop_dirty_folio()
  fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
  fs: Convert __set_page_dirty_buffers to block_dirty_folio
  nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio()
  mm: Convert swap_set_page_dirty() to swap_dirty_folio()
  ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio
  f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio
  f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio
  f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio
  afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio()
  btrfs: Convert extent_range_redirty_for_io() to use folios
  fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio
  btrfs: Convert from set_page_dirty to dirty_folio
  fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio()
  fs: Add aops->dirty_folio
  fs: Remove aops->launder_page
  orangefs: Convert launder_page to launder_folio
  nfs: Convert from launder_page to launder_folio
  fuse: Convert from launder_page to launder_folio
  ...
2022-03-22 18:26:56 -07:00
Linus Torvalds
9030fb0bb9 Folio changes for 5.18
- Rewrite how munlock works to massively reduce the contention
    on i_mmap_rwsem (Hugh Dickins):
    https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
  - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
    https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
  - Convert GUP to use folios and make pincount available for order-1
    pages. (Matthew Wilcox)
  - Convert a few more truncation functions to use folios (Matthew Wilcox)
  - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
  - Convert rmap_walk to use folios (Matthew Wilcox)
  - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
  - Add support for creating large folios in readahead (Matthew Wilcox)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
 gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
 P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
 s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
 Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
 CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
 v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
 =n9Ad
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Rewrite how munlock works to massively reduce the contention on
   i_mmap_rwsem (Hugh Dickins):

     https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/

 - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
   Hellwig):

     https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/

 - Convert GUP to use folios and make pincount available for order-1
   pages. (Matthew Wilcox)

 - Convert a few more truncation functions to use folios (Matthew
   Wilcox)

 - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
   Wilcox)

 - Convert rmap_walk to use folios (Matthew Wilcox)

 - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)

 - Add support for creating large folios in readahead (Matthew Wilcox)

* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
  mm/damon: minor cleanup for damon_pa_young
  selftests/vm/transhuge-stress: Support file-backed PMD folios
  mm/filemap: Support VM_HUGEPAGE for file mappings
  mm/readahead: Switch to page_cache_ra_order
  mm/readahead: Align file mappings for non-DAX
  mm/readahead: Add large folio readahead
  mm: Support arbitrary THP sizes
  mm: Make large folios depend on THP
  mm: Fix READ_ONLY_THP warning
  mm/filemap: Allow large folios to be added to the page cache
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/vmscan: Convert pageout() to take a folio
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Account large folios correctly
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Free non-shmem folios without splitting them
  mm/rmap: Constify the rmap_walk_control argument
  mm/rmap: Convert rmap_walk() to take a folio
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
  ...
2022-03-22 17:03:12 -07:00
Linus Torvalds
3bf03b9a08 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs

 - Most the MM patches which precede the patches in Willy's tree: kasan,
   pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
   sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
   userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
   cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
   zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
  mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
  Docs/ABI/testing: add DAMON sysfs interface ABI document
  Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
  selftests/damon: add a test for DAMON sysfs interface
  mm/damon/sysfs: support DAMOS stats
  mm/damon/sysfs: support DAMOS watermarks
  mm/damon/sysfs: support schemes prioritization
  mm/damon/sysfs: support DAMOS quotas
  mm/damon/sysfs: support DAMON-based Operation Schemes
  mm/damon/sysfs: support the physical address space monitoring
  mm/damon/sysfs: link DAMON for virtual address spaces monitoring
  mm/damon: implement a minimal stub for sysfs-based DAMON interface
  mm/damon/core: add number of each enum type values
  mm/damon/core: allow non-exclusive DAMON start/stop
  Docs/damon: update outdated term 'regions update interval'
  Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
  Docs/vm/damon: call low level monitoring primitives the operations
  mm/damon: remove unnecessary CONFIG_DAMON option
  mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
  mm/damon/dbgfs-test: fix is_target_id() change
  ...
2022-03-22 16:11:53 -07:00
Baolin Wang
abd4349ff9 mm: compaction: cleanup the compaction trace events
As Steven suggested [1], we should access the pointers from the trace
event to avoid dereferencing them to the tracepoint function when the
tracepoint is disabled.

[1] https://lkml.org/lkml/2021/11/3/409

Link: https://lkml.kernel.org/r/4cd393b4d57f8f01ed72c001509b28e3a3b1a8c1.1646985115.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:09 -07:00
NeilBrown
a88f2096d5 remove congestion tracking framework
This framework is no longer used - so discard it.

Link: https://lkml.kernel.org/r/164549983747.9187.6171768583526866601.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:01 -07:00
Linus Torvalds
3fe2f7446f Changes in this cycle were:
- Cleanups for SCHED_DEADLINE
  - Tracing updates/fixes
  - CPU Accounting fixes
  - First wave of changes to optimize the overhead of the scheduler build,
    from the fast-headers tree - including placeholder *_api.h headers for
    later header split-ups.
  - Preempt-dynamic using static_branch() for ARM64
  - Isolation housekeeping mask rework; preperatory for further changes
  - NUMA-balancing: deal with CPU-less nodes
  - NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD)
  - Updates to RSEQ UAPI in preparation for glibc usage
  - Lots of RSEQ/selftests, for same
  - Add Suren as PSI co-maintainer
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI5rg8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hGrw/+M3QOk6fH7G48wjlNnBvcOife6ls+Ni4k
 ixOAcF4JKoixO8HieU5vv0A7yf/83tAa6fpeXeMf1hkCGc0NSlmLtuIux+WOmoAL
 LzCyDEYfiP8KnVh0A1Tui/lK0+AkGo21O6ADhQE2gh8o2LpslOHQMzvtyekSzeeb
 mVxMYQN+QH0m518xdO2D8IQv9ctOYK0eGjmkqdNfntOlytypPZHeNel/tCzwklP/
 dElJUjNiSKDlUgTBPtL3DfpoLOI/0mHF2p6NEXvNyULxSOqJTu8pv9Z2ADb2kKo1
 0D56iXBDngMi9MHIJLgvzsA8gKzHLFSuPbpODDqkTZCa28vaMB9NYGhJ643NtEie
 IXTJEvF1rmNkcLcZlZxo0yjL0fjvPkczjw4Vj27gbrUQeEBfb4mfuI4BRmij63Ep
 qEkgQTJhduCqqrQP1rVyhwWZRk1JNcVug+F6N42qWW3fg1xhj0YSrLai2c9nPez6
 3Zt98H8YGS1Z/JQomSw48iGXVqfTp/ETI7uU7jqHK8QcjzQ4lFK5H4GZpwuqGBZi
 NJJ1l97XMEas+rPHiwMEN7Z1DVhzJLCp8omEj12QU+tGLofxxwAuuOVat3CQWLRk
 f80Oya3TLEgd22hGIKDRmHa22vdWnNQyS0S15wJotawBzQf+n3auS9Q3/rh979+t
 ES/qvlGxTIs=
 =Z8uT
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Cleanups for SCHED_DEADLINE

 - Tracing updates/fixes

 - CPU Accounting fixes

 - First wave of changes to optimize the overhead of the scheduler
   build, from the fast-headers tree - including placeholder *_api.h
   headers for later header split-ups.

 - Preempt-dynamic using static_branch() for ARM64

 - Isolation housekeeping mask rework; preperatory for further changes

 - NUMA-balancing: deal with CPU-less nodes

 - NUMA-balancing: tune systems that have multiple LLC cache domains per
   node (eg. AMD)

 - Updates to RSEQ UAPI in preparation for glibc usage

 - Lots of RSEQ/selftests, for same

 - Add Suren as PSI co-maintainer

* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
  sched/headers: ARM needs asm/paravirt_api_clock.h too
  sched/numa: Fix boot crash on arm64 systems
  headers/prep: Fix header to build standalone: <linux/psi.h>
  sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
  cgroup: Fix suspicious rcu_dereference_check() usage warning
  sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
  sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
  sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
  sched/deadline,rt: Remove unused functions for !CONFIG_SMP
  sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
  sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
  sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
  sched/deadline: Remove unused def_dl_bandwidth
  sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
  sched/tracing: Don't re-read p->state when emitting sched_switch event
  sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
  sched/cpuacct: Remove redundant RCU read lock
  sched/cpuacct: Optimize away RCU read lock
  sched/cpuacct: Fix charge percpu cpuusage
  sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
  ...
2022-03-22 14:39:12 -07:00