Commit 67db3e4bfb ("tcp: no longer hold ehash lock while calling
tcp_get_info()") removes the only users of u64_stats_update_end/begin_raw()
without removing the function in header file.
Remove no longer used functions.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree, more relevant updates in this batch are:
1) Add Maglev support to IPVS. Moreover, store lastest server weight in
IPVS since this is needed by maglev, patches from from Inju Song.
2) Preparation works to add iptables flowtable support, patches
from Felix Fietkau.
3) Hand over flows back to conntrack slow path in case of TCP RST/FIN
packet is seen via new teardown state, also from Felix.
4) Add support for extended netlink error reporting for nf_tables.
5) Support for larger timeouts that 23 days in nf_tables, patch from
Florian Westphal.
6) Always set an upper limit to dynamic sets, also from Florian.
7) Allow number generator to make map lookups, from Laura Garcia.
8) Use hash_32() instead of opencode hashing in IPVS, from Vicent Bernat.
9) Extend ip6tables SRH match to support previous, next and last SID,
from Ahmed Abdelsalam.
10) Move Passive OS fingerprint nf_osf.c, from Fernando Fernandez.
11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot.
12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables,
from Taehee Yoo.
13) Unify meta bridge with core nft_meta, then make nft_meta built-in.
Make rt and exthdr built-in too, again from Florian.
14) Missing initialization of tbl->entries in IPVS, from Cong Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
IPCTNL_MSG_CT_GET_STATS netlink command allow to monitor current number
of conntrack entries. However, if one wants to compare it with the
maximum (and detect exhaustion), the only solution is currently to read
sysctl value.
This patch add nf_conntrack_max value in netlink message, and simplify
monitoring for application built on netlink API.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for
nf_tables support.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These macros allow conveniently declaring arrays which use NFT_{RT,CT}_*
values as indexes.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed
by SR encapsulated packet. Each SID is encoded as an IPv6 prefix.
When a Firewall receives an SR encapsulated packet, it should be able
to identify which node previously processed the packet (previous SID),
which node is going to process the packet next (next SID), and which
node is the last to process the packet (last SID) which represent the
final destination of the packet in case of inline SR mode.
An example use-case of using these features could be SID list that
includes two firewalls. When the second firewall receives a packet,
it can check whether the packet has been processed by the first firewall
or not. Based on that check, it decides to apply all rules, apply just
subset of the rules, or totally skip all rules and forward the packet to
the next SID.
This patch extends SRH match to support matching previous SID, next SID,
and last SID.
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch includes a new attribute in the numgen structure to allow
the lookup of an element based on the number generator as a key.
For this purpose, different ops have been included to extend the
current numgen inc functions.
Currently, only supported for numgen incremental operations, but
it will be supported for random in a follow-up patch.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This slipped through the cracks in the followup set to the fib6_info flip.
Rename rt6_next to fib6_next.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Various sockmap fixes from John Fastabend (pinned map handling,
blocking in recvmsg, double page put, error handling during redirect
failures, etc.)
2) Fix dead code handling in x86-64 JIT, from Gianluca Borello.
3) Missing device put in RDS IB code, from Dag Moxnes.
4) Don't process fast open during repair mode in TCP< from Yuchung
Cheng.
5) Move address/port comparison fixes in SCTP, from Xin Long.
6) Handle add a bond slave's master into a bridge properly, from
Hangbin Liu.
7) IPv6 multipath code can operate on unitialized memory due to an
assumption that the icmp header is in the linear SKB area. Fix from
Eric Dumazet.
8) Don't invoke do_tcp_sendpages() recursively via TLS, from Dave
Watson.
9) Fix memory leaks in x86-64 JIT, from Daniel Borkmann.
10) RDS leaks kernel memory to userspace, from Eric Dumazet.
11) DCCP can invoke a tasklet on a freed socket, take a refcount. Also
from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
dccp: fix tasklet usage
smc: fix sendpage() call
net/smc: handle unregistered buffers
net/smc: call consolidation
qed: fix spelling mistake: "offloded" -> "offloaded"
net/mlx5e: fix spelling mistake: "loobpack" -> "loopback"
tcp: restore autocorking
rds: do not leak kernel memory to user land
qmi_wwan: do not steal interfaces from class drivers
ipv4: fix fnhe usage by non-cached routes
bpf: sockmap, fix error handling in redirect failures
bpf: sockmap, zero sg_size on error when buffer is released
bpf: sockmap, fix scatterlist update on error path in send with apply
net_sched: fq: take care of throttled flows before reuse
ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"
bpf, x64: fix memleak when not converging on calls
bpf, x64: fix memleak when not converging after image
net/smc: restrict non-blocking connect finish
8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
sctp: fix the issue that the cookie-ack with auth can't get processed
...
The following patch enables sending notifications also for events on FDB
entries that weren't added by the user. Give the drivers the information
necessary to distinguish between the two origins of FDB entries.
To maintain the current behavior, have switchdev-implementing drivers
bail out on notifications about non-user-added FDB entries. In case of
mlxsw driver, allow a call to mlxsw_sp_span_respin() so that SPAN over
bridge catches up with the changed FDB.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Tracepoints should not give warning on OOM failures
- Use special field for function pointer in trace event
- Fix igrab issues in uprobes
- Fixes to the new histogram triggers
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCWuoYdBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qtFnAP9X4+AVDQH0VfsMLSc9D+rK6WmcRIhv
q8J2gNPv3anM+AD/SFXWGO4ihN+0KDw/TqmJxESNEybq47vTZ/s5lM6A4gQ=
=fQbj
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various fixes in tracing:
- Tracepoints should not give warning on OOM failures
- Use special field for function pointer in trace event
- Fix igrab issues in uprobes
- Fixes to the new histogram triggers"
* tag 'trace-v4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracepoint: Do not warn on ENOMEM
tracing: Add field modifier parsing hist error for hist triggers
tracing: Add field parsing hist error for hist triggers
tracing: Restore proper field flag printing when displaying triggers
tracing: initcall: Ordered comparison of function pointers
tracing: Remove igrab() iput() call from uprobes.c
tracing: Fix bad use of igrab in trace_uprobe.c
This reverts commit edd7ceb782 ("ipv6: Allow non-gateway ECMP for
IPv6").
Eric reported a division by zero in rt6_multipath_rebalance() which is
caused by above commit that considers identical local routes to be
siblings. The division by zero happens because a nexthop weight is not
set for local routes.
Revert the commit as it does not fix a bug and has side effects.
To reproduce:
# ip -6 address add 2001:db8::1/64 dev dummy0
# ip -6 address add 2001:db8::1/64 dev dummy1
Fixes: edd7ceb782 ("ipv6: Allow non-gateway ECMP for IPv6")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 93c0d549c4c5a7382ad70de6b86610b7aae57406.
Unfortunately the padding will break 32 bit userspace.
Ouch. Need to add some compat code, revert for now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is reported that in some cases, write_space may be called in
do_tcp_sendpages, such that we recursively invoke do_tcp_sendpages again:
[ 660.468802] ? do_tcp_sendpages+0x8d/0x580
[ 660.468826] ? tls_push_sg+0x74/0x130 [tls]
[ 660.468852] ? tls_push_record+0x24a/0x390 [tls]
[ 660.468880] ? tls_write_space+0x6a/0x80 [tls]
...
tls_push_sg already does a loop over all sending sg's, so ignore
any tls_write_space notifications until we are done sending.
We then have to call the previous write_space to wake up
poll() waiters after we are done with the send loop.
Reported-by: Andre Tomt <andre@tomt.net>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read from the socket.
The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.
Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.
Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.
With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.
V3 change-log:
As suggested by David Miller, added loads with barrier
to check whether we have multiple threads calling recvmsg
in parallel. When that happens we lock the socket to
calculate inq.
V4 change-log:
Removed inline from a static function.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The intention is to get notified of process failures as soon
as possible, before a possible core dumping (which could be very long)
(e.g. in some process-manager). Coredump and exit process events
are perfect for such use cases (see 2b5faa4c55 "connector: Added
coredumping event to the process connector").
The problem is that for now the process-manager cannot know the parent
of a dying process using connectors. This could be useful if the
process-manager should monitor for failures only children of certain
parents, so we could filter the coredump and exit events by parent
process and/or thread ID.
Add parent pid and tgid to coredump and exit process connectors event
data.
Signed-off-by: Stefan Strogin <sstrogin@cisco.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not require this inline function to be used in multiple different
locations, just inline it where it gets used in register_netdevice().
Suggested-by: David Miller <davem@davemloft.net>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is valid to have static routes where the nexthop
is an interface not an address such as tunnels.
For IPv4 it was possible to use ECMP on these routes
but not for IPv6.
Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a 32 bit hole just after type. It's best to
give it a name, this way compiler is forced to initialize
it with rest of the structure.
Reported-by: Kevin Easton <kevin@guarana.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And do so if the skb doesn't have enough space for the payload.
This is a preparation for the next patch.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add routines for manipulating TLS TX offload contexts.
In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.
Add implementation for Innova TLS (FPGA-based) hardware.
These routines will be used by the TLS offload support in a later patch
mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs
to work directly with mlx5_core rather than Innova FPGA or other mlx5
acceleration providers.
In the future, when IPSec/TLS or any other acceleration gets integrated
into ConnectX chip, mlx5/accel layer will provide the integrated
acceleration, rather than the Innova one.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a generic infrastructure to offload TLS crypto to a
network device. It enables the kernel TLS socket to skip encryption
and authentication operations on the transmit side of the data path.
Leaving those computationally expensive operations to the NIC.
The NIC offload infrastructure builds TLS records and pushes them to
the TCP layer just like the SW KTLS implementation and using the same
API.
TCP segmentation is mostly unaffected. Currently the only exception is
that we prevent mixed SKBs where only part of the payload requires
offload. In the future we are likely to add a similar restriction
following a change cipher spec record.
The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. The offloaded implementation builds "plaintext TLS record", those
records contain plaintext instead of ciphertext and place holder bytes
instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence
number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
TLS socket, we can use the tls NIC offload infrastructure to obtain
enough context to encrypt the payload of the SKB.
A TLS record is released when the last byte of the record is ack'ed,
this is done through the new icsk_clean_acked callback.
The infrastructure should be extendable to support various NIC offload
implementations. However it is currently written with the
implementation below in mind:
The NIC assumes that packets from each offloaded stream are sent as
plaintext and in-order. It keeps track of the TLS records in the TCP
stream. When a packet marked for offload is transmitted, the NIC
encrypts the payload in-place and puts authentication tags in the
relevant place holders.
The responsibility for handling out-of-order packets (i.e. TCP
retransmission, qdisc drops) falls on the netdev driver.
The netdev driver keeps track of the expected TCP SN from the NIC's
perspective. If the next packet to transmit matches the expected TCP
SN, the driver advances the expected TCP SN, and transmits the packet
with TLS offload indication.
If the next packet to transmit does not match the expected TCP SN. The
driver calls the TLS layer to obtain the TLS record that includes the
TCP of the packet for transmission. Using this TLS record, the driver
posts a work entry on the transmit queue to reconstruct the NIC TLS
state required for the offload of the out-of-order packet. It updates
the expected TCP SN accordingly and transmits the now in-order packet.
The same queue is used for packet transmission and TLS context
reconstruction to avoid the need for flushing the transmit queue before
issuing the context reconstruction request.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In TLS inline crypto, we can have one direction in software
and another in hardware. Thus, we split the TLS configuration to separate
structures for receive and transmit.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a netdev feature to configure TLS TX offloads.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new netdev ops to add and delete tls context
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With socket dependent offloads we rely on the netdev to transform
the transmitted packets before sending them to the wire.
When a packet from an offloaded socket is rerouted to a different
device we need to detect it and do the transformation in software.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
copy_skb_header is renamed to skb_copy_header and
exported. Exposing this function give more flexibility
in copying SKBs.
skb_copy and skb_copy_expand do not give enough control
over which parts are copied.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Called when a TCP segment is acknowledged.
Could be used by application protocols who hold additional
metadata associated with the stream data.
This is required by TLS device offload to release
metadata associated with acknowledged TLS records.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a couple new functions to allow querying FDB and vlan settings of a
bridge.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have about 53 netdev_features_t bits defined and counting, add a
build time check to catch when an u64 type will not be enough and we
will have to convert that to a bitmap. This is done in
register_netdevice() for convenience.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx as that will improve performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.
Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.
1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
This operation does not involve any TCP locking.
2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
the transfert of pages from skbs to one VMA.
This operation only uses down_read(¤t->mm->mmap_sem) after
holding TCP lock, thus solving the lockdep issue.
This new implementation was suggested by Andy Lutomirski with great details.
Benefits are :
- Better scalability, in case multiple threads reuse VMAS
(without mmap()/munmap() calls) since mmap_sem wont be write locked.
- Better error recovery.
The previous mmap() model had to provide the expected size of the
mapping. If for some reason one part could not be mapped (partial MSS),
the whole operation had to be aborted.
With the tcp_zerocopy_receive struct, kernel can report how
many bytes were successfuly mapped, and how many bytes should
be read to skip the problematic sequence.
- No more memory allocation to hold an array of page pointers.
16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
- skbs are freed while mmap_sem has been released
Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)
Note that memcg might require additional changes.
Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer fixes from Thomas Gleixner:
"Two fixes from the timer departement:
- Fix a long standing issue in the NOHZ tick code which causes RB
tree corruption, delayed timers and other malfunctions. The cause
for this is code which modifies the expiry time of an enqueued
hrtimer.
- Revert the CLOCK_MONOTONIC/CLOCK_BOOTTIME unification due to
regression reports. Seems userspace _is_ relying on the documented
behaviour despite our hope that it wont"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME
tick/sched: Do not mess with an enqueued hrtimer
The comment claims that this helper will try not to loose bits, but for
64bit long it looses the high bits before hashing 64bit long into 32bit
int. Use the helper hash_long() to do the right thing for 64bit long.
For 32bit long, there is no change.
All the callers of end_name_hash() either assign the result to
qstr->hash, which is u32 or return the result as an int value (e.g.
full_name_hash()). Change the helper return type to int to conform to
its users.
[ It took me a while to apply this, because my initial reaction to it
was - incorrectly - that it could make for slower code.
After having looked more at it, I take back all my complaints about
the patch, Amir was right and I was mis-reading things or just being
stupid.
I also don't worry too much about the possible performance impact of
this on 64-bit, since most architectures that actually care about
performance end up not using this very much (the dcache code is the
most performance-critical, but the word-at-a-time case uses its own
hashing anyway).
So this ends up being mostly used for filesystems that do their own
degraded hashing (usually because they want a case-insensitive
comparison function).
A _tiny_ worry remains, in that not everybody uses DCACHE_WORD_ACCESS,
and then this potentially makes things more expensive on 64-bit
architectures with slow or lacking multipliers even for the normal
case.
That said, realistically the only such architecture I can think of is
PA-RISC. Nobody really cares about performance on that, it's more of a
"look ma, I've got warts^W an odd machine" platform.
So the patch is fine, and all my initial worries were just misplaced
from not looking at this properly. - Linus ]
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit c59530d0d5 ("net: Move PHY statistics code into PHY
library helpers") we made net/core/ethtool.c reference symbols which are
part of the library which can be modular. David introduced a temporary
fix with 1ecd6e8ad9 ("phy: Temporary build fix after phylib changes.")
which would prevent such modularity.
This is not desireable of course, so instead, just inline the functions
into include/linux/phy.h to keep both options available.
Fixes: c59530d0d5 ("net: Move PHY statistics code into PHY library helpers")
Fixes: 1ecd6e8ad9 ("phy: Temporary build fix after phylib changes.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ARM:
- PSCI selection API, a leftover from 4.16 (for stable)
- Kick vcpu on active interrupt affinity change
- Plug a VMID allocation race on oversubscribed systems
- Silence debug messages
- Update Christoffer's email address (linaro -> arm)
x86:
- Expose userspace-relevant bits of a newly added feature
- Fix TLB flushing on VMX with VPID, but without EPT
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJa44lQAAoJEED/6hsPKofo1dIH/3n9AZSWvavgL2V3j6agT8Yy
hxF4nHCFEJd5aqDNwbG9QEzivKw88r3o3mdB2XAQESB2MlCYR1jkTONm7yvVJTs/
/P9gj+DEQbCj2AgT//u3BGsAsZDKFhB9JwfmV2Mp4zDIqWFa6oCOGeq/iPVAGDcN
vUpuYeIicuH9SRoxH7de3z+BEXW0O+gCABXQtvA93FKTMz35yFTgmbDVCnvaV0zL
3B+3/4/jdbTRICW8EX6Li43+gEBUMtnVNkdqxLPTuCtDG8iuPUGfgF02gH99/9gj
hliV3Q4VUZKkSABW5AqKPe4+9rbsHCh9eL0LpHFGI9y+6LeUIOXAX4CtohR8gWE=
=W9Vz
-----END PGP SIGNATURE-----
rMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- PSCI selection API, a leftover from 4.16 (for stable)
- Kick vcpu on active interrupt affinity change
- Plug a VMID allocation race on oversubscribed systems
- Silence debug messages
- Update Christoffer's email address (linaro -> arm)
x86:
- Expose userspace-relevant bits of a newly added feature
- Fix TLB flushing on VMX with VPID, but without EPT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI
kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use
arm/arm64: KVM: Add PSCI version selection API
KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration
arm64: KVM: Demote SVE and LORegion warnings to debug only
MAINTAINERS: Update e-mail address for Christoffer Dall
KVM: arm/arm64: Close VMID generation race
We are now keeping the MTU information synced between asoc, transport
and dst, which makes the check at sctp_packet_config() not needed
anymore. As it was the sole caller to this function, lets remove it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Which makes sure that the MTU respects the minimum value of
SCTP_DEFAULT_MINSEGMENT and that it is correctly aligned.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
and avoid the open-coded versions of it.
Now sctp_datamsg_from_user can just re-use asoc->frag_point as it will
always be updated.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When given a MTU, this function calculates how much payload we can carry
on it. Without a MTU, it calculates the amount of header overhead we
have.
So that when we have extra overhead, like the one added for IP options
on SELinux patches, it is easier to handle it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All changes to asoc PMTU should now go through this wrapper, making it
easier to track them and to do other actions upon it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This value is not used anywhere in the code. In essence it is a
duplicate of SCTP_DEFAULT_MINSEGMENT, which is used by the stack.
SCTP_MIN_PMTU value makes more sense, but we should not change to it now
as it would risk breaking applications.
So this patch removes SCTP_MIN_PMTU and adjust the comment above it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are some small char and misc driver fixes for 4.17-rc3
A variety of small things that have fallen out after 4.17-rc1 was out.
Some vboxguest fixes for systems with lots of memory, amba bus fixes,
some MAINTAINERS updates, uio_hv_generic driver fixes, and a few other
minor things that resolve problems that people reported.
The amba bus fixes took twice to get right, the first time I messed up
applying the patches in the wrong order, hence the revert and later
addition again with the correct fix, sorry about that.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWuMvtA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynNswCfcg2lXYFZzS1OcMYX4TEyo/cWbHQAniYIU6st
dP0MG/whdUu7q6r6otVc
=l/Bn
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 4.17-rc3
A variety of small things that have fallen out after 4.17-rc1 was out.
Some vboxguest fixes for systems with lots of memory, amba bus fixes,
some MAINTAINERS updates, uio_hv_generic driver fixes, and a few other
minor things that resolve problems that people reported.
The amba bus fixes took twice to get right, the first time I messed up
applying the patches in the wrong order, hence the revert and later
addition again with the correct fix, sorry about that.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
ARM: amba: Fix race condition with driver_override
ARM: amba: Make driver_override output consistent with other buses
Revert "ARM: amba: Fix race condition with driver_override"
ARM: amba: Don't read past the end of sysfs "driver_override" buffer
ARM: amba: Fix race condition with driver_override
virt: vbox: Log an error when we fail to get the host version
virt: vbox: Use __get_free_pages instead of kmalloc for DMA32 memory
virt: vbox: Add vbg_req_free() helper function
virt: vbox: Move declarations of vboxguest private functions to private header
slimbus: Fix out-of-bounds access in slim_slicesize()
MAINTAINERS: add dri-devel&linaro-mm for Android ION
fpga-manager: altera-ps-spi: preserve nCONFIG state
MAINTAINERS: update my email address
uio_hv_generic: fix subchannel ring mmap
uio_hv_generic: use correct channel in isr
uio_hv_generic: make ring buffer attribute for primary channel
uio_hv_generic: set size of ring buffer attribute
ANDROID: binder: prevent transactions into own process.
Here are some small driver core and firmware fixes for 4.17-rc3
There's a kobject WARN() removal to make syzkaller a lot happier about
some "normal" error paths that it keeps hitting, which should reduce the
number of false-positives we have been getting recently.
There's also some fimware test and documentation fixes, and the
coredump() function signature change that needed to happen after -rc1
before drivers started to take advantage of it.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWuMxrw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymFpQCg1JM62/W8e6mQ4vdZNQmAzgMKMEMAniOMcVRX
/oDWXp64mYwJu+GTxnIJ
=+9Gk
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg Kroah-Hartman:
"Here are some small driver core and firmware fixes for 4.17-rc3
There's a kobject WARN() removal to make syzkaller a lot happier about
some "normal" error paths that it keeps hitting, which should reduce
the number of false-positives we have been getting recently.
There's also some fimware test and documentation fixes, and the
coredump() function signature change that needed to happen after -rc1
before drivers started to take advantage of it.
All of these have been in linux-next with no reported issues"
* tag 'driver-core-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware: some documentation fixes
selftests:firmware: fixes a call to a wrong function name
kobject: don't use WARN for registration failures
firmware: Fix firmware documentation for recent file renames
test_firmware: fix setting old custom fw path back on exit, second try
test_firmware: Install all scripts
drivers: change struct device_driver::coredump() return type to void
Here are some tty and serial driver fixes for reported issues for
4.17-rc3.
Nothing major, but a number of small things:
- device tree fixes/updates for serial ports
- earlycon fixes
- n_gsm fixes
- tty core change reverted to help resolve syszkaller reports
- other serial driver small fixes
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWuM0Vg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yksXgCgqT2E9Ah5O/uZp3yGAjRILKjX294AoI46+t3W
Xy8WA1Fw1NCMBvfvCVIv
=gBd0
-----END PGP SIGNATURE-----
Merge tag 'tty-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some tty and serial driver fixes for reported issues for
4.17-rc3.
Nothing major, but a number of small things:
- device tree fixes/updates for serial ports
- earlycon fixes
- n_gsm fixes
- tty core change reverted to help resolve syszkaller reports
- other serial driver small fixes
All of these have been in linux-next with no reported issues"
* tag 'tty-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Use __GFP_NOFAIL for tty_ldisc_get()
tty: serial: xuartps: Setup early console when uartclk is also passed
tty: Don't call panic() at tty_ldisc_init()
tty: Avoid possible error pointer dereference at tty_ldisc_restore().
dt-bindings: mvebu-uart: DT fix s/interrupts-names/interrupt-names/
tty: serial: qcom_geni_serial: Use signed variable to get IRQ
earlycon: Use a pointer table to fix __earlycon_table stride
serial: sh-sci: Document r8a77470 bindings
dt-bindings: meson-uart: DT fix s/clocks-names/clock-names/
serial: imx: fix cached UCR2 read on software reset
serial: imx: warn user when using unsupported configuration
serial: mvebu-uart: Fix local flags handling on termios update
tty: n_gsm: Fix DLCI handling for ADM mode if debug & 2 is not set
tty: n_gsm: Fix long delays with control frame timeouts in ADM mode
Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of
capabilities.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Here are 2 staging driver fixups for 4.17-rc3.
The first is the remaining stragglers of the irda code removal that you
pointed out during the merge window. The second is a fix for the
wilc1000 driver due to a patch that got merged in 4.17-rc1.
Both of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWuMyew8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymXxACffYtMbj0Vg5pD0yAPqRzJ2iVMVE0AnRkp4BYQ
kXgAjDeSyrdKPUwQ7Hl2
=UNuF
-----END PGP SIGNATURE-----
Merge tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are two staging driver fixups for 4.17-rc3.
The first is the remaining stragglers of the irda code removal that
you pointed out during the merge window. The second is a fix for the
wilc1000 driver due to a patch that got merged in 4.17-rc1.
Both of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: wilc1000: fix NULL pointer exception in host_int_parse_assoc_resp_info()
staging: irda: remove remaining remants of irda code removal