From b212be3528e64ff306f7477628cc042e448742c5 Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Mon, 6 May 2013 17:03:32 +0800 Subject: [PATCH 01/63] irqchip: renesas-irqc: Fix irqc_probe error handling commit dfaf820a13ec160f06556e08dab423818ba87f14 upstream. The code in goto err3 path is wrong because it will call fee_irq() with k == 0, which means it does free_irq(p->irq[-1].requested_irq, &p->irq[-1]); Signed-off-by: Axel Lin Signed-off-by: Simon Horman Signed-off-by: Greg Kroah-Hartman --- drivers/irqchip/irq-renesas-irqc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-renesas-irqc.c b/drivers/irqchip/irq-renesas-irqc.c index 927bff373aac..2f404ba61c6c 100644 --- a/drivers/irqchip/irq-renesas-irqc.c +++ b/drivers/irqchip/irq-renesas-irqc.c @@ -248,8 +248,8 @@ static int irqc_probe(struct platform_device *pdev) return 0; err3: - for (; k >= 0; k--) - free_irq(p->irq[k - 1].requested_irq, &p->irq[k - 1]); + while (--k >= 0) + free_irq(p->irq[k].requested_irq, &p->irq[k]); irq_domain_remove(p->irq_domain); err2: From bc77a313b995bdd4c8806593326ffea8d01ec7bc Mon Sep 17 00:00:00 2001 From: Magnus Damm Date: Wed, 18 Sep 2013 15:01:16 -0500 Subject: [PATCH 02/63] clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast commit 2199a5574b6d94b9ca26c6345356f45ec60fef8b upstream. Update the STI driver by setting cpu_possible_mask to make EMEV2 SMP work as expected together with the ARM broadcast timer. This breakage was introduced by: f7db706 ARM: 7674/1: smp: Avoid dummy clockevent being preferred over real hardware clock-event Without this fix SMP operation is broken on EMEV2 since no broadcast timer interrupts trigger on the secondary CPU cores. Signed-off-by: Magnus Damm Tested-by: Simon Horman Reviewed-by: Stephen Boyd Signed-off-by: Simon Horman Signed-off-by: Daniel Lezcano Signed-off-by: Greg Kroah-Hartman --- drivers/clocksource/em_sti.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c index 4329a29a5310..314184932293 100644 --- a/drivers/clocksource/em_sti.c +++ b/drivers/clocksource/em_sti.c @@ -301,7 +301,7 @@ static void em_sti_register_clockevent(struct em_sti_priv *p) ced->name = dev_name(&p->pdev->dev); ced->features = CLOCK_EVT_FEAT_ONESHOT; ced->rating = 200; - ced->cpumask = cpumask_of(0); + ced->cpumask = cpu_possible_mask; ced->set_next_event = em_sti_clock_event_next; ced->set_mode = em_sti_clock_event_mode; From 42ddc36a02c01afb397e60489223cc3d82e6942d Mon Sep 17 00:00:00 2001 From: Kuninori Morimoto Date: Wed, 17 Apr 2013 23:40:57 -0700 Subject: [PATCH 03/63] gpio-rcar: R-Car GPIO IRQ share interrupt commit c234962b808f289237a40e4ce5fc1c8066d1c9d0 upstream. R-Car H1 or Gen2 GPIO interrupts are assigned per each GPIO domain, but, Gen1 E1/M1 GPIO interrupts are shared for all GPIO domain. gpio-rcar driver needs IRQF_SHARED flags for these. This patch was tested on Bock-W board Signed-off-by: Kuninori Morimoto Signed-off-by: Simon Horman Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpio-rcar.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpio/gpio-rcar.c b/drivers/gpio/gpio-rcar.c index c45ad478f462..f3c5d39f138e 100644 --- a/drivers/gpio/gpio-rcar.c +++ b/drivers/gpio/gpio-rcar.c @@ -333,7 +333,7 @@ static int gpio_rcar_probe(struct platform_device *pdev) } if (devm_request_irq(&pdev->dev, irq->start, - gpio_rcar_irq_handler, 0, name, p)) { + gpio_rcar_irq_handler, IRQF_SHARED, name, p)) { dev_err(&pdev->dev, "failed to request IRQ\n"); ret = -ENOENT; goto err1; From ac8471394930cd333a54373e4351131f85f80100 Mon Sep 17 00:00:00 2001 From: Nestor Lopez Casado Date: Thu, 18 Jul 2013 06:21:30 -0700 Subject: [PATCH 04/63] HID: Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue"" commit c63e0e370028d7e4033bd40165f18499872b5183 upstream. This reverts commit 8af6c08830b1ae114d1a8b548b1f8b056e068887. This patch re-adds the workaround introduced by 596264082f10dd4 which was reverted by 8af6c08830b1ae114. The original patch 596264 was needed to overcome a situation where the hid-core would drop incoming reports while probe() was being executed. This issue was solved by c849a6143bec520af which added hid_device_io_start() and hid_device_io_stop() that enable a specific hid driver to opt-in for input reports while its probe() is being executed. Commit a9dd22b730857347 modified hid-logitech-dj so as to use the functionality added to hid-core. Having done that, workaround 596264 was no longer necessary and was reverted by 8af6c08. We now encounter a different problem that ends up 'again' thwarting the Unifying receiver enumeration. The problem is time and usb controller dependent. Ocasionally the reports sent to the usb receiver to start the paired devices enumeration fail with -EPIPE and the receiver never gets to enumerate the paired devices. With dcd9006b1b053c7b1c the problem was "hidden" as the call to the usb driver became asynchronous and none was catching the error from the failing URB. As the root cause for this failing SET_REPORT is not understood yet, -possibly a race on the usb controller drivers or a problem with the Unifying receiver- reintroducing this workaround solves the problem. Overall what this workaround does is: If an input report from an unknown device is received, then a (re)enumeration is performed. related bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1194649 Signed-off-by: Nestor Lopez Casado Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman --- drivers/hid/hid-logitech-dj.c | 45 +++++++++++++++++++++++++++++++++++ drivers/hid/hid-logitech-dj.h | 1 + 2 files changed, 46 insertions(+) diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c index 4f762bc9456a..1be9156a3950 100644 --- a/drivers/hid/hid-logitech-dj.c +++ b/drivers/hid/hid-logitech-dj.c @@ -192,6 +192,7 @@ static struct hid_ll_driver logi_dj_ll_driver; static int logi_dj_output_hidraw_report(struct hid_device *hid, u8 * buf, size_t count, unsigned char report_type); +static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev); static void logi_dj_recv_destroy_djhid_device(struct dj_receiver_dev *djrcv_dev, struct dj_report *dj_report) @@ -232,6 +233,7 @@ static void logi_dj_recv_add_djhid_device(struct dj_receiver_dev *djrcv_dev, if (dj_report->report_params[DEVICE_PAIRED_PARAM_SPFUNCTION] & SPFUNCTION_DEVICE_LIST_EMPTY) { dbg_hid("%s: device list is empty\n", __func__); + djrcv_dev->querying_devices = false; return; } @@ -242,6 +244,12 @@ static void logi_dj_recv_add_djhid_device(struct dj_receiver_dev *djrcv_dev, return; } + if (djrcv_dev->paired_dj_devices[dj_report->device_index]) { + /* The device is already known. No need to reallocate it. */ + dbg_hid("%s: device is already known\n", __func__); + return; + } + dj_hiddev = hid_allocate_device(); if (IS_ERR(dj_hiddev)) { dev_err(&djrcv_hdev->dev, "%s: hid_allocate_device failed\n", @@ -305,6 +313,7 @@ static void delayedwork_callback(struct work_struct *work) struct dj_report dj_report; unsigned long flags; int count; + int retval; dbg_hid("%s\n", __func__); @@ -337,6 +346,25 @@ static void delayedwork_callback(struct work_struct *work) logi_dj_recv_destroy_djhid_device(djrcv_dev, &dj_report); break; default: + /* A normal report (i. e. not belonging to a pair/unpair notification) + * arriving here, means that the report arrived but we did not have a + * paired dj_device associated to the report's device_index, this + * means that the original "device paired" notification corresponding + * to this dj_device never arrived to this driver. The reason is that + * hid-core discards all packets coming from a device while probe() is + * executing. */ + if (!djrcv_dev->paired_dj_devices[dj_report.device_index]) { + /* ok, we don't know the device, just re-ask the + * receiver for the list of connected devices. */ + retval = logi_dj_recv_query_paired_devices(djrcv_dev); + if (!retval) { + /* everything went fine, so just leave */ + break; + } + dev_err(&djrcv_dev->hdev->dev, + "%s:logi_dj_recv_query_paired_devices " + "error:%d\n", __func__, retval); + } dbg_hid("%s: unexpected report type\n", __func__); } } @@ -367,6 +395,12 @@ static void logi_dj_recv_forward_null_report(struct dj_receiver_dev *djrcv_dev, if (!djdev) { dbg_hid("djrcv_dev->paired_dj_devices[dj_report->device_index]" " is NULL, index %d\n", dj_report->device_index); + kfifo_in(&djrcv_dev->notif_fifo, dj_report, sizeof(struct dj_report)); + + if (schedule_work(&djrcv_dev->work) == 0) { + dbg_hid("%s: did not schedule the work item, was already " + "queued\n", __func__); + } return; } @@ -397,6 +431,12 @@ static void logi_dj_recv_forward_report(struct dj_receiver_dev *djrcv_dev, if (dj_device == NULL) { dbg_hid("djrcv_dev->paired_dj_devices[dj_report->device_index]" " is NULL, index %d\n", dj_report->device_index); + kfifo_in(&djrcv_dev->notif_fifo, dj_report, sizeof(struct dj_report)); + + if (schedule_work(&djrcv_dev->work) == 0) { + dbg_hid("%s: did not schedule the work item, was already " + "queued\n", __func__); + } return; } @@ -444,6 +484,10 @@ static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev) struct dj_report *dj_report; int retval; + /* no need to protect djrcv_dev->querying_devices */ + if (djrcv_dev->querying_devices) + return 0; + dj_report = kzalloc(sizeof(struct dj_report), GFP_KERNEL); if (!dj_report) return -ENOMEM; @@ -455,6 +499,7 @@ static int logi_dj_recv_query_paired_devices(struct dj_receiver_dev *djrcv_dev) return retval; } + static int logi_dj_recv_switch_to_dj_mode(struct dj_receiver_dev *djrcv_dev, unsigned timeout) { diff --git a/drivers/hid/hid-logitech-dj.h b/drivers/hid/hid-logitech-dj.h index fd28a5e0ca3b..4a4000340ce1 100644 --- a/drivers/hid/hid-logitech-dj.h +++ b/drivers/hid/hid-logitech-dj.h @@ -101,6 +101,7 @@ struct dj_receiver_dev { struct work_struct work; struct kfifo notif_fifo; spinlock_t lock; + bool querying_devices; }; struct dj_device { From 5af58eb0bf051ad897859a0599ee6c6c9f199c10 Mon Sep 17 00:00:00 2001 From: Kamala R Date: Mon, 2 Dec 2013 19:55:21 +0530 Subject: [PATCH 05/63] IPv6: Fixed support for blackhole and prohibit routes [ Upstream commit 7150aede5dd241539686e17d9592f5ebd28a2cda ] The behaviour of blackhole and prohibit routes has been corrected by setting the input and output pointers of the dst variable appropriately. For blackhole routes, they are set to dst_discard and to ip6_pkt_discard and ip6_pkt_discard_out respectively for prohibit routes. ipv6: ip6_pkt_prohibit(_out) should not depend on CONFIG_IPV6_MULTIPLE_TABLES We need ip6_pkt_prohibit(_out) available without CONFIG_IPV6_MULTIPLE_TABLES Signed-off-by: Kamala R Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 5a8bf536026c..8a8ae819c1d8 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -84,6 +84,8 @@ static int ip6_dst_gc(struct dst_ops *ops); static int ip6_pkt_discard(struct sk_buff *skb); static int ip6_pkt_discard_out(struct sk_buff *skb); +static int ip6_pkt_prohibit(struct sk_buff *skb); +static int ip6_pkt_prohibit_out(struct sk_buff *skb); static void ip6_link_failure(struct sk_buff *skb); static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu); @@ -233,9 +235,6 @@ static const struct rt6_info ip6_null_entry_template = { #ifdef CONFIG_IPV6_MULTIPLE_TABLES -static int ip6_pkt_prohibit(struct sk_buff *skb); -static int ip6_pkt_prohibit_out(struct sk_buff *skb); - static const struct rt6_info ip6_prohibit_entry_template = { .dst = { .__refcnt = ATOMIC_INIT(1), @@ -1498,21 +1497,24 @@ int ip6_route_add(struct fib6_config *cfg) goto out; } } - rt->dst.output = ip6_pkt_discard_out; - rt->dst.input = ip6_pkt_discard; rt->rt6i_flags = RTF_REJECT|RTF_NONEXTHOP; switch (cfg->fc_type) { case RTN_BLACKHOLE: rt->dst.error = -EINVAL; + rt->dst.output = dst_discard; + rt->dst.input = dst_discard; break; case RTN_PROHIBIT: rt->dst.error = -EACCES; + rt->dst.output = ip6_pkt_prohibit_out; + rt->dst.input = ip6_pkt_prohibit; break; case RTN_THROW: - rt->dst.error = -EAGAIN; - break; default: - rt->dst.error = -ENETUNREACH; + rt->dst.error = (cfg->fc_type == RTN_THROW) ? -EAGAIN + : -ENETUNREACH; + rt->dst.output = ip6_pkt_discard_out; + rt->dst.input = ip6_pkt_discard; break; } goto install_route; @@ -2077,8 +2079,6 @@ static int ip6_pkt_discard_out(struct sk_buff *skb) return ip6_pkt_drop(skb, ICMPV6_NOROUTE, IPSTATS_MIB_OUTNOROUTES); } -#ifdef CONFIG_IPV6_MULTIPLE_TABLES - static int ip6_pkt_prohibit(struct sk_buff *skb) { return ip6_pkt_drop(skb, ICMPV6_ADM_PROHIBITED, IPSTATS_MIB_INNOROUTES); @@ -2090,8 +2090,6 @@ static int ip6_pkt_prohibit_out(struct sk_buff *skb) return ip6_pkt_drop(skb, ICMPV6_ADM_PROHIBITED, IPSTATS_MIB_OUTNOROUTES); } -#endif - /* * Allocate a dst for local (unicast / anycast) address. */ From 908e2eb543cf0c2e88d93cfd1ce9b83b2e951aa5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 2 Dec 2013 08:51:13 -0800 Subject: [PATCH 06/63] net: do not pretend FRAGLIST support [ Upstream commit 28e24c62ab3062e965ef1b3bcc244d50aee7fa85 ] Few network drivers really supports frag_list : virtual drivers. Some drivers wrongly advertise NETIF_F_FRAGLIST feature. If skb with a frag_list is given to them, packet on the wire will be corrupt. Remove this flag, as core networking stack will make sure to provide packets that can be sent without corruption. Signed-off-by: Eric Dumazet Cc: Thadeu Lima de Souza Cascardo Cc: Anirudha Sarangi Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 +- drivers/net/ethernet/tehuti/tehuti.c | 1 - drivers/net/ethernet/xilinx/ll_temac_main.c | 2 +- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 2 +- 4 files changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c index 90ea0b1673ca..716dae7b9d8e 100644 --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c @@ -3023,7 +3023,7 @@ static struct ehea_port *ehea_setup_single_port(struct ehea_adapter *adapter, dev->hw_features = NETIF_F_SG | NETIF_F_TSO | NETIF_F_IP_CSUM | NETIF_F_HW_VLAN_CTAG_TX; - dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO | + dev->features = NETIF_F_SG | NETIF_F_TSO | NETIF_F_HIGHDMA | NETIF_F_IP_CSUM | NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_RXCSUM; diff --git a/drivers/net/ethernet/tehuti/tehuti.c b/drivers/net/ethernet/tehuti/tehuti.c index 571452e786d5..61a1540f1347 100644 --- a/drivers/net/ethernet/tehuti/tehuti.c +++ b/drivers/net/ethernet/tehuti/tehuti.c @@ -2019,7 +2019,6 @@ bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent) ndev->features = NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO | NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_RXCSUM - /*| NETIF_F_FRAGLIST */ ; ndev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO | NETIF_F_HW_VLAN_CTAG_TX; diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c index 5444f2b87d01..52de70bd0bba 100644 --- a/drivers/net/ethernet/xilinx/ll_temac_main.c +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c @@ -1016,7 +1016,7 @@ static int temac_of_probe(struct platform_device *op) dev_set_drvdata(&op->dev, ndev); SET_NETDEV_DEV(ndev, &op->dev); ndev->flags &= ~IFF_MULTICAST; /* clear multicast */ - ndev->features = NETIF_F_SG | NETIF_F_FRAGLIST; + ndev->features = NETIF_F_SG; ndev->netdev_ops = &temac_netdev_ops; ndev->ethtool_ops = &temac_ethtool_ops; #if 0 diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 24748e8367a1..2e2eeba37a06 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -1488,7 +1488,7 @@ static int axienet_of_probe(struct platform_device *op) SET_NETDEV_DEV(ndev, &op->dev); ndev->flags &= ~IFF_MULTICAST; /* clear multicast */ - ndev->features = NETIF_F_SG | NETIF_F_FRAGLIST; + ndev->features = NETIF_F_SG; ndev->netdev_ops = &axienet_netdev_ops; ndev->ethtool_ops = &axienet_ethtool_ops; From bb309e485a2c7e5c354ed0c3ed24a0714643e6bd Mon Sep 17 00:00:00 2001 From: Venkat Venkatsubra Date: Mon, 2 Dec 2013 15:41:39 -0800 Subject: [PATCH 07/63] rds: prevent BUG_ON triggered on congestion update to loopback [ Upstream commit 18fc25c94eadc52a42c025125af24657a93638c0 ] After congestion update on a local connection, when rds_ib_xmit returns less bytes than that are there in the message, rds_send_xmit calls back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE) to trigger. For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually the message contains 8240 bytes. rds_send_xmit thinks there is more to send and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header) =4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k]. The commit 6094628bfd94323fc1cea05ec2c6affd98c18f7f "rds: prevent BUG_ON triggering on congestion map updates" introduced this regression. That change was addressing the triggering of a different BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE: BUG_ON(ret != 0 && conn->c_xmit_sg == rm->data.op_nents); This was the sequence it was going through: (rds_ib_xmit) /* Do not send cong updates to IB loopback */ if (conn->c_loopback && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) { rds_cong_map_updated(conn->c_fcong, ~(u64) 0); return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES; } rds_ib_xmit returns 8240 rds_send_xmit: c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time) = 8192 c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again rds_ib_xmit returns 8240 rds_send_xmit: c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again and so on (c_xmit_data_off 24672,32912,41152,49392,57632) rds_ib_xmit returns 8240 On this iteration this sequence causes the BUG_ON in rds_send_xmit: while (ret) { tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off); [tmp = 65536 - 57632 = 7904] conn->c_xmit_data_off += tmp; [c_xmit_data_off = 57632 + 7904 = 65536] ret -= tmp; [ret = 8240 - 7904 = 336] if (conn->c_xmit_data_off == sg->length) { conn->c_xmit_data_off = 0; sg++; conn->c_xmit_sg++; BUG_ON(ret != 0 && conn->c_xmit_sg == rm->data.op_nents); [c_xmit_sg = 1, rm->data.op_nents = 1] What the current fix does: Since the congestion update over loopback is not actually transmitted as a message, all that rds_ib_xmit needs to do is let the caller think the full message has been transmitted and not return partial bytes. It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb. And 64Kb+48 when page size is 64Kb. Reported-by: Josh Hunt Tested-by: Honggang Li Acked-by: Bang Nguyen Signed-off-by: Venkat Venkatsubra Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/rds/ib_send.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c index e59094981175..37be6e226d1b 100644 --- a/net/rds/ib_send.c +++ b/net/rds/ib_send.c @@ -552,9 +552,8 @@ int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm, && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) { rds_cong_map_updated(conn->c_fcong, ~(u64) 0); scat = &rm->data.op_sg[sg]; - ret = sizeof(struct rds_header) + RDS_CONG_MAP_BYTES; - ret = min_t(int, ret, scat->length - conn->c_xmit_data_off); - return ret; + ret = max_t(int, RDS_CONG_MAP_BYTES, scat->length); + return sizeof(struct rds_header) + ret; } /* FIXME we may overallocate here */ From 5938ed90c0087020ba5c27d57b87174afde6907a Mon Sep 17 00:00:00 2001 From: Vlad Yasevich Date: Tue, 26 Nov 2013 12:37:12 -0500 Subject: [PATCH 08/63] macvtap: Do not double-count received packets [ Upstream commit 006da7b07bc4d3a7ffabad17cf639eec6849c9dc ] Currently macvlan will count received packets after calling each vlans receive handler. Macvtap attempts to count the packet yet again when the user reads the packet from the tap socket. This code doesn't do this consistently either. Remove the counting from macvtap and let only macvlan count received packets. Signed-off-by: Vlad Yasevich Acked-by: Michael S. Tsirkin Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/macvtap.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index c70ff7dac00e..8aab10dc13e2 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -797,7 +797,6 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, const struct sk_buff *skb, const struct iovec *iv, int len) { - struct macvlan_dev *vlan; int ret; int vnet_hdr_len = 0; int vlan_offset = 0; @@ -851,12 +850,6 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, copied += len; done: - rcu_read_lock_bh(); - vlan = rcu_dereference_bh(q->vlan); - if (vlan) - macvlan_count_rx(vlan, copied - vnet_hdr_len, ret == 0, 0); - rcu_read_unlock_bh(); - return ret ? ret : copied; } From e1cef3c9843709fd913021679c3bf711c8301627 Mon Sep 17 00:00:00 2001 From: Zhi Yong Wu Date: Fri, 6 Dec 2013 14:16:50 +0800 Subject: [PATCH 09/63] macvtap: update file current position [ Upstream commit e6ebc7f16ca1434a334647aa56399c546be4e64b ] Signed-off-by: Zhi Yong Wu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/macvtap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 8aab10dc13e2..66e37010b5ed 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -903,6 +903,8 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv, ret = macvtap_do_read(q, iocb, iv, len, file->f_flags & O_NONBLOCK); ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */ + if (ret > 0) + iocb->ki_pos = ret; out: return ret; } From bfc2ba016161cc523ed6ea9262651b34070b8fff Mon Sep 17 00:00:00 2001 From: Zhi Yong Wu Date: Fri, 6 Dec 2013 14:16:51 +0800 Subject: [PATCH 10/63] tun: update file current position [ Upstream commit d0b7da8afa079ffe018ab3e92879b7138977fc8f ] Signed-off-by: Zhi Yong Wu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/tun.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 9ef85fea1d1e..582497103fe8 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1412,6 +1412,8 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, ret = tun_do_read(tun, tfile, iocb, iv, len, file->f_flags & O_NONBLOCK); ret = min_t(ssize_t, ret, len); + if (ret > 0) + iocb->ki_pos = ret; out: tun_put(tun); return ret; From da47609f56067ea8515e79af821190a4b8bd6e6d Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Wed, 11 Dec 2013 13:08:34 +0800 Subject: [PATCH 11/63] macvtap: signal truncated packets [ Upstream commit ce232ce01d61b184202bb185103d119820e1260c ] macvtap_put_user() never return a value grater than iov length, this in fact bypasses the truncated checking in macvtap_recvmsg(). Fix this by always returning the size of packet plus the possible vlan header to let the trunca checking work. Cc: Vlad Yasevich Cc: Zhi Yong Wu Cc: Michael S. Tsirkin Signed-off-by: Jason Wang Acked-by: Vlad Yasevich Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/macvtap.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 66e37010b5ed..9e56eb479a4f 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -800,7 +800,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, int ret; int vnet_hdr_len = 0; int vlan_offset = 0; - int copied; + int copied, total; if (q->flags & IFF_VNET_HDR) { struct virtio_net_hdr vnet_hdr; @@ -815,7 +815,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr))) return -EFAULT; } - copied = vnet_hdr_len; + total = copied = vnet_hdr_len; + total += skb->len; if (!vlan_tx_tag_present(skb)) len = min_t(int, skb->len, len); @@ -830,6 +831,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto); len = min_t(int, skb->len + VLAN_HLEN, len); + total += VLAN_HLEN; copy = min_t(int, vlan_offset, len); ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy); @@ -847,10 +849,9 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, } ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, copied, len); - copied += len; done: - return ret ? ret : copied; + return ret ? ret : total; } static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb, @@ -902,7 +903,7 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv, } ret = macvtap_do_read(q, iocb, iv, len, file->f_flags & O_NONBLOCK); - ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */ + ret = min_t(ssize_t, ret, len); if (ret > 0) iocb->ki_pos = ret; out: From 4ea85d9e4113ba0b2d67d762a29c809df948426a Mon Sep 17 00:00:00 2001 From: Andrey Vagin Date: Thu, 5 Dec 2013 18:36:21 +0400 Subject: [PATCH 12/63] virtio: delete napi structures from netdev before releasing memory [ Upstream commit d4fb84eefe5164f6a6ea51d0a9e26280c661a0dd ] free_netdev calls netif_napi_del too, but it's too late, because napi structures are placed on vi->rq. netif_napi_add() is called from virtnet_alloc_queues. general protection fault: 0000 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables virtio_balloon pcspkr virtio_net(-) i2c_pii CPU: 1 PID: 347 Comm: rmmod Not tainted 3.13.0-rc2+ #171 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800b779c420 ti: ffff8800379e0000 task.ti: ffff8800379e0000 RIP: 0010:[] [] __list_del_entry+0x29/0xd0 RSP: 0018:ffff8800379e1dd0 EFLAGS: 00010a83 RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800379c2fd0 RCX: dead000000200200 RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000001 RDI: ffff8800379c2fd0 RBP: ffff8800379e1dd0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800379c2f90 R13: ffff880037839160 R14: 0000000000000000 R15: 00000000013352f0 FS: 00007f1400e34740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f464124c763 CR3: 00000000b68cf000 CR4: 00000000000006e0 Stack: ffff8800379e1df0 ffffffff8155beab 6b6b6b6b6b6b6b2b ffff8800378391c0 ffff8800379e1e18 ffffffff8156499b ffff880037839be0 ffff880037839d20 ffff88003779d3f0 ffff8800379e1e38 ffffffffa003477c ffff88003779d388 Call Trace: [] netif_napi_del+0x1b/0x80 [] free_netdev+0x8b/0x110 [] virtnet_remove+0x7c/0x90 [virtio_net] [] virtio_dev_remove+0x23/0x80 [] __device_release_driver+0x7f/0xf0 [] driver_detach+0xc0/0xd0 [] bus_remove_driver+0x58/0xd0 [] driver_unregister+0x2c/0x50 [] unregister_virtio_driver+0xe/0x10 [] virtio_net_driver_exit+0x10/0x6ce [virtio_net] [] SyS_delete_module+0x172/0x220 [] ? trace_hardirqs_on+0xd/0x10 [] ? __audit_syscall_entry+0x9c/0xf0 [] system_call_fastpath+0x16/0x1b Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 RIP [] __list_del_entry+0x29/0xd0 RSP ---[ end trace d5931cd3f87c9763 ]--- Fixes: 986a4f4d452d (virtio_net: multiqueue support) Cc: Rusty Russell Cc: "Michael S. Tsirkin" Signed-off-by: Andrey Vagin Acked-by: Michael S. Tsirkin Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/virtio_net.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 64cf70247048..5055eb7527c6 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1285,6 +1285,11 @@ static void virtnet_config_changed(struct virtio_device *vdev) static void virtnet_free_queues(struct virtnet_info *vi) { + int i; + + for (i = 0; i < vi->max_queue_pairs; i++) + netif_napi_del(&vi->rq[i].napi); + kfree(vi->rq); kfree(vi->sq); } From c3ac8a134305ed1522d8987e62249a85efb09c98 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 6 Dec 2013 11:36:15 +0100 Subject: [PATCH 13/63] packet: fix send path when running with proto == 0 [ Upstream commit 66e56cd46b93ef407c60adcac62cf33b06119d50 ] Commit e40526cb20b5 introduced a cached dev pointer, that gets hooked into register_prot_hook(), __unregister_prot_hook() to update the device used for the send path. We need to fix this up, as otherwise this will not work with sockets created with protocol = 0, plus with sll_protocol = 0 passed via sockaddr_ll when doing the bind. So instead, assign the pointer directly. The compiler can inline these helper functions automagically. While at it, also assume the cached dev fast-path as likely(), and document this variant of socket creation as it seems it is not widely used (seems not even the author of TX_RING was aware of that in his reference example [1]). Tested with reproducer from e40526cb20b5. [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example Fixes: e40526cb20b5 ("packet: fix use after free race in send path when dev is released") Signed-off-by: Daniel Borkmann Tested-by: Salam Noureddine Tested-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- Documentation/networking/packet_mmap.txt | 10 ++++ net/packet/af_packet.c | 65 +++++++++++++++--------- 2 files changed, 50 insertions(+), 25 deletions(-) diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 23dd80e82b8e..0f4376ec8852 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -123,6 +123,16 @@ Transmission process is similar to capture as shown below. [shutdown] close() --------> destruction of the transmission socket and deallocation of all associated resources. +Socket creation and destruction is also straight forward, and is done +the same way as in capturing described in the previous paragraph: + + int fd = socket(PF_PACKET, mode, 0); + +The protocol can optionally be 0 in case we only want to transmit +via this socket, which avoids an expensive call to packet_rcv(). +In this case, you also need to bind(2) the TX_RING with sll_protocol = 0 +set. Otherwise, htons(ETH_P_ALL) or any other protocol, for example. + Binding the socket to your network interface is mandatory (with zero copy) to know the header size of frames used in the circular buffer. diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index c503ad6f610f..e8b5a0dfca21 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -237,6 +237,30 @@ struct packet_skb_cb { static void __fanout_unlink(struct sock *sk, struct packet_sock *po); static void __fanout_link(struct sock *sk, struct packet_sock *po); +static struct net_device *packet_cached_dev_get(struct packet_sock *po) +{ + struct net_device *dev; + + rcu_read_lock(); + dev = rcu_dereference(po->cached_dev); + if (likely(dev)) + dev_hold(dev); + rcu_read_unlock(); + + return dev; +} + +static void packet_cached_dev_assign(struct packet_sock *po, + struct net_device *dev) +{ + rcu_assign_pointer(po->cached_dev, dev); +} + +static void packet_cached_dev_reset(struct packet_sock *po) +{ + RCU_INIT_POINTER(po->cached_dev, NULL); +} + /* register_prot_hook must be invoked with the po->bind_lock held, * or from a context in which asynchronous accesses to the packet * socket is not possible (packet_create()). @@ -246,12 +270,10 @@ static void register_prot_hook(struct sock *sk) struct packet_sock *po = pkt_sk(sk); if (!po->running) { - if (po->fanout) { + if (po->fanout) __fanout_link(sk, po); - } else { + else dev_add_pack(&po->prot_hook); - rcu_assign_pointer(po->cached_dev, po->prot_hook.dev); - } sock_hold(sk); po->running = 1; @@ -270,12 +292,11 @@ static void __unregister_prot_hook(struct sock *sk, bool sync) struct packet_sock *po = pkt_sk(sk); po->running = 0; - if (po->fanout) { + + if (po->fanout) __fanout_unlink(sk, po); - } else { + else __dev_remove_pack(&po->prot_hook); - RCU_INIT_POINTER(po->cached_dev, NULL); - } __sock_put(sk); @@ -2048,19 +2069,6 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, return tp_len; } -static struct net_device *packet_cached_dev_get(struct packet_sock *po) -{ - struct net_device *dev; - - rcu_read_lock(); - dev = rcu_dereference(po->cached_dev); - if (dev) - dev_hold(dev); - rcu_read_unlock(); - - return dev; -} - static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) { struct sk_buff *skb; @@ -2077,7 +2085,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) mutex_lock(&po->pg_vec_lock); - if (saddr == NULL) { + if (likely(saddr == NULL)) { dev = packet_cached_dev_get(po); proto = po->num; addr = NULL; @@ -2231,7 +2239,7 @@ static int packet_snd(struct socket *sock, * Get and verify the address. */ - if (saddr == NULL) { + if (likely(saddr == NULL)) { dev = packet_cached_dev_get(po); proto = po->num; addr = NULL; @@ -2440,6 +2448,8 @@ static int packet_release(struct socket *sock) spin_lock(&po->bind_lock); unregister_prot_hook(sk, false); + packet_cached_dev_reset(po); + if (po->prot_hook.dev) { dev_put(po->prot_hook.dev); po->prot_hook.dev = NULL; @@ -2495,14 +2505,17 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 protoc spin_lock(&po->bind_lock); unregister_prot_hook(sk, true); + po->num = protocol; po->prot_hook.type = protocol; if (po->prot_hook.dev) dev_put(po->prot_hook.dev); - po->prot_hook.dev = dev; + po->prot_hook.dev = dev; po->ifindex = dev ? dev->ifindex : 0; + packet_cached_dev_assign(po, dev); + if (protocol == 0) goto out_unlock; @@ -2615,7 +2628,8 @@ static int packet_create(struct net *net, struct socket *sock, int protocol, po = pkt_sk(sk); sk->sk_family = PF_PACKET; po->num = proto; - RCU_INIT_POINTER(po->cached_dev, NULL); + + packet_cached_dev_reset(po); sk->sk_destruct = packet_sock_destruct; sk_refcnt_debug_inc(sk); @@ -3369,6 +3383,7 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void sk->sk_error_report(sk); } if (msg == NETDEV_UNREGISTER) { + packet_cached_dev_reset(po); po->ifindex = -1; if (po->prot_hook.dev) dev_put(po->prot_hook.dev); From e28c2d64f8b676b7defe6a3f25aa80269aa49c7e Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Sat, 7 Dec 2013 03:33:45 +0100 Subject: [PATCH 14/63] ipv6: don't count addrconf generated routes against gc limit [ Upstream commit a3300ef4bbb1f1e33ff0400e1e6cf7733d988f4f ] Brett Ciphery reported that new ipv6 addresses failed to get installed because the addrconf generated dsts where counted against the dst gc limit. We don't need to count those routes like we currently don't count administratively added routes. Because the max_addresses check enforces a limit on unbounded address generation first in case someone plays with router advertisments, we are still safe here. Reported-by: Brett Ciphery Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 8a8ae819c1d8..566c1b4b21b8 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2099,12 +2099,10 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev, bool anycast) { struct net *net = dev_net(idev->dev); - struct rt6_info *rt = ip6_dst_alloc(net, net->loopback_dev, 0, NULL); - - if (!rt) { - net_warn_ratelimited("Maximum number of routes reached, consider increasing route/max_size\n"); + struct rt6_info *rt = ip6_dst_alloc(net, net->loopback_dev, + DST_NOCOUNT, NULL); + if (!rt) return ERR_PTR(-ENOMEM); - } in6_dev_hold(idev); From 90af0bf9303883865a8626f23b4b0ed60d4d10dc Mon Sep 17 00:00:00 2001 From: Changli Gao Date: Sun, 8 Dec 2013 09:36:56 -0500 Subject: [PATCH 15/63] net: drop_monitor: fix the value of maxattr [ Upstream commit d323e92cc3f4edd943610557c9ea1bb4bb5056e8 ] maxattr in genl_family should be used to save the max attribute type, but not the max command type. Drop monitor doesn't support any attributes, so we should leave it as zero. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/drop_monitor.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index d23b6682f4e9..a974dfec4bf1 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -64,7 +64,6 @@ static struct genl_family net_drop_monitor_family = { .hdrsize = 0, .name = "NET_DM", .version = 2, - .maxattr = NET_DM_CMD_MAX, }; static DEFINE_PER_CPU(struct per_cpu_dm_data, dm_cpu_data); From d90d9ff6cac64e701f27291ecef12fdc48606bb6 Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Sat, 7 Dec 2013 17:26:27 -0500 Subject: [PATCH 16/63] net: unix: allow set_peek_off to fail [ Upstream commit 12663bfc97c8b3fdb292428105dd92d563164050 ] unix_dgram_recvmsg() will hold the readlock of the socket until recv is complete. In the same time, we may try to setsockopt(SO_PEEK_OFF) which will hang until unix_dgram_recvmsg() will complete (which can take a while) without allowing us to break out of it, triggering a hung task spew. Instead, allow set_peek_off to fail, this way userspace will not hang. Signed-off-by: Sasha Levin Acked-by: Pavel Emelyanov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/net.h | 2 +- net/core/sock.c | 2 +- net/unix/af_unix.c | 8 ++++++-- 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/include/linux/net.h b/include/linux/net.h index 0c4ae5d94de9..65545ac6fb9c 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -180,7 +180,7 @@ struct proto_ops { int offset, size_t size, int flags); ssize_t (*splice_read)(struct socket *sock, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); - void (*set_peek_off)(struct sock *sk, int val); + int (*set_peek_off)(struct sock *sk, int val); }; #define DECLARE_SOCKADDR(type, dst, src) \ diff --git a/net/core/sock.c b/net/core/sock.c index 6565431b0e6d..50a345e5a26f 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -885,7 +885,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname, case SO_PEEK_OFF: if (sock->ops->set_peek_off) - sock->ops->set_peek_off(sk, val); + ret = sock->ops->set_peek_off(sk, val); else ret = -EOPNOTSUPP; break; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 8664ad0d5797..2ee993c7732a 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -529,13 +529,17 @@ static int unix_seqpacket_sendmsg(struct kiocb *, struct socket *, static int unix_seqpacket_recvmsg(struct kiocb *, struct socket *, struct msghdr *, size_t, int); -static void unix_set_peek_off(struct sock *sk, int val) +static int unix_set_peek_off(struct sock *sk, int val) { struct unix_sock *u = unix_sk(sk); - mutex_lock(&u->readlock); + if (mutex_lock_interruptible(&u->readlock)) + return -EINTR; + sk->sk_peek_off = val; mutex_unlock(&u->readlock); + + return 0; } From fa57cd8a526cb76ae372f0963339883f8e9f69bb Mon Sep 17 00:00:00 2001 From: Nat Gurumoorthy Date: Mon, 9 Dec 2013 10:43:21 -0800 Subject: [PATCH 17/63] tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0 [ Upstream commit 388d3335575f4c056dcf7138a30f1454e2145cd8 ] The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120) uninitialized. From power on reset this register may have garbage in it. The Register Base Address register defines the device local address of a register. The data pointed to by this location is read or written using the Register Data register (PCI config offset 128). When REG_BASE_ADDR has garbage any read or write of Register Data Register (PCI 128) will cause the PCI bus to lock up. The TCO watchdog will fire and bring down the system. Signed-off-by: Nat Gurumoorthy Acked-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/tg3.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 36a0b438e65e..11ae0811e4bf 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -16297,6 +16297,9 @@ static int tg3_get_invariants(struct tg3 *tp, const struct pci_device_id *ent) /* Clear this out for sanity. */ tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0); + /* Clear TG3PCI_REG_BASE_ADDR to prevent hangs. */ + tw32(TG3PCI_REG_BASE_ADDR, 0); + pci_read_config_dword(tp->pdev, TG3PCI_PCISTATE, &pci_state_reg); if ((pci_state_reg & PCISTATE_CONV_PCI_MODE) == 0 && From 02cbaec37a24f39e83b51a519147f79921299e24 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Fri, 13 Dec 2013 17:21:27 +0800 Subject: [PATCH 18/63] netvsc: don't flush peers notifying work during setting mtu [ Upstream commit 50dc875f2e6e2e04aed3b3033eb0ac99192d6d02 ] There's a possible deadlock if we flush the peers notifying work during setting mtu: [ 22.991149] ====================================================== [ 22.991173] [ INFO: possible circular locking dependency detected ] [ 22.991198] 3.10.0-54.0.1.el7.x86_64.debug #1 Not tainted [ 22.991219] ------------------------------------------------------- [ 22.991243] ip/974 is trying to acquire lock: [ 22.991261] ((&(&net_device_ctx->dwork)->work)){+.+.+.}, at: [] flush_work+0x5/0x2e0 [ 22.991307] but task is already holding lock: [ 22.991330] (rtnl_mutex){+.+.+.}, at: [] rtnetlink_rcv+0x1b/0x40 [ 22.991367] which lock already depends on the new lock. [ 22.991398] the existing dependency chain (in reverse order) is: [ 22.991426] -> #1 (rtnl_mutex){+.+.+.}: [ 22.991449] [] __lock_acquire+0xb19/0x1260 [ 22.991477] [] lock_acquire+0xa2/0x1f0 [ 22.991501] [] mutex_lock_nested+0x89/0x4f0 [ 22.991529] [] rtnl_lock+0x17/0x20 [ 22.991552] [] netdev_notify_peers+0x12/0x30 [ 22.991579] [] netvsc_send_garp+0x22/0x30 [hv_netvsc] [ 22.991610] [] process_one_work+0x211/0x6e0 [ 22.991637] [] worker_thread+0x11b/0x3a0 [ 22.991663] [] kthread+0xed/0x100 [ 22.991686] [] ret_from_fork+0x7c/0xb0 [ 22.991715] -> #0 ((&(&net_device_ctx->dwork)->work)){+.+.+.}: [ 22.991715] [] check_prevs_add+0x967/0x970 [ 22.991715] [] __lock_acquire+0xb19/0x1260 [ 22.991715] [] lock_acquire+0xa2/0x1f0 [ 22.991715] [] flush_work+0x4e/0x2e0 [ 22.991715] [] __cancel_work_timer+0x95/0x130 [ 22.991715] [] cancel_delayed_work_sync+0x13/0x20 [ 22.991715] [] netvsc_change_mtu+0x84/0x200 [hv_netvsc] [ 22.991715] [] dev_set_mtu+0x34/0x80 [ 22.991715] [] do_setlink+0x23a/0xa00 [ 22.991715] [] rtnl_newlink+0x394/0x5e0 [ 22.991715] [] rtnetlink_rcv_msg+0x9c/0x260 [ 22.991715] [] netlink_rcv_skb+0xa9/0xc0 [ 22.991715] [] rtnetlink_rcv+0x2a/0x40 [ 22.991715] [] netlink_unicast+0xdd/0x190 [ 22.991715] [] netlink_sendmsg+0x337/0x750 [ 22.991715] [] sock_sendmsg+0x99/0xd0 [ 22.991715] [] ___sys_sendmsg+0x39e/0x3b0 [ 22.991715] [] __sys_sendmsg+0x42/0x80 [ 22.991715] [] SyS_sendmsg+0x12/0x20 [ 22.991715] [] system_call_fastpath+0x16/0x1b This is because we hold the rtnl_lock() before ndo_change_mtu() and try to flush the work in netvsc_change_mtu(), in the mean time, netdev_notify_peers() may be called from worker and also trying to hold the rtnl_lock. This will lead the flush won't succeed forever. Solve this by not canceling and flushing the work, this is safe because the transmission done by NETDEV_NOTIFY_PEERS was synchronized with the netif_tx_disable() called by netvsc_change_mtu(). Reported-by: Yaju Cao Tested-by: Yaju Cao Cc: K. Y. Srinivasan Cc: Haiyang Zhang Signed-off-by: Jason Wang Acked-by: Haiyang Zhang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/hyperv/netvsc_drv.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 23a0fff0df52..aea78fc2e48f 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -328,7 +328,6 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) return -EINVAL; nvdev->start_remove = true; - cancel_delayed_work_sync(&ndevctx->dwork); cancel_work_sync(&ndevctx->work); netif_tx_disable(ndev); rndis_filter_device_remove(hdev); From 36ffb5708649eb5215c14548d692406bc287cb24 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Fri, 13 Dec 2013 15:12:27 +0100 Subject: [PATCH 19/63] ipv6: fix illegal mac_header comparison on 32bit Signed-off-by: Hannes Frederic Sowa Signed-off-by: Greg Kroah-Hartman --- include/linux/skbuff.h | 5 +++++ net/ipv6/udp_offload.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 74db47ec09ea..ded45ec6b22b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1741,6 +1741,11 @@ static inline void skb_set_mac_header(struct sk_buff *skb, const int offset) } #endif /* NET_SKBUFF_DATA_USES_OFFSET */ +static inline void skb_pop_mac_header(struct sk_buff *skb) +{ + skb->mac_header = skb->network_header; +} + static inline void skb_probe_transport_header(struct sk_buff *skb, const int offset_hint) { diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index 76f165ef8d49..3696aa28784a 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -85,7 +85,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, /* Check if there is enough headroom to insert fragment header. */ tnl_hlen = skb_tnl_header_len(skb); - if (skb->mac_header < (tnl_hlen + frag_hdr_sz)) { + if (skb_mac_header(skb) < skb->head + tnl_hlen + frag_hdr_sz) { if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz)) goto out; } From 57bc52eb25cff3ddd096598bf40e9be7673eb83a Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Fri, 13 Dec 2013 10:54:22 -0500 Subject: [PATCH 20/63] net: unix: allow bind to fail on mutex lock [ Upstream commit 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490 ] This is similar to the set_peek_off patch where calling bind while the socket is stuck in unix_dgram_recvmsg() will block and cause a hung task spew after a while. This is also the last place that did a straightforward mutex_lock(), so there shouldn't be any more of these patches. Signed-off-by: Sasha Levin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/unix/af_unix.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 2ee993c7732a..3ca7927520b0 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -717,7 +717,9 @@ static int unix_autobind(struct socket *sock) int err; unsigned int retries = 0; - mutex_lock(&u->readlock); + err = mutex_lock_interruptible(&u->readlock); + if (err) + return err; err = 0; if (u->addr) @@ -876,7 +878,9 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) goto out; addr_len = err; - mutex_lock(&u->readlock); + err = mutex_lock_interruptible(&u->readlock); + if (err) + goto out; err = -EINVAL; if (u->addr) From 85e963b89ad0727eca17bf0ac65a1cbf110b1bb6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Timo=20Ter=C3=A4s?= Date: Mon, 16 Dec 2013 11:02:09 +0200 Subject: [PATCH 21/63] ip_gre: fix msg_name parsing for recvfrom/recvmsg MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 0e3da5bb8da45890b1dc413404e0f978ab71173e ] ipgre_header_parse() needs to parse the tunnel's ip header and it uses mac_header to locate the iphdr. This got broken when gre tunneling was refactored as mac_header is no longer updated to point to iphdr. Introduce skb_pop_mac_header() helper to do the mac_header assignment and use it in ipgre_rcv() to fix msg_name parsing. Bug introduced in commit c54419321455 (GRE: Refactor GRE tunneling code.) Cc: Pravin B Shelar Signed-off-by: Timo Teräs Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/ip_gre.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 64e4e98c8786..828b2e8631e7 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -335,6 +335,7 @@ static int ipgre_rcv(struct sk_buff *skb) iph->saddr, iph->daddr, tpi.key); if (tunnel) { + skb_pop_mac_header(skb); ip_tunnel_rcv(tunnel, skb, &tpi, hdr_len, log_ecn_error); return 0; } From def8361a33908749838138125db0828420546815 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 17 Dec 2013 00:38:39 +0100 Subject: [PATCH 22/63] net: inet_diag: zero out uninitialized idiag_{src,dst} fields [ Upstream commit b1aac815c0891fe4a55a6b0b715910142227700f ] Jakub reported while working with nlmon netlink sniffer that parts of the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6. That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3]. In fact, it seems that we can leak 6 * sizeof(u32) byte of kernel [slab] memory through this. At least, in udp_dump_one(), we allocate a skb in ... rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL); ... and then pass that to inet_sk_diag_fill() that puts the whole struct inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0], r->id.idiag_dst[0] and leave the rest untouched: r->id.idiag_src[0] = inet->inet_rcv_saddr; r->id.idiag_dst[0] = inet->inet_daddr; struct inet_diag_msg embeds struct inet_diag_sockid that is correctly / fully filled out in IPv6 case, but for IPv4 not. So just zero them out by using plain memset (for this little amount of bytes it's probably not worth the extra check for idiag_family == AF_INET). Similarly, fix also other places where we fill that out. Reported-by: Jakub Zawadzki Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/inet_diag.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 5f648751fce2..31cf54d18221 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -106,6 +106,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, r->id.idiag_sport = inet->inet_sport; r->id.idiag_dport = inet->inet_dport; + + memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src)); + memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst)); + r->id.idiag_src[0] = inet->inet_rcv_saddr; r->id.idiag_dst[0] = inet->inet_daddr; @@ -240,12 +244,19 @@ static int inet_twsk_diag_fill(struct inet_timewait_sock *tw, r->idiag_family = tw->tw_family; r->idiag_retrans = 0; + r->id.idiag_if = tw->tw_bound_dev_if; sock_diag_save_cookie(tw, r->id.idiag_cookie); + r->id.idiag_sport = tw->tw_sport; r->id.idiag_dport = tw->tw_dport; + + memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src)); + memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst)); + r->id.idiag_src[0] = tw->tw_rcv_saddr; r->id.idiag_dst[0] = tw->tw_daddr; + r->idiag_state = tw->tw_substate; r->idiag_timer = 3; r->idiag_expires = DIV_ROUND_UP(tmo * 1000, HZ); @@ -732,8 +743,13 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk, r->id.idiag_sport = inet->inet_sport; r->id.idiag_dport = ireq->rmt_port; + + memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src)); + memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst)); + r->id.idiag_src[0] = ireq->loc_addr; r->id.idiag_dst[0] = ireq->rmt_addr; + r->idiag_expires = jiffies_to_msecs(tmo); r->idiag_rqueue = 0; r->idiag_wqueue = 0; From 2173ca37c090930f51f701fdd963b63fe2502789 Mon Sep 17 00:00:00 2001 From: Wenliang Fan Date: Tue, 17 Dec 2013 11:25:28 +0800 Subject: [PATCH 23/63] drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() [ Upstream commit e9db5c21d3646a6454fcd04938dd215ac3ab620a ] The local variable 'bi' comes from userspace. If userspace passed a large number to 'bi.data.calibrate', there would be an integer overflow in the following line: s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16; Signed-off-by: Wenliang Fan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/hamradio/hdlcdrv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c index 3169252613fa..5d78c1d08abd 100644 --- a/drivers/net/hamradio/hdlcdrv.c +++ b/drivers/net/hamradio/hdlcdrv.c @@ -571,6 +571,8 @@ static int hdlcdrv_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) case HDLCDRVCTL_CALIBRATE: if(!capable(CAP_SYS_RAWIO)) return -EPERM; + if (bi.data.calibrate > INT_MAX / s->par.bitrate) + return -EINVAL; s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16; return 0; From 8f3843be6e6d1fcc5bcdad42b569a5d4797d9d1f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Salva=20Peir=C3=B3?= Date: Tue, 17 Dec 2013 10:06:30 +0100 Subject: [PATCH 24/63] hamradio/yam: fix info leak in ioctl MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 8e3fbf870481eb53b2d3a322d1fc395ad8b367ed ] The yam_ioctl() code fails to initialise the cmd field of the struct yamdrv_ioctl_cfg. Add an explicit memset(0) before filling the structure to avoid the 4-byte info leak. Signed-off-by: Salva Peiró Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/hamradio/yam.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c index 0721e72f9299..82529a2658d7 100644 --- a/drivers/net/hamradio/yam.c +++ b/drivers/net/hamradio/yam.c @@ -1058,6 +1058,7 @@ static int yam_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) break; case SIOCYAMGCFG: + memset(&yi, 0, sizeof(yi)); yi.cfg.mask = 0xffffffff; yi.cfg.iobase = yp->iobase; yi.cfg.irq = yp->irq; From cf3daa7cbcf9ac14a7549239d7ff3464138a79c8 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 19 Dec 2013 10:53:02 -0800 Subject: [PATCH 25/63] net: fec: fix potential use after free [ Upstream commit 7a2a84518cfb263d2c4171b3d63671f88316adb2 ] skb_tx_timestamp(skb) should be called _before_ TX completion has a chance to trigger, otherwise it is too late and we access freed memory. Signed-off-by: Eric Dumazet Fixes: de5fb0a05348 ("net: fec: put tx to napi poll function to fix dead lock") Cc: Frank Li Cc: Richard Cochran Acked-by: Richard Cochran Acked-by: Frank Li Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/freescale/fec_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index d48099f03b7f..fbd0d7df67d8 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -371,6 +371,8 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev) else bdp = fec_enet_get_nextdesc(bdp, fep->bufdesc_ex); + skb_tx_timestamp(skb); + fep->cur_tx = bdp; if (fep->cur_tx == fep->dirty_tx) @@ -379,8 +381,6 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev) /* Trigger transmission start */ writel(0, fep->hwp + FEC_X_DES_ACTIVE); - skb_tx_timestamp(skb); - return NETDEV_TX_OK; } From 717e66b3375ed5ea69e8cea4352aeacc64007499 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Thu, 19 Dec 2013 12:40:26 +0800 Subject: [PATCH 26/63] ipv6: always set the new created dst's from in ip6_rt_copy [ Upstream commit 24f5b855e17df7e355eacd6c4a12cc4d6a6c9ff0 ] ip6_rt_copy only sets dst.from if ort has flag RTF_ADDRCONF and RTF_DEFAULT. but the prefix routes which did get installed by hand locally can have an expiration, and no any flag combination which can ensure a potential from does never expire, so we should always set the new created dst's from. This also fixes the new created dst is always expired since the ort, which is created by RA, maybe has RTF_EXPIRES and RTF_ADDRCONF, but no RTF_DEFAULT. Suggested-by: Hannes Frederic Sowa CC: Gao feng Signed-off-by: Li RongQing Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 566c1b4b21b8..6c389300f4e9 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1838,9 +1838,7 @@ static struct rt6_info *ip6_rt_copy(struct rt6_info *ort, else rt->rt6i_gateway = *dest; rt->rt6i_flags = ort->rt6i_flags; - if ((ort->rt6i_flags & (RTF_DEFAULT | RTF_ADDRCONF)) == - (RTF_DEFAULT | RTF_ADDRCONF)) - rt6_set_from(rt, ort); + rt6_set_from(rt, ort); rt->rt6i_metric = 0; #ifdef CONFIG_IPV6_SUBTREES From 1b99874ddd5197368486a48d62916a1bd33b0fc6 Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Wed, 18 Dec 2013 23:49:42 -0500 Subject: [PATCH 27/63] rds: prevent dereference of a NULL device [ Upstream commit c2349758acf1874e4c2b93fe41d072336f1a31d0 ] Binding might result in a NULL device, which is dereferenced causing this BUG: [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 000000000000097 4 [ 1317.261847] IP: [] rds_ib_laddr_check+0x82/0x110 [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0 [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 1317.264179] Dumping ftrace buffer: [ 1317.264774] (ftrace buffer empty) [ 1317.265220] Modules linked in: [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G W 3.13.0-rc4- next-20131218-sasha-00013-g2cebb9b-dirty #4159 [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000 [ 1317.268399] RIP: 0010:[] [] rds_ib_laddr_check+ 0x82/0x110 [ 1317.269670] RSP: 0000:ffff8803cd31bdf8 EFLAGS: 00010246 [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000 [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286 [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000 [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031 [ 1317.270230] FS: 00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:000000000000 0000 [ 1317.270230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0 [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 [ 1317.270230] Stack: [ 1317.270230] 0000000054086700 5408670000a25de0 5408670000000002 0000000000000000 [ 1317.270230] ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160 [ 1317.270230] ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280 [ 1317.270230] Call Trace: [ 1317.270230] [] ? rds_trans_get_preferred+0x42/0xa0 [ 1317.270230] [] rds_trans_get_preferred+0x56/0xa0 [ 1317.270230] [] rds_bind+0x73/0xf0 [ 1317.270230] [] SYSC_bind+0x92/0xf0 [ 1317.270230] [] ? context_tracking_user_exit+0xb8/0x1d0 [ 1317.270230] [] ? trace_hardirqs_on+0xd/0x10 [ 1317.270230] [] ? syscall_trace_enter+0x32/0x290 [ 1317.270230] [] SyS_bind+0xe/0x10 [ 1317.270230] [] tracesys+0xdd/0xe2 [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 7 4 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02 [ 1317.270230] RIP [] rds_ib_laddr_check+0x82/0x110 [ 1317.270230] RSP [ 1317.270230] CR2: 0000000000000974 Signed-off-by: Sasha Levin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/rds/ib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/rds/ib.c b/net/rds/ib.c index b4c8b0022fee..ba2dffeff608 100644 --- a/net/rds/ib.c +++ b/net/rds/ib.c @@ -338,7 +338,8 @@ static int rds_ib_laddr_check(__be32 addr) ret = rdma_bind_addr(cm_id, (struct sockaddr *)&sin); /* due to this, we will claim to support iWARP devices unless we check node_type. */ - if (ret || cm_id->device->node_type != RDMA_NODE_IB_CA) + if (ret || !cm_id->device || + cm_id->device->node_type != RDMA_NODE_IB_CA) ret = -EADDRNOTAVAIL; rdsdebug("addr %pI4 ret %d node type %d\n", From 05df3bfbc832046a774fce22654cc55260cd1583 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 23 Dec 2013 00:32:31 +0100 Subject: [PATCH 28/63] net: rose: restore old recvmsg behavior [ Upstream commit f81152e35001e91997ec74a7b4e040e6ab0acccf ] recvmsg handler in net/rose/af_rose.c performs size-check ->msg_namelen. After commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic), we now always take the else branch due to namelen being initialized to 0. Digging in netdev-vger-cvs git repo shows that msg_namelen was initialized with a fixed-size since at least 1995, so the else branch was never taken. Compile tested only. Signed-off-by: Florian Westphal Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/rose/af_rose.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c index abf0ad6311d0..27e6896705e6 100644 --- a/net/rose/af_rose.c +++ b/net/rose/af_rose.c @@ -1253,6 +1253,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock, if (msg->msg_name) { struct sockaddr_rose *srose; + struct full_sockaddr_rose *full_srose = msg->msg_name; memset(msg->msg_name, 0, sizeof(struct full_sockaddr_rose)); srose = msg->msg_name; @@ -1260,18 +1261,9 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock, srose->srose_addr = rose->dest_addr; srose->srose_call = rose->dest_call; srose->srose_ndigis = rose->dest_ndigis; - if (msg->msg_namelen >= sizeof(struct full_sockaddr_rose)) { - struct full_sockaddr_rose *full_srose = (struct full_sockaddr_rose *)msg->msg_name; - for (n = 0 ; n < rose->dest_ndigis ; n++) - full_srose->srose_digis[n] = rose->dest_digis[n]; - msg->msg_namelen = sizeof(struct full_sockaddr_rose); - } else { - if (rose->dest_ndigis >= 1) { - srose->srose_ndigis = 1; - srose->srose_digi = rose->dest_digis[0]; - } - msg->msg_namelen = sizeof(struct sockaddr_rose); - } + for (n = 0 ; n < rose->dest_ndigis ; n++) + full_srose->srose_digis[n] = rose->dest_digis[n]; + msg->msg_namelen = sizeof(struct full_sockaddr_rose); } skb_free_datagram(sk, skb); From c726095ec74daabd48a9a4ed48d46304601017b4 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Tue, 31 Dec 2013 16:23:35 -0500 Subject: [PATCH 29/63] vlan: Fix header ops passthru when doing TX VLAN offload. [ Upstream commit 2205369a314e12fcec4781cc73ac9c08fc2b47de ] When the vlan code detects that the real device can do TX VLAN offloads in hardware, it tries to arrange for the real device's header_ops to be invoked directly. But it does so illegally, by simply hooking the real device's header_ops up to the VLAN device. This doesn't work because we will end up invoking a set of header_ops routines which expect a device type which matches the real device, but will see a VLAN device instead. Fix this by providing a pass-thru set of header_ops which will arrange to pass the proper real device instead. To facilitate this add a dev_rebuild_header(). There are implementations which provide a ->cache and ->create but not a ->rebuild (f.e. PLIP). So we need a helper function just like dev_hard_header() to avoid crashes. Use this helper in the one existing place where the header_ops->rebuild was being invoked, the neighbour code. With lots of help from Florian Westphal. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/netdevice.h | 9 +++++++++ net/8021q/vlan_dev.c | 19 ++++++++++++++++++- net/core/neighbour.c | 2 +- 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 96e4c21e15e0..abf7756eaf9e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1772,6 +1772,15 @@ static inline int dev_parse_header(const struct sk_buff *skb, return dev->header_ops->parse(skb, haddr); } +static inline int dev_rebuild_header(struct sk_buff *skb) +{ + const struct net_device *dev = skb->dev; + + if (!dev->header_ops || !dev->header_ops->rebuild) + return 0; + return dev->header_ops->rebuild(skb); +} + typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr, int len); extern int register_gifconf(unsigned int family, gifconf_func_t * gifconf); static inline int unregister_gifconf(unsigned int family) diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 1cd3d2a406f5..4af64afc7022 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -549,6 +549,23 @@ static const struct header_ops vlan_header_ops = { .parse = eth_header_parse, }; +static int vlan_passthru_hard_header(struct sk_buff *skb, struct net_device *dev, + unsigned short type, + const void *daddr, const void *saddr, + unsigned int len) +{ + struct vlan_dev_priv *vlan = vlan_dev_priv(dev); + struct net_device *real_dev = vlan->real_dev; + + return dev_hard_header(skb, real_dev, type, daddr, saddr, len); +} + +static const struct header_ops vlan_passthru_header_ops = { + .create = vlan_passthru_hard_header, + .rebuild = dev_rebuild_header, + .parse = eth_header_parse, +}; + static struct device_type vlan_type = { .name = "vlan", }; @@ -592,7 +609,7 @@ static int vlan_dev_init(struct net_device *dev) dev->needed_headroom = real_dev->needed_headroom; if (real_dev->features & NETIF_F_HW_VLAN_CTAG_TX) { - dev->header_ops = real_dev->header_ops; + dev->header_ops = &vlan_passthru_header_ops; dev->hard_header_len = real_dev->hard_header_len; } else { dev->header_ops = &vlan_header_ops; diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 0034b611fa5e..49aeab86f317 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1274,7 +1274,7 @@ int neigh_compat_output(struct neighbour *neigh, struct sk_buff *skb) if (dev_hard_header(skb, dev, ntohs(skb->protocol), NULL, NULL, skb->len) < 0 && - dev->header_ops->rebuild(skb)) + dev_rebuild_header(skb)) return 0; return dev_queue_xmit(skb); From e32904ec6251df8e552074c9eb068606955d894c Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Thu, 26 Dec 2013 15:32:47 +0200 Subject: [PATCH 30/63] virtio_net: fix error handling for mergeable buffers Eric Dumazet noticed that if we encounter an error when processing a mergeable buffer, we don't dequeue all of the buffers from this packet, the result is almost sure to be loss of networking. Fix this issue. Cc: Rusty Russell Cc: Michael Dalton Acked-by: Michael Dalton Cc: Eric Dumazet Cc: Jason Wang Cc: David S. Miller Signed-off-by: Michael S. Tsirkin (cherry picked from commit 8fc3b9e9a229778e5af3aa453c44f1a3857ba769) Acked-by: Jason Wang Signed-off-by: Greg Kroah-Hartman --- drivers/net/virtio_net.c | 66 ++++++++++++++++++++++++++++------------ 1 file changed, 46 insertions(+), 20 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 5055eb7527c6..ad9b385e1f3d 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -294,26 +294,33 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq, return skb; } -static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb) +static struct sk_buff *receive_mergeable(struct net_device *dev, + struct receive_queue *rq, + void *buf, + unsigned int len) { - struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb); - struct page *page; - int num_buf, i, len; + struct skb_vnet_hdr *hdr = page_address(buf); + int num_buf = hdr->mhdr.num_buffers; + struct page *page = buf; + struct sk_buff *skb = page_to_skb(rq, page, len); + int i; + + if (unlikely(!skb)) + goto err_skb; - num_buf = hdr->mhdr.num_buffers; while (--num_buf) { i = skb_shinfo(skb)->nr_frags; if (i >= MAX_SKB_FRAGS) { pr_debug("%s: packet too long\n", skb->dev->name); skb->dev->stats.rx_length_errors++; - return -EINVAL; + return NULL; } page = virtqueue_get_buf(rq->vq, &len); if (!page) { - pr_debug("%s: rx error: %d buffers missing\n", - skb->dev->name, hdr->mhdr.num_buffers); - skb->dev->stats.rx_length_errors++; - return -EINVAL; + pr_debug("%s: rx error: %d buffers %d missing\n", + dev->name, hdr->mhdr.num_buffers, num_buf); + dev->stats.rx_length_errors++; + goto err_buf; } if (len > PAGE_SIZE) @@ -323,7 +330,25 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb) --rq->num; } - return 0; + return skb; +err_skb: + give_pages(rq, page); + while (--num_buf) { + buf = virtqueue_get_buf(rq->vq, &len); + if (unlikely(!buf)) { + pr_debug("%s: rx error: %d buffers missing\n", + dev->name, num_buf); + dev->stats.rx_length_errors++; + break; + } + page = buf; + give_pages(rq, page); + --rq->num; + } +err_buf: + dev->stats.rx_dropped++; + dev_kfree_skb(skb); + return NULL; } static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) @@ -351,17 +376,18 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) skb_trim(skb, len); } else { page = buf; - skb = page_to_skb(rq, page, len); - if (unlikely(!skb)) { - dev->stats.rx_dropped++; - give_pages(rq, page); - return; - } - if (vi->mergeable_rx_bufs) - if (receive_mergeable(rq, skb)) { - dev_kfree_skb(skb); + if (vi->mergeable_rx_bufs) { + skb = receive_mergeable(dev, rq, page, len); + if (unlikely(!skb)) + return; + } else { + skb = page_to_skb(rq, page, len); + if (unlikely(!skb)) { + dev->stats.rx_dropped++; + give_pages(rq, page); return; } + } } hdr = skb_vnet_hdr(skb); From 59d2a52eb56dcdaec3c81c456bf408cbab13bde6 Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Thu, 26 Dec 2013 15:32:51 +0200 Subject: [PATCH 31/63] virtio-net: make all RX paths handle errors consistently receive mergeable now handles errors internally. Do same for big and small packet paths, otherwise the logic is too hard to follow. Cc: Jason Wang Cc: David S. Miller Acked-by: Michael Dalton Signed-off-by: Michael S. Tsirkin (cherry picked from commit f121159d72091f25afb22007c833e60a6845e912) Acked-by: Jason Wang Signed-off-by: Greg Kroah-Hartman --- drivers/net/virtio_net.c | 56 ++++++++++++++++++++++++++-------------- 1 file changed, 36 insertions(+), 20 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ad9b385e1f3d..ec0e9f236ff0 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -294,6 +294,34 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq, return skb; } +static struct sk_buff *receive_small(void *buf, unsigned int len) +{ + struct sk_buff * skb = buf; + + len -= sizeof(struct virtio_net_hdr); + skb_trim(skb, len); + + return skb; +} + +static struct sk_buff *receive_big(struct net_device *dev, + struct receive_queue *rq, + void *buf) +{ + struct page *page = buf; + struct sk_buff *skb = page_to_skb(rq, page, 0); + + if (unlikely(!skb)) + goto err; + + return skb; + +err: + dev->stats.rx_dropped++; + give_pages(rq, page); + return NULL; +} + static struct sk_buff *receive_mergeable(struct net_device *dev, struct receive_queue *rq, void *buf, @@ -357,7 +385,6 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) struct net_device *dev = vi->dev; struct virtnet_stats *stats = this_cpu_ptr(vi->stats); struct sk_buff *skb; - struct page *page; struct skb_vnet_hdr *hdr; if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) { @@ -369,26 +396,15 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) dev_kfree_skb(buf); return; } + if (vi->mergeable_rx_bufs) + skb = receive_mergeable(dev, rq, buf, len); + else if (vi->big_packets) + skb = receive_big(dev, rq, buf); + else + skb = receive_small(buf, len); - if (!vi->mergeable_rx_bufs && !vi->big_packets) { - skb = buf; - len -= sizeof(struct virtio_net_hdr); - skb_trim(skb, len); - } else { - page = buf; - if (vi->mergeable_rx_bufs) { - skb = receive_mergeable(dev, rq, page, len); - if (unlikely(!skb)) - return; - } else { - skb = page_to_skb(rq, page, len); - if (unlikely(!skb)) { - dev->stats.rx_dropped++; - give_pages(rq, page); - return; - } - } - } + if (unlikely(!skb)) + return; hdr = skb_vnet_hdr(skb); From fcd8e312578603626f0d78084d7ec2e444489aa4 Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Thu, 26 Dec 2013 15:32:55 +0200 Subject: [PATCH 32/63] virtio_net: don't leak memory or block when too many frags We leak an skb when there are too many frags, we also stop processing the packet in the middle, the result is almost sure to be loss of networking. Reported-by: Michael Dalton Acked-by: Michael Dalton Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/net/virtio_net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ec0e9f236ff0..68692af1f488 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -341,7 +341,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, if (i >= MAX_SKB_FRAGS) { pr_debug("%s: packet too long\n", skb->dev->name); skb->dev->stats.rx_length_errors++; - return NULL; + goto err_frags; } page = virtqueue_get_buf(rq->vq, &len); if (!page) { @@ -362,6 +362,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, err_skb: give_pages(rq, page); while (--num_buf) { +err_frags: buf = virtqueue_get_buf(rq->vq, &len); if (unlikely(!buf)) { pr_debug("%s: rx error: %d buffers missing\n", From 6a0827c4051600d836b16ed9cb6caf78b26573bc Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Mon, 30 Dec 2013 11:34:40 +0800 Subject: [PATCH 33/63] virtio-net: fix refill races during restore [ Upstream commit 6cd4ce0099da7702f885b6fa9ebb49e3831d90b4 ] During restoring, try_fill_recv() was called with neither napi lock nor napi disabled. This can lead two try_fill_recv() was called in the same time. Fix this by refilling before trying to enable napi. Fixes 0741bcb5584f9e2390ae6261573c4de8314999f2 (virtio: net: Add freeze, restore handlers to support S4). Cc: Amit Shah Cc: Rusty Russell Cc: Michael S. Tsirkin Cc: Eric Dumazet Signed-off-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/virtio_net.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 68692af1f488..a0c05e07feeb 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1745,16 +1745,17 @@ static int virtnet_restore(struct virtio_device *vdev) if (err) return err; - if (netif_running(vi->dev)) + if (netif_running(vi->dev)) { + for (i = 0; i < vi->curr_queue_pairs; i++) + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL)) + schedule_delayed_work(&vi->refill, 0); + for (i = 0; i < vi->max_queue_pairs; i++) virtnet_napi_enable(&vi->rq[i]); + } netif_device_attach(vi->dev); - for (i = 0; i < vi->curr_queue_pairs; i++) - if (!try_fill_recv(&vi->rq[i], GFP_KERNEL)) - schedule_delayed_work(&vi->refill, 0); - mutex_lock(&vi->config_lock); vi->config_enable = true; mutex_unlock(&vi->config_lock); From e7c28ed53545ee8972baf31c89a4e801be228e99 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 30 Dec 2013 23:40:50 +0100 Subject: [PATCH 34/63] net: llc: fix use after free in llc_ui_recvmsg [ Upstream commit 4d231b76eef6c4a6bd9c96769e191517765942cb ] While commit 30a584d944fb fixes datagram interface in LLC, a use after free bug has been introduced for SOCK_STREAM sockets that do not make use of MSG_PEEK. The flow is as follow ... if (!(flags & MSG_PEEK)) { ... sk_eat_skb(sk, skb, false); ... } ... if (used + offset < skb->len) continue; ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache original length and work on skb_len to check partial reads. Fixes: 30a584d944fb ("[LLX]: SOCK_DGRAM interface fixes") Signed-off-by: Daniel Borkmann Cc: Stephen Hemminger Cc: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/llc/af_llc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c index 88709882c464..c3ee80547066 100644 --- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -715,7 +715,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock, unsigned long cpu_flags; size_t copied = 0; u32 peek_seq = 0; - u32 *seq; + u32 *seq, skb_len; unsigned long used; int target; /* Read at least this many bytes */ long timeo; @@ -812,6 +812,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock, } continue; found_ok_skb: + skb_len = skb->len; /* Ok so how much can we use? */ used = skb->len - offset; if (len < used) @@ -844,7 +845,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock, } /* Partial read */ - if (used + offset < skb->len) + if (used + offset < skb_len) continue; } while (len > 0); From 50ba56ca77a62c3bc36ba9bde7bffecc63c30477 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Thu, 2 Jan 2014 19:50:52 -0500 Subject: [PATCH 35/63] netpoll: Fix missing TXQ unlock and and OOPS. [ Upstream commit aca5f58f9ba803ec8c2e6bcf890db17589e8dfcc ] The VLAN tag handling code in netpoll_send_skb_on_dev() has two problems. 1) It exits without unlocking the TXQ. 2) It then tries to queue a NULL skb to npinfo->txq. Reported-by: Ahmed Tamrawi Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/netpoll.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/core/netpoll.c b/net/core/netpoll.c index b04f73847eda..27f33f25cda8 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -386,8 +386,14 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb, !vlan_hw_offload_capable(netif_skb_features(skb), skb->vlan_proto)) { skb = __vlan_put_tag(skb, skb->vlan_proto, vlan_tx_tag_get(skb)); - if (unlikely(!skb)) - break; + if (unlikely(!skb)) { + /* This is actually a packet drop, but we + * don't want the code at the end of this + * function to try and re-queue a NULL skb. + */ + status = NETDEV_TX_OK; + goto unlock_txq; + } skb->vlan_tci = 0; } @@ -395,6 +401,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb, if (status == NETDEV_TX_OK) txq_trans_update(txq); } + unlock_txq: __netif_tx_unlock(txq); if (status == NETDEV_TX_OK) From dfe2d46afe3448ca78e251ff56ff9c6dd565076b Mon Sep 17 00:00:00 2001 From: Curt Brune Date: Mon, 6 Jan 2014 11:00:32 -0800 Subject: [PATCH 36/63] bridge: use spin_lock_bh() in br_multicast_set_hash_max [ Upstream commit fe0d692bbc645786bce1a98439e548ae619269f5 ] br_multicast_set_hash_max() is called from process context in net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function. br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock), which can deadlock the CPU if a softirq that also tries to take the same lock interrupts br_multicast_set_hash_max() while the lock is held . This can happen quite easily when any of the bridge multicast timers expire, which try to take the same lock. The fix here is to use spin_lock_bh(), preventing other softirqs from executing on this CPU. Steps to reproduce: 1. Create a bridge with several interfaces (I used 4). 2. Set the "multicast query interval" to a low number, like 2. 3. Enable the bridge as a multicast querier. 4. Repeatedly set the bridge hash_max parameter via sysfs. # brctl addbr br0 # brctl addif br0 eth1 eth2 eth3 eth4 # brctl setmcqi br0 2 # brctl setmcquerier br0 1 # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done Signed-off-by: Curt Brune Signed-off-by: Scott Feldman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/bridge/br_multicast.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index d82058f6fc79..2a180a380181 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1839,7 +1839,7 @@ int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val) u32 old; struct net_bridge_mdb_htable *mdb; - spin_lock(&br->multicast_lock); + spin_lock_bh(&br->multicast_lock); if (!netif_running(br->dev)) goto unlock; @@ -1871,7 +1871,7 @@ int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val) } unlock: - spin_unlock(&br->multicast_lock); + spin_unlock_bh(&br->multicast_lock); return err; } From 1e42fa04afb0dae65292683a5dcad85572ab7553 Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Sun, 19 May 2013 15:46:49 +0000 Subject: [PATCH 37/63] net: Loosen constraints for recalculating checksum in skb_segment() [ Upstream commit 1cdbcb7957cf9e5f841dbcde9b38fd18a804208b ] This is a generic solution to resolve a specific problem that I have observed. If the encapsulation of an skb changes then ability to offload checksums may also change. In particular it may be necessary to perform checksumming in software. An example of such a case is where a non-GRE packet is received but is to be encapsulated and transmitted as GRE. Another example relates to my proposed support for for packets that are non-MPLS when received but MPLS when transmitted. The cost of this change is that the value of the csum variable may be checked when it previously was not. In the case where the csum variable is true this is pure overhead. In the case where the csum variable is false it leads to software checksumming, which I believe also leads to correct checksums in transmitted packets for the cases described above. Further analysis: This patch relies on the return value of can_checksum_protocol() being correct and in turn the return value of skb_network_protocol(), used to provide the protocol parameter of can_checksum_protocol(), being correct. It also relies on the features passed to skb_segment() and in turn to can_checksum_protocol() being correct. I believe that this problem has not been observed for VLANs because it appears that almost all drivers, the exception being xgbe, set vlan_features such that that the checksum offload support for VLAN packets is greater than or equal to that of non-VLAN packets. I wonder if the code in xgbe may be an oversight and the hardware does support checksumming of VLAN packets. If so it may be worth updating the vlan_features of the driver as this patch will force such checksums to be performed in software rather than hardware. Signed-off-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index d9e8736bcdc1..c35b81b80fe2 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2854,7 +2854,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features) doffset + tnl_hlen); if (fskb != skb_shinfo(skb)->frag_list) - continue; + goto perform_csum_check; if (!sg) { nskb->ip_summed = CHECKSUM_NONE; @@ -2918,6 +2918,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features) nskb->len += nskb->data_len; nskb->truesize += nskb->data_len; +perform_csum_check: if (!csum) { nskb->csum = skb_checksum(nskb, doffset, nskb->len - doffset, 0); From 4adf2b4bd303e73dfc13c3428383a8dba129fdb5 Mon Sep 17 00:00:00 2001 From: Russell King Date: Sun, 29 Dec 2013 12:39:50 +0000 Subject: [PATCH 38/63] ARM: fix footbridge clockevent device commit 4ff859fe1dc0da0f87bbdfff78f527898878fa4a upstream. The clockevents code was being told that the footbridge clock event device ticks at 16x the rate which it actually does. This leads to timekeeping problems since it allows the clocksource to wrap before the kernel notices. Fix this by using the correct clock. Fixes: 4e8d76373c9fd ("ARM: footbridge: convert to clockevents/clocksource") Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-footbridge/dc21285-timer.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm/mach-footbridge/dc21285-timer.c b/arch/arm/mach-footbridge/dc21285-timer.c index 9ee78f7b4990..782f6c71fa0a 100644 --- a/arch/arm/mach-footbridge/dc21285-timer.c +++ b/arch/arm/mach-footbridge/dc21285-timer.c @@ -96,11 +96,12 @@ static struct irqaction footbridge_timer_irq = { void __init footbridge_timer_init(void) { struct clock_event_device *ce = &ckevt_dc21285; + unsigned rate = DIV_ROUND_CLOSEST(mem_fclk_21285, 16); - clocksource_register_hz(&cksrc_dc21285, (mem_fclk_21285 + 8) / 16); + clocksource_register_hz(&cksrc_dc21285, rate); setup_irq(ce->irq, &footbridge_timer_irq); ce->cpumask = cpumask_of(smp_processor_id()); - clockevents_config_and_register(ce, mem_fclk_21285, 0x4, 0xffffff); + clockevents_config_and_register(ce, rate, 0x4, 0xffffff); } From 167ce5e7b14e282c645ed8b78971420958cd5665 Mon Sep 17 00:00:00 2001 From: Russell King Date: Fri, 3 Jan 2014 15:01:39 +0000 Subject: [PATCH 39/63] ARM: fix "bad mode in ... handler" message for undefined instructions commit 29c350bf28da333e41e30497b649fe335712a2ab upstream. The array was missing the final entry for the undefined instruction exception handler; this commit adds it. Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman --- arch/arm/kernel/traps.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index 6b9567e19bdc..d6a0fdb6c2ee 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -35,7 +35,13 @@ #include #include -static const char *handler[]= { "prefetch abort", "data abort", "address exception", "interrupt" }; +static const char *handler[]= { + "prefetch abort", + "data abort", + "address exception", + "interrupt", + "undefined instruction", +}; void *vectors_page; From e2a9f5890cf777ed26a1d3f693f79992671662a6 Mon Sep 17 00:00:00 2001 From: Abhilash Kesavan Date: Thu, 12 Dec 2013 08:32:02 +0530 Subject: [PATCH 40/63] ARM: dts: exynos5250: Fix MDMA0 clock number commit 8777539479abd7b3efeb691685415dc2b057d0e0 upstream. Due to incorrect clock specified in MDMA0 node, using MDMA0 controller could cause system failures, due to wrong clock being controlled. This patch fixes this by specifying correct clock. Signed-off-by: Abhilash Kesavan Acked-by: Mike Turquette [t.figa: Corrected commit message and description.] Signed-off-by: Tomasz Figa Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/exynos5250.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/exynos5250.dtsi b/arch/arm/boot/dts/exynos5250.dtsi index fc9fb3d526e2..cdbdc4dfef22 100644 --- a/arch/arm/boot/dts/exynos5250.dtsi +++ b/arch/arm/boot/dts/exynos5250.dtsi @@ -545,7 +545,7 @@ mdma0: mdma@10800000 { compatible = "arm,pl330", "arm,primecell"; reg = <0x10800000 0x1000>; interrupts = <0 33 0>; - clocks = <&clock 271>; + clocks = <&clock 346>; clock-names = "apb_pclk"; #dma-cells = <1>; #dma-channels = <8>; From a713b10e08e6d9214d506a55c39b9546b8a0840b Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Mon, 16 Dec 2013 19:16:08 +0100 Subject: [PATCH 41/63] ARM: shmobile: kzm9g: Fix coherent DMA mask commit 4f387323853c495ac589210832fad4503f75a0e7 upstream. Commit 4dcfa60071b3d23f0181f27d8519f12e37cefbb9 ("ARM: DMA-API: better handing of DMA masks for coherent allocations") added an additional check to the coherent DMA mask that results in an error when the mask is larger than what dma_addr_t can address. Set the LCDC coherent DMA mask to DMA_BIT_MASK(32) instead of ~0 to fix the problem. Signed-off-by: Laurent Pinchart Signed-off-by: Simon Horman Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-shmobile/board-kzm9g.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/mach-shmobile/board-kzm9g.c b/arch/arm/mach-shmobile/board-kzm9g.c index e6b775a10aad..4d610e6ea987 100644 --- a/arch/arm/mach-shmobile/board-kzm9g.c +++ b/arch/arm/mach-shmobile/board-kzm9g.c @@ -332,7 +332,7 @@ static struct platform_device lcdc_device = { .resource = lcdc_resources, .dev = { .platform_data = &lcdc_info, - .coherent_dma_mask = ~0, + .coherent_dma_mask = DMA_BIT_MASK(32), }, }; From 642e26154d7d6dc384149e0a8f1141b1fa1e6f94 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Mon, 16 Dec 2013 19:16:07 +0100 Subject: [PATCH 42/63] ARM: shmobile: armadillo: Fix coherent DMA mask commit dcd740b645003b866d7eb30d13d34d0729cce9db upstream. Commit 4dcfa60071b3d23f0181f27d8519f12e37cefbb9 ("ARM: DMA-API: better handing of DMA masks for coherent allocations") added an additional check to the coherent DMA mask that results in an error when the mask is larger than what dma_addr_t can address. Set the LCDC coherent DMA mask to DMA_BIT_MASK(32) instead of ~0 to fix the problem. Signed-off-by: Laurent Pinchart Signed-off-by: Simon Horman Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-shmobile/board-armadillo800eva.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/mach-shmobile/board-armadillo800eva.c b/arch/arm/mach-shmobile/board-armadillo800eva.c index b85b2882dbd0..803c9fcfb359 100644 --- a/arch/arm/mach-shmobile/board-armadillo800eva.c +++ b/arch/arm/mach-shmobile/board-armadillo800eva.c @@ -437,7 +437,7 @@ static struct platform_device lcdc0_device = { .id = 0, .dev = { .platform_data = &lcdc0_info, - .coherent_dma_mask = ~0, + .coherent_dma_mask = DMA_BIT_MASK(32), }, }; @@ -534,7 +534,7 @@ static struct platform_device hdmi_lcdc_device = { .id = 1, .dev = { .platform_data = &hdmi_lcdc_info, - .coherent_dma_mask = ~0, + .coherent_dma_mask = DMA_BIT_MASK(32), }, }; From b251b2f4a378b977f7d203e6a269b2fe28af3330 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Mon, 16 Dec 2013 19:16:09 +0100 Subject: [PATCH 43/63] ARM: shmobile: mackerel: Fix coherent DMA mask commit b6328a6b7ba57fc84c38248f6f0e387e1170f1a8 upstream. Commit 4dcfa60071b3d23f0181f27d8519f12e37cefbb9 ("ARM: DMA-API: better handing of DMA masks for coherent allocations") added an additional check to the coherent DMA mask that results in an error when the mask is larger than what dma_addr_t can address. Set the LCDC coherent DMA mask to DMA_BIT_MASK(32) instead of ~0 to fix the problem. Signed-off-by: Laurent Pinchart Signed-off-by: Simon Horman Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-shmobile/board-mackerel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/mach-shmobile/board-mackerel.c b/arch/arm/mach-shmobile/board-mackerel.c index fa3407da682a..3b917bc76a49 100644 --- a/arch/arm/mach-shmobile/board-mackerel.c +++ b/arch/arm/mach-shmobile/board-mackerel.c @@ -421,7 +421,7 @@ static struct platform_device lcdc_device = { .resource = lcdc_resources, .dev = { .platform_data = &lcdc_info, - .coherent_dma_mask = ~0, + .coherent_dma_mask = DMA_BIT_MASK(32), }, }; @@ -497,7 +497,7 @@ static struct platform_device hdmi_lcdc_device = { .id = 1, .dev = { .platform_data = &hdmi_lcdc_info, - .coherent_dma_mask = ~0, + .coherent_dma_mask = DMA_BIT_MASK(32), }, }; From e45410185006f748b0d48292550cfad2e37f47a5 Mon Sep 17 00:00:00 2001 From: Ilia Mirkin Date: Sun, 5 Jan 2014 20:07:02 -0500 Subject: [PATCH 44/63] drm/nouveau/bios: make jump conditional MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 6d60792ec059d9f2139828f9f017679abb81aa73 upstream. This fixes a hang in VBIOS scripts of the form "condition; jump". The jump used to always be executed, while now it will only be executed if the condition is true. See https://bugs.freedesktop.org/show_bug.cgi?id=72943 Reported-by: Darcy Brás da Silva Signed-off-by: Ilia Mirkin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/nouveau/core/subdev/bios/init.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/init.c b/drivers/gpu/drm/nouveau/core/subdev/bios/init.c index e2d7f38447cc..3044b07230db 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/bios/init.c +++ b/drivers/gpu/drm/nouveau/core/subdev/bios/init.c @@ -1295,7 +1295,11 @@ init_jump(struct nvbios_init *init) u16 offset = nv_ro16(bios, init->offset + 1); trace("JUMP\t0x%04x\n", offset); - init->offset = offset; + + if (init_exec(init)) + init->offset = offset; + else + init->offset += 3; } /** From d2c62ec3b445d55819b7d96c3d45af97b0678af0 Mon Sep 17 00:00:00 2001 From: John David Anglin Date: Sun, 5 Jan 2014 21:25:00 -0500 Subject: [PATCH 45/63] parisc: Ensure full cache coherency for kmap/kunmap commit f8dae00684d678afa13041ef170cecfd1297ed40 upstream. Helge Deller noted a few weeks ago problems with the AIO support on parisc. This change is the result of numerous iterations on how best to deal with this problem. The solution adopted here is to provide full cache coherency in a uniform manner on all parisc systems. This involves calling flush_dcache_page() on kmap operations and flush_kernel_dcache_page() on kunmap operations. As a result, the copy_user_page() and clear_user_page() functions can be removed and the overall code is simpler. The change ensures that both userspace and kernel aliases to a mapped page are invalidated and flushed. This is necessary for the correct operation of PA8800 and PA8900 based systems which do not support inequivalent aliases. With this change, I have observed no cache related issues on c8000 and rp3440. It is now possible for example to do kernel builds with "-j64" on four way systems. On systems using XFS file systems, the patch recently posted by Mikulas Patocka to "fix crash using XFS on loopback" is needed to avoid a hang caused by an uninitialized lock passed to flush_dcache_page() in the page struct. Signed-off-by: John David Anglin Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/include/asm/cacheflush.h | 12 ++++------ arch/parisc/include/asm/page.h | 5 ++-- arch/parisc/kernel/cache.c | 35 ---------------------------- 3 files changed, 6 insertions(+), 46 deletions(-) diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h index f0e2784e7cca..2f9b751878ba 100644 --- a/arch/parisc/include/asm/cacheflush.h +++ b/arch/parisc/include/asm/cacheflush.h @@ -125,42 +125,38 @@ flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vma void mark_rodata_ro(void); #endif -#ifdef CONFIG_PA8X00 -/* Only pa8800, pa8900 needs this */ - #include #define ARCH_HAS_KMAP -void kunmap_parisc(void *addr); - static inline void *kmap(struct page *page) { might_sleep(); + flush_dcache_page(page); return page_address(page); } static inline void kunmap(struct page *page) { - kunmap_parisc(page_address(page)); + flush_kernel_dcache_page_addr(page_address(page)); } static inline void *kmap_atomic(struct page *page) { pagefault_disable(); + flush_dcache_page(page); return page_address(page); } static inline void __kunmap_atomic(void *addr) { - kunmap_parisc(addr); + flush_kernel_dcache_page_addr(addr); pagefault_enable(); } #define kmap_atomic_prot(page, prot) kmap_atomic(page) #define kmap_atomic_pfn(pfn) kmap_atomic(pfn_to_page(pfn)) #define kmap_atomic_to_page(ptr) virt_to_page(ptr) -#endif #endif /* _PARISC_CACHEFLUSH_H */ diff --git a/arch/parisc/include/asm/page.h b/arch/parisc/include/asm/page.h index b7adb2ac049c..c53fc63149e8 100644 --- a/arch/parisc/include/asm/page.h +++ b/arch/parisc/include/asm/page.h @@ -28,9 +28,8 @@ struct page; void clear_page_asm(void *page); void copy_page_asm(void *to, void *from); -void clear_user_page(void *vto, unsigned long vaddr, struct page *pg); -void copy_user_page(void *vto, void *vfrom, unsigned long vaddr, - struct page *pg); +#define clear_user_page(vto, vaddr, page) clear_page_asm(vto) +#define copy_user_page(vto, vfrom, vaddr, page) copy_page_asm(vto, vfrom) /* #define CONFIG_PARISC_TMPALIAS */ diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c index c035673209f7..a72545554a31 100644 --- a/arch/parisc/kernel/cache.c +++ b/arch/parisc/kernel/cache.c @@ -388,41 +388,6 @@ void flush_kernel_dcache_page_addr(void *addr) } EXPORT_SYMBOL(flush_kernel_dcache_page_addr); -void clear_user_page(void *vto, unsigned long vaddr, struct page *page) -{ - clear_page_asm(vto); - if (!parisc_requires_coherency()) - flush_kernel_dcache_page_asm(vto); -} -EXPORT_SYMBOL(clear_user_page); - -void copy_user_page(void *vto, void *vfrom, unsigned long vaddr, - struct page *pg) -{ - /* Copy using kernel mapping. No coherency is needed - (all in kmap/kunmap) on machines that don't support - non-equivalent aliasing. However, the `from' page - needs to be flushed before it can be accessed through - the kernel mapping. */ - preempt_disable(); - flush_dcache_page_asm(__pa(vfrom), vaddr); - preempt_enable(); - copy_page_asm(vto, vfrom); - if (!parisc_requires_coherency()) - flush_kernel_dcache_page_asm(vto); -} -EXPORT_SYMBOL(copy_user_page); - -#ifdef CONFIG_PA8X00 - -void kunmap_parisc(void *addr) -{ - if (parisc_requires_coherency()) - flush_kernel_dcache_page_addr(addr); -} -EXPORT_SYMBOL(kunmap_parisc); -#endif - void purge_tlb_entries(struct mm_struct *mm, unsigned long addr) { unsigned long flags; From 6e3eb1670fb74dd18a294990c52ff2f3f2e29824 Mon Sep 17 00:00:00 2001 From: Simon Guinot Date: Mon, 23 Dec 2013 13:24:35 +0100 Subject: [PATCH 46/63] ahci: add PCI ID for Marvell 88SE9170 SATA controller commit e098f5cbe9d410e7878b50f524dce36cc83ec40e upstream. This patch adds support for the PCI ID provided by the Marvell 88SE9170 SATA controller. Signed-off-by: Simon Guinot Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman --- drivers/ata/ahci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 3d67f76407c1..3f1794f4a8bf 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -427,6 +427,9 @@ static const struct pci_device_id ahci_pci_tbl[] = { .driver_data = board_ahci_yes_fbs }, /* 88se9128 */ { PCI_DEVICE(PCI_VENDOR_ID_MARVELL_EXT, 0x9125), .driver_data = board_ahci_yes_fbs }, /* 88se9125 */ + { PCI_DEVICE_SUB(PCI_VENDOR_ID_MARVELL_EXT, 0x9178, + PCI_VENDOR_ID_MARVELL_EXT, 0x9170), + .driver_data = board_ahci_yes_fbs }, /* 88se9170 */ { PCI_DEVICE(PCI_VENDOR_ID_MARVELL_EXT, 0x917a), .driver_data = board_ahci_yes_fbs }, /* 88se9172 */ { PCI_DEVICE(PCI_VENDOR_ID_MARVELL_EXT, 0x9172), From f783d8b1299501f0aa0586dbd4fa0460f5f33be0 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Mon, 16 Dec 2013 10:41:38 +0000 Subject: [PATCH 47/63] clk: clk-divider: fix divisor > 255 bug commit 778037e1ccb75609846deca9e419449c1dc137fa upstream. Commit 6d9252bd9a4bb (clk: Add support for power of two type dividers) merged in v3.6 added the _get_val function to convert a divisor value to a register field value depending on the flags. However it used the type u8 for the div field, causing divisors larger than 255 to be masked and the resultant clock rate to be too high. E.g. in my case an 11bit divider was supposed to divide 24.576 MHz down to 32.768KHz. The divisor was correctly calculated as 750 (0x2ee). This was masked to 238 (0xee) resulting in a frequency of 103.26KHz. Signed-off-by: James Hogan Cc: Rajendra Nayak Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Mike Turquette Signed-off-by: Greg Kroah-Hartman --- drivers/clk/clk-divider.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c index 6d9674160430..2ce22447d76e 100644 --- a/drivers/clk/clk-divider.c +++ b/drivers/clk/clk-divider.c @@ -87,7 +87,7 @@ static unsigned int _get_table_val(const struct clk_div_table *table, return 0; } -static unsigned int _get_val(struct clk_divider *divider, u8 div) +static unsigned int _get_val(struct clk_divider *divider, unsigned int div) { if (divider->flags & CLK_DIVIDER_ONE_BASED) return div; From 3541ef9f6f33b137586b32d37b2a068e0160a82c Mon Sep 17 00:00:00 2001 From: Seung-Woo Kim Date: Fri, 22 Nov 2013 14:21:08 +0900 Subject: [PATCH 48/63] clk: samsung: exynos4: Correct SRC_MFC register commit 5fdd1b56be51b1ec4dbde5b213d649ac717442da upstream. The SRC_MFC register offset was incorrect, which could cause have caused wrong calculation of rate of sclk_mfc clock, that could in turn lead to incorrect operation of MFC. This patch corrects it. Signed-off-by: Seung-Woo Kim Acked-by: Mike Turquette [t.figa: Updated patch description] Signed-off-by: Tomasz Figa Signed-off-by: Greg Kroah-Hartman --- drivers/clk/samsung/clk-exynos4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/samsung/clk-exynos4.c b/drivers/clk/samsung/clk-exynos4.c index 3c1f88868f29..b4b283b47308 100644 --- a/drivers/clk/samsung/clk-exynos4.c +++ b/drivers/clk/samsung/clk-exynos4.c @@ -40,7 +40,7 @@ #define SRC_TOP1 0xc214 #define SRC_CAM 0xc220 #define SRC_TV 0xc224 -#define SRC_MFC 0xcc28 +#define SRC_MFC 0xc228 #define SRC_G3D 0xc22c #define E4210_SRC_IMAGE 0xc230 #define SRC_LCD0 0xc234 From e964763f1bed41305564d0656295639fcd7f7ed7 Mon Sep 17 00:00:00 2001 From: Abhilash Kesavan Date: Wed, 11 Dec 2013 17:27:05 +0530 Subject: [PATCH 49/63] clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock commit 2feed5aecf5f367b92bd6b6e92afe9e3de466907 upstream. The sysreg (system register) generates control signals for various blocks like disp1blk, i2c, mipi, usb etc. However, it gets disabled as an unused clock at boot-up. This can lead to failures in operation of above blocks, because they can not be configured properly if this clock is disabled. Signed-off-by: Abhilash Kesavan Acked-by: Mike Turquette [t.figa: Updated patch description.] Signed-off-by: Tomasz Figa Signed-off-by: Greg Kroah-Hartman --- drivers/clk/samsung/clk-exynos5250.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/clk/samsung/clk-exynos5250.c b/drivers/clk/samsung/clk-exynos5250.c index 22d7699e7ced..4e6ab159cd08 100644 --- a/drivers/clk/samsung/clk-exynos5250.c +++ b/drivers/clk/samsung/clk-exynos5250.c @@ -377,7 +377,8 @@ struct samsung_gate_clock exynos5250_gate_clks[] __initdata = { GATE(hsi2c2, "hsi2c2", "aclk66", GATE_IP_PERIC, 30, 0, 0), GATE(hsi2c3, "hsi2c3", "aclk66", GATE_IP_PERIC, 31, 0, 0), GATE(chipid, "chipid", "aclk66", GATE_IP_PERIS, 0, 0, 0), - GATE(sysreg, "sysreg", "aclk66", GATE_IP_PERIS, 1, 0, 0), + GATE(sysreg, "sysreg", "aclk66", + GATE_IP_PERIS, 1, CLK_IGNORE_UNUSED, 0), GATE(pmu, "pmu", "aclk66", GATE_IP_PERIS, 2, CLK_IGNORE_UNUSED, 0), GATE(tzpc0, "tzpc0", "aclk66", GATE_IP_PERIS, 6, 0, 0), GATE(tzpc1, "tzpc1", "aclk66", GATE_IP_PERIS, 7, 0, 0), From 94f8884551dac1e358bc4a7a90d9442998261a3e Mon Sep 17 00:00:00 2001 From: Andrew Bresticker Date: Fri, 8 Nov 2013 15:44:07 +0530 Subject: [PATCH 50/63] clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks commit 97c3557c3e0413efb1f021f582d1459760e22727 upstream. The gate clocks for the MFC sysmmus appear to be flipped, i.e. GATE_IP_MFC[2] gates sysmmu_mfcl and GATE_IP_MFC[1] gates sysmmu_mfcr. Fix this so that the MFC will start up. Signed-off-by: Andrew Bresticker Signed-off-by: Sachin Kamat Acked-by: Mike Turquette Signed-off-by: Tomasz Figa Signed-off-by: Greg Kroah-Hartman --- drivers/clk/samsung/clk-exynos5250.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/clk/samsung/clk-exynos5250.c b/drivers/clk/samsung/clk-exynos5250.c index 4e6ab159cd08..92afc060673e 100644 --- a/drivers/clk/samsung/clk-exynos5250.c +++ b/drivers/clk/samsung/clk-exynos5250.c @@ -325,8 +325,8 @@ struct samsung_gate_clock exynos5250_gate_clks[] __initdata = { GATE(smmu_gscl2, "smmu_gscl2", "aclk266", GATE_IP_GSCL, 9, 0, 0), GATE(smmu_gscl3, "smmu_gscl3", "aclk266", GATE_IP_GSCL, 10, 0, 0), GATE(mfc, "mfc", "aclk333", GATE_IP_MFC, 0, 0, 0), - GATE(smmu_mfcl, "smmu_mfcl", "aclk333", GATE_IP_MFC, 1, 0, 0), - GATE(smmu_mfcr, "smmu_mfcr", "aclk333", GATE_IP_MFC, 2, 0, 0), + GATE(smmu_mfcl, "smmu_mfcl", "aclk333", GATE_IP_MFC, 2, 0, 0), + GATE(smmu_mfcr, "smmu_mfcr", "aclk333", GATE_IP_MFC, 1, 0, 0), GATE(rotator, "rotator", "aclk266", GATE_IP_GEN, 1, 0, 0), GATE(jpeg, "jpeg", "aclk166", GATE_IP_GEN, 2, 0, 0), GATE(mdma1, "mdma1", "aclk266", GATE_IP_GEN, 4, 0, 0), From 033795785f03230ca660a37a428db3ab096dd659 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 2 Dec 2013 12:20:36 +0100 Subject: [PATCH 51/63] mfd: rtsx_pcr: Disable interrupts before cancelling delayed works commit 73beb63d290f961c299526852884846b0d868840 upstream. This fixes a kernel panic when resuming from suspend to RAM. Without this fix an interrupt hits after the delayed work is canceled and thus requeues it. So we end up freeing an armed timer. Signed-off-by: Thomas Gleixner Signed-off-by: Samuel Ortiz Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/rtsx_pcr.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/mfd/rtsx_pcr.c b/drivers/mfd/rtsx_pcr.c index e968c01ca2ac..45f26be359ea 100644 --- a/drivers/mfd/rtsx_pcr.c +++ b/drivers/mfd/rtsx_pcr.c @@ -1195,8 +1195,14 @@ static void rtsx_pci_remove(struct pci_dev *pcidev) pcr->remove_pci = true; - cancel_delayed_work(&pcr->carddet_work); - cancel_delayed_work(&pcr->idle_work); + /* Disable interrupts at the pcr level */ + spin_lock_irq(&pcr->lock); + rtsx_pci_writel(pcr, RTSX_BIER, 0); + pcr->bier = 0; + spin_unlock_irq(&pcr->lock); + + cancel_delayed_work_sync(&pcr->carddet_work); + cancel_delayed_work_sync(&pcr->idle_work); mfd_remove_devices(&pcidev->dev); From a771213d2f2e5f809a3234060ffa58c2b5d7a1b8 Mon Sep 17 00:00:00 2001 From: Jiang Liu Date: Thu, 19 Dec 2013 20:38:15 +0800 Subject: [PATCH 52/63] ACPI / TPM: fix memory leak when walking ACPI namespace commit df45c712d1f4ef37714245fb75de726f4ca2bf8d upstream. In function ppi_callback(), memory allocated by acpi_get_name() will get leaked when current device isn't the desired TPM device, so fix the memory leak. Signed-off-by: Jiang Liu Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/char/tpm/tpm_ppi.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/char/tpm/tpm_ppi.c b/drivers/char/tpm/tpm_ppi.c index 2168d15bc728..57a818b2b5f2 100644 --- a/drivers/char/tpm/tpm_ppi.c +++ b/drivers/char/tpm/tpm_ppi.c @@ -27,15 +27,18 @@ static char *tpm_device_name = "TPM"; static acpi_status ppi_callback(acpi_handle handle, u32 level, void *context, void **return_value) { - acpi_status status; + acpi_status status = AE_OK; struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL }; - status = acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer); - if (strstr(buffer.pointer, context) != NULL) { - *return_value = handle; + + if (ACPI_SUCCESS(acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer))) { + if (strstr(buffer.pointer, context) != NULL) { + *return_value = handle; + status = AE_CTRL_TERMINATE; + } kfree(buffer.pointer); - return AE_CTRL_TERMINATE; } - return AE_OK; + + return status; } static inline void ppi_assign_params(union acpi_object params[4], From 290a699346ae6cc3e4ccfa7bdc1ee4445f6e020f Mon Sep 17 00:00:00 2001 From: Lan Tianyu Date: Mon, 6 Jan 2014 22:50:37 +0800 Subject: [PATCH 53/63] ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS commit a90b40385735af0d3031f98e97b439e8944a31b3 upstream. The AML method _BIX of NEC LZ750/LS returns a broken package which skips the first member "Revision" (ACPI 5.0, Table 10-234). Add a quirk for this machine to skip member "Revision" during parsing the package returned by _BIX. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=67351 Reported-and-tested-by: Francisco Castro Signed-off-by: Lan Tianyu Reviewed-by: Dmitry Torokhov Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/battery.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c index 95332717e4f5..99427d7307af 100644 --- a/drivers/acpi/battery.c +++ b/drivers/acpi/battery.c @@ -68,6 +68,7 @@ MODULE_AUTHOR("Alexey Starikovskiy "); MODULE_DESCRIPTION("ACPI Battery Driver"); MODULE_LICENSE("GPL"); +static int battery_bix_broken_package; static unsigned int cache_time = 1000; module_param(cache_time, uint, 0644); MODULE_PARM_DESC(cache_time, "cache time in milliseconds"); @@ -443,7 +444,12 @@ static int acpi_battery_get_info(struct acpi_battery *battery) ACPI_EXCEPTION((AE_INFO, status, "Evaluating %s", name)); return -ENODEV; } - if (test_bit(ACPI_BATTERY_XINFO_PRESENT, &battery->flags)) + + if (battery_bix_broken_package) + result = extract_package(battery, buffer.pointer, + extended_info_offsets + 1, + ARRAY_SIZE(extended_info_offsets) - 1); + else if (test_bit(ACPI_BATTERY_XINFO_PRESENT, &battery->flags)) result = extract_package(battery, buffer.pointer, extended_info_offsets, ARRAY_SIZE(extended_info_offsets)); @@ -1064,6 +1070,17 @@ static int battery_notify(struct notifier_block *nb, return 0; } +static struct dmi_system_id bat_dmi_table[] = { + { + .ident = "NEC LZ750/LS", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "NEC"), + DMI_MATCH(DMI_PRODUCT_NAME, "PC-LZ750LS"), + }, + }, + {}, +}; + static int acpi_battery_add(struct acpi_device *device) { int result = 0; @@ -1174,6 +1191,8 @@ static void __init acpi_battery_init_async(void *unused, async_cookie_t cookie) if (!acpi_battery_dir) return; #endif + if (dmi_check_system(bat_dmi_table)) + battery_bix_broken_package = 1; if (acpi_bus_register_driver(&acpi_battery_driver) < 0) { #ifdef CONFIG_ACPI_PROCFS_POWER acpi_unlock_battery_dir(acpi_battery_dir); From 9b788c26a953d491617b05d1f4c63895524e7dab Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 16 Dec 2013 21:39:50 +0100 Subject: [PATCH 54/63] mac80211: move "bufferable MMPDU" check to fix AP mode scan commit 277d916fc2e959c3f106904116bb4f7b1148d47a upstream. The check needs to apply to both multicast and unicast packets, otherwise probe requests on AP mode scans are sent through the multicast buffer queue, which adds long delays (often longer than the scanning interval). Signed-off-by: Felix Fietkau Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman --- net/mac80211/tx.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index e9d18c30071f..cc65cdddb047 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -447,7 +447,6 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx) { struct sta_info *sta = tx->sta; struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb); - struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data; struct ieee80211_local *local = tx->local; if (unlikely(!sta)) @@ -458,15 +457,6 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx) !(info->flags & IEEE80211_TX_CTL_NO_PS_BUFFER))) { int ac = skb_get_queue_mapping(tx->skb); - /* only deauth, disassoc and action are bufferable MMPDUs */ - if (ieee80211_is_mgmt(hdr->frame_control) && - !ieee80211_is_deauth(hdr->frame_control) && - !ieee80211_is_disassoc(hdr->frame_control) && - !ieee80211_is_action(hdr->frame_control)) { - info->flags |= IEEE80211_TX_CTL_NO_PS_BUFFER; - return TX_CONTINUE; - } - ps_dbg(sta->sdata, "STA %pM aid %d: PS buffer for AC %d\n", sta->sta.addr, sta->sta.aid, ac); if (tx->local->total_ps_buffered >= TOTAL_MAX_TX_BUFFER) @@ -509,9 +499,22 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx) static ieee80211_tx_result debug_noinline ieee80211_tx_h_ps_buf(struct ieee80211_tx_data *tx) { + struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb); + struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data; + if (unlikely(tx->flags & IEEE80211_TX_PS_BUFFERED)) return TX_CONTINUE; + /* only deauth, disassoc and action are bufferable MMPDUs */ + if (ieee80211_is_mgmt(hdr->frame_control) && + !ieee80211_is_deauth(hdr->frame_control) && + !ieee80211_is_disassoc(hdr->frame_control) && + !ieee80211_is_action(hdr->frame_control)) { + if (tx->flags & IEEE80211_TX_UNICAST) + info->flags |= IEEE80211_TX_CTL_NO_PS_BUFFER; + return TX_CONTINUE; + } + if (tx->flags & IEEE80211_TX_UNICAST) return ieee80211_tx_h_unicast_ps_buf(tx); else From d0ccf8a11507ae170d8db47babad5331a4cdb33f Mon Sep 17 00:00:00 2001 From: Dirk Brandewie Date: Mon, 6 Jan 2014 10:59:16 -0800 Subject: [PATCH 55/63] intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters. commit 6cbd7ee10e2842a3d1f9b60abede1c8f3d1f1130 upstream. KVM environments do not support APERF/MPERF MSRs. intel_pstate cannot operate without these registers. The previous validity checks in intel_pstate_msrs_not_valid() are insufficent in nested KVMs. References: https://bugzilla.redhat.com/show_bug.cgi?id=1046317 Signed-off-by: Dirk Brandewie Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/cpufreq/intel_pstate.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 7054c579d451..a22fb3e47256 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -516,7 +516,8 @@ static void intel_pstate_timer_func(unsigned long __data) } #define ICPU(model, policy) \ - { X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&policy } + { X86_VENDOR_INTEL, 6, model, X86_FEATURE_APERFMPERF,\ + (unsigned long)&policy } static const struct x86_cpu_id intel_pstate_cpu_ids[] = { ICPU(0x2a, default_policy), From af313b03198d1bbb13e83793416b229d6b1c810d Mon Sep 17 00:00:00 2001 From: Bernd Schubert Date: Mon, 23 Sep 2013 14:47:32 +0200 Subject: [PATCH 56/63] SCSI: sd: Reduce buffer size for vpd request commit af73623f5f10eb3832c87a169b28f7df040a875b upstream. Somehow older areca firmware versions have issues with scsi_get_vpd_page() and a large buffer, the firmware seems to crash and the scsi error-handler will start endless recovery retries. Limiting the buf-size to 64-bytes fixes this issue with older firmware versions (<1.49 for my controller). Fixes a regression with areca controllers and older firmware versions introduced by commit: 66c28f97120e8a621afd5aa7a31c4b85c547d33d Reported-by: Nix Tested-by: Nix Signed-off-by: Bernd Schubert Acked-by: Martin K. Petersen Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sd.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 9bc913b92d13..26b543bc4f53 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2634,13 +2634,16 @@ static void sd_read_write_same(struct scsi_disk *sdkp, unsigned char *buffer) } if (scsi_report_opcode(sdev, buffer, SD_BUF_SIZE, INQUIRY) < 0) { + /* too large values might cause issues with arcmsr */ + int vpd_buf_len = 64; + sdev->no_report_opcodes = 1; /* Disable WRITE SAME if REPORT SUPPORTED OPERATION * CODES is unsupported and the device has an ATA * Information VPD page (SAT). */ - if (!scsi_get_vpd_page(sdev, 0x89, buffer, SD_BUF_SIZE)) + if (!scsi_get_vpd_page(sdev, 0x89, buffer, vpd_buf_len)) sdev->no_write_same = 1; } From 7b2252e993e29974eb0d017156db989173ec31aa Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 31 Dec 2013 16:28:39 +0100 Subject: [PATCH 57/63] netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper commit 2690d97ade05c5325cbf7c72b94b90d265659886 upstream. Commit 5901b6be885e attempted to introduce IPv6 support into IRC NAT helper. By doing so, the following code seemed to be removed by accident: ip = ntohl(exp->master->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip); sprintf(buffer, "%u %u", ip, port); pr_debug("nf_nat_irc: inserting '%s' == %pI4, port %u\n", buffer, &ip, port); This leads to the fact that buffer[] was left uninitialized and contained some stack value. When we call nf_nat_mangle_tcp_packet(), we call strlen(buffer) on excatly this uninitialized buffer. If we are unlucky and the skb has enough tailroom, we overwrite resp. leak contents with values that sit on our stack into the packet and send that out to the receiver. Since the rather informal DCC spec [1] does not seem to specify IPv6 support right now, we log such occurences so that admins can act accordingly, and drop the packet. I've looked into XChat source, and IPv6 is not supported there: addresses are in u32 and print via %u format string. Therefore, restore old behaviour as in IPv4, use snprintf(). The IRC helper does not support IPv6 by now. By this, we can safely use strlen(buffer) in nf_nat_mangle_tcp_packet() and prevent a buffer overflow. Also simplify some code as we now have ct variable anyway. [1] http://www.irchelp.org/irchelp/rfc/ctcpspec.html Fixes: 5901b6be885e ("netfilter: nf_nat: support IPv6 in IRC NAT helper") Signed-off-by: Daniel Borkmann Cc: Harald Welte Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nf_nat_irc.c | 32 +++++++++++++++++++++++++++----- 1 file changed, 27 insertions(+), 5 deletions(-) diff --git a/net/netfilter/nf_nat_irc.c b/net/netfilter/nf_nat_irc.c index f02b3605823e..1fb2258c3535 100644 --- a/net/netfilter/nf_nat_irc.c +++ b/net/netfilter/nf_nat_irc.c @@ -34,10 +34,14 @@ static unsigned int help(struct sk_buff *skb, struct nf_conntrack_expect *exp) { char buffer[sizeof("4294967296 65635")]; + struct nf_conn *ct = exp->master; + union nf_inet_addr newaddr; u_int16_t port; unsigned int ret; /* Reply comes from server. */ + newaddr = ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3; + exp->saved_proto.tcp.port = exp->tuple.dst.u.tcp.port; exp->dir = IP_CT_DIR_REPLY; exp->expectfn = nf_nat_follow_master; @@ -57,17 +61,35 @@ static unsigned int help(struct sk_buff *skb, } if (port == 0) { - nf_ct_helper_log(skb, exp->master, "all ports in use"); + nf_ct_helper_log(skb, ct, "all ports in use"); return NF_DROP; } - ret = nf_nat_mangle_tcp_packet(skb, exp->master, ctinfo, - protoff, matchoff, matchlen, buffer, - strlen(buffer)); + /* strlen("\1DCC CHAT chat AAAAAAAA P\1\n")=27 + * strlen("\1DCC SCHAT chat AAAAAAAA P\1\n")=28 + * strlen("\1DCC SEND F AAAAAAAA P S\1\n")=26 + * strlen("\1DCC MOVE F AAAAAAAA P S\1\n")=26 + * strlen("\1DCC TSEND F AAAAAAAA P S\1\n")=27 + * + * AAAAAAAAA: bound addr (1.0.0.0==16777216, min 8 digits, + * 255.255.255.255==4294967296, 10 digits) + * P: bound port (min 1 d, max 5d (65635)) + * F: filename (min 1 d ) + * S: size (min 1 d ) + * 0x01, \n: terminators + */ + /* AAA = "us", ie. where server normally talks to. */ + snprintf(buffer, sizeof(buffer), "%u %u", ntohl(newaddr.ip), port); + pr_debug("nf_nat_irc: inserting '%s' == %pI4, port %u\n", + buffer, &newaddr.ip, port); + + ret = nf_nat_mangle_tcp_packet(skb, ct, ctinfo, protoff, matchoff, + matchlen, buffer, strlen(buffer)); if (ret != NF_ACCEPT) { - nf_ct_helper_log(skb, exp->master, "cannot mangle packet"); + nf_ct_helper_log(skb, ct, "cannot mangle packet"); nf_ct_unexpect_related(exp); } + return ret; } From e23fe36a8cf5faa57d0c45868a3f7679c4f07cb0 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 11 Jan 2014 19:15:52 -0800 Subject: [PATCH 58/63] x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround commit 26bef1318adc1b3a530ecc807ef99346db2aa8b0 upstream. Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog Tested-by: Borislav Petkov Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/fpu-internal.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h index e25cc33ec54d..e72b2e41499e 100644 --- a/arch/x86/include/asm/fpu-internal.h +++ b/arch/x86/include/asm/fpu-internal.h @@ -295,12 +295,13 @@ static inline int restore_fpu_checking(struct task_struct *tsk) /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is pending. Clear the x87 state here by setting it to fixed values. "m" is a random variable that should be in L1 */ - alternative_input( - ASM_NOP8 ASM_NOP2, - "emms\n\t" /* clear stack tags */ - "fildl %P[addr]", /* set F?P to defined value */ - X86_FEATURE_FXSAVE_LEAK, - [addr] "m" (tsk->thread.fpu.has_fpu)); + if (unlikely(static_cpu_has(X86_FEATURE_FXSAVE_LEAK))) { + asm volatile( + "fnclex\n\t" + "emms\n\t" + "fildl %P[addr]" /* set F?P to defined value */ + : : [addr] "m" (tsk->thread.fpu.has_fpu)); + } return fpu_restore_checking(&tsk->thread.fpu); } From 9d80092f8d9e0fc4aa2b6a7c8d2e4a7437899ca5 Mon Sep 17 00:00:00 2001 From: Ben Segall Date: Wed, 16 Oct 2013 11:16:12 -0700 Subject: [PATCH 59/63] sched: Fix race on toggling cfs_bandwidth_used commit 1ee14e6c8cddeeb8a490d7b54cd9016e4bb900b4 upstream. When we transition cfs_bandwidth_used to false, any currently throttled groups will incorrectly return false from cfs_rq_throttled. While tg_set_cfs_bandwidth will unthrottle them eventually, currently running code (including at least dequeue_task_fair and distribute_cfs_runtime) will cause errors. Fix this by turning off cfs_bandwidth_used only after unthrottling all cfs_rqs. Tested: toggle bandwidth back and forth on a loaded cgroup. Caused crashes in minutes without the patch, hasn't crashed with it. Signed-off-by: Ben Segall Signed-off-by: Peter Zijlstra Cc: pjt@google.com Link: http://lkml.kernel.org/r/20131016181611.22647.80365.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar Cc: Chris J Arges Signed-off-by: Greg Kroah-Hartman --- kernel/sched/core.c | 9 ++++++++- kernel/sched/fair.c | 16 +++++++++------- kernel/sched/sched.h | 3 ++- 3 files changed, 19 insertions(+), 9 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f9e35b1e7713..b4308d7da339 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7812,7 +7812,12 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) runtime_enabled = quota != RUNTIME_INF; runtime_was_enabled = cfs_b->quota != RUNTIME_INF; - account_cfs_bandwidth_used(runtime_enabled, runtime_was_enabled); + /* + * If we need to toggle cfs_bandwidth_used, off->on must occur + * before making related changes, and on->off must occur afterwards + */ + if (runtime_enabled && !runtime_was_enabled) + cfs_bandwidth_usage_inc(); raw_spin_lock_irq(&cfs_b->lock); cfs_b->period = ns_to_ktime(period); cfs_b->quota = quota; @@ -7838,6 +7843,8 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) unthrottle_cfs_rq(cfs_rq); raw_spin_unlock_irq(&rq->lock); } + if (runtime_was_enabled && !runtime_enabled) + cfs_bandwidth_usage_dec(); out_unlock: mutex_unlock(&cfs_constraints_mutex); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ce60006132b1..54e85898c390 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2029,13 +2029,14 @@ static inline bool cfs_bandwidth_used(void) return static_key_false(&__cfs_bandwidth_used); } -void account_cfs_bandwidth_used(int enabled, int was_enabled) +void cfs_bandwidth_usage_inc(void) { - /* only need to count groups transitioning between enabled/!enabled */ - if (enabled && !was_enabled) - static_key_slow_inc(&__cfs_bandwidth_used); - else if (!enabled && was_enabled) - static_key_slow_dec(&__cfs_bandwidth_used); + static_key_slow_inc(&__cfs_bandwidth_used); +} + +void cfs_bandwidth_usage_dec(void) +{ + static_key_slow_dec(&__cfs_bandwidth_used); } #else /* HAVE_JUMP_LABEL */ static bool cfs_bandwidth_used(void) @@ -2043,7 +2044,8 @@ static bool cfs_bandwidth_used(void) return true; } -void account_cfs_bandwidth_used(int enabled, int was_enabled) {} +void cfs_bandwidth_usage_inc(void) {} +void cfs_bandwidth_usage_dec(void) {} #endif /* HAVE_JUMP_LABEL */ /* diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index ce39224d6155..dfa31d533e3f 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1318,7 +1318,8 @@ extern void print_rt_stats(struct seq_file *m, int cpu); extern void init_cfs_rq(struct cfs_rq *cfs_rq); extern void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq); -extern void account_cfs_bandwidth_used(int enabled, int was_enabled); +extern void cfs_bandwidth_usage_inc(void); +extern void cfs_bandwidth_usage_dec(void); #ifdef CONFIG_NO_HZ_COMMON enum rq_nohz_flag_bits { From 373e0a593bd15df79e47158bd4628eb133d4da7d Mon Sep 17 00:00:00 2001 From: Ben Segall Date: Wed, 16 Oct 2013 11:16:17 -0700 Subject: [PATCH 60/63] sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining commit db06e78cc13d70f10877e0557becc88ab3ad2be8 upstream. hrtimer_expires_remaining does not take internal hrtimer locks and thus must be guarded against concurrent __hrtimer_start_range_ns (but returning HRTIMER_RESTART is safe). Use cfs_b->lock to make it safe. Signed-off-by: Ben Segall Signed-off-by: Peter Zijlstra Cc: pjt@google.com Link: http://lkml.kernel.org/r/20131016181617.22647.73829.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar Cc: Chris J Arges Signed-off-by: Greg Kroah-Hartman --- kernel/sched/fair.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 54e85898c390..d0a7b93ce05e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2470,7 +2470,13 @@ static const u64 min_bandwidth_expiration = 2 * NSEC_PER_MSEC; /* how long we wait to gather additional slack before distributing */ static const u64 cfs_bandwidth_slack_period = 5 * NSEC_PER_MSEC; -/* are we near the end of the current quota period? */ +/* + * Are we near the end of the current quota period? + * + * Requires cfs_b->lock for hrtimer_expires_remaining to be safe against the + * hrtimer base being cleared by __hrtimer_start_range_ns. In the case of + * migrate_hrtimers, base is never cleared, so we are fine. + */ static int runtime_refresh_within(struct cfs_bandwidth *cfs_b, u64 min_expire) { struct hrtimer *refresh_timer = &cfs_b->period_timer; @@ -2546,10 +2552,12 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b) u64 expires; /* confirm we're still not at a refresh boundary */ - if (runtime_refresh_within(cfs_b, min_bandwidth_expiration)) - return; - raw_spin_lock(&cfs_b->lock); + if (runtime_refresh_within(cfs_b, min_bandwidth_expiration)) { + raw_spin_unlock(&cfs_b->lock); + return; + } + if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice) { runtime = cfs_b->runtime; cfs_b->runtime = 0; From 9ca715c462018a8631240088dafa567bec6fe721 Mon Sep 17 00:00:00 2001 From: Ben Segall Date: Wed, 16 Oct 2013 11:16:22 -0700 Subject: [PATCH 61/63] sched: Fix hrtimer_cancel()/rq->lock deadlock commit 927b54fccbf04207ec92f669dce6806848cbec7d upstream. __start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock, waiting for the hrtimer to finish. However, if sched_cfs_period_timer runs for another loop iteration, the hrtimer can attempt to take rq->lock, resulting in deadlock. Fix this by ensuring that cfs_b->timer_active is cleared only if the _latest_ call to do_sched_cfs_period_timer is returning as idle. Then __start_cfs_bandwidth can just call hrtimer_try_to_cancel and wait for that to succeed or timer_active == 1. Signed-off-by: Ben Segall Signed-off-by: Peter Zijlstra Cc: pjt@google.com Link: http://lkml.kernel.org/r/20131016181622.22647.16643.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar Cc: Chris J Arges Signed-off-by: Greg Kroah-Hartman --- kernel/sched/fair.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d0a7b93ce05e..516bc354212d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2410,6 +2410,13 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun) if (idle) goto out_unlock; + /* + * if we have relooped after returning idle once, we need to update our + * status as actually running, so that other cpus doing + * __start_cfs_bandwidth will stop trying to cancel us. + */ + cfs_b->timer_active = 1; + __refill_cfs_bandwidth_runtime(cfs_b); if (!throttled) { @@ -2682,11 +2689,11 @@ void __start_cfs_bandwidth(struct cfs_bandwidth *cfs_b) * (timer_active==0 becomes visible before the hrtimer call-back * terminates). In either case we ensure that it's re-programmed */ - while (unlikely(hrtimer_active(&cfs_b->period_timer))) { + while (unlikely(hrtimer_active(&cfs_b->period_timer)) && + hrtimer_try_to_cancel(&cfs_b->period_timer) < 0) { + /* bounce the lock to allow do_sched_cfs_period_timer to run */ raw_spin_unlock(&cfs_b->lock); - /* ensure cfs_b->lock is available while we wait */ - hrtimer_cancel(&cfs_b->period_timer); - + cpu_relax(); raw_spin_lock(&cfs_b->lock); /* if someone else restarted the timer then we're done */ if (cfs_b->timer_active) From 5ba4542368ccbbb717426505c0fb801233f9110a Mon Sep 17 00:00:00 2001 From: Paul Turner Date: Wed, 16 Oct 2013 11:16:27 -0700 Subject: [PATCH 62/63] sched: Guarantee new group-entities always have weight commit 0ac9b1c21874d2490331233b3242085f8151e166 upstream. Currently, group entity load-weights are initialized to zero. This admits some races with respect to the first time they are re-weighted in earlty use. ( Let g[x] denote the se for "g" on cpu "x". ) Suppose that we have root->a and that a enters a throttled state, immediately followed by a[0]->t1 (the only task running on cpu[0]) blocking: put_prev_task(group_cfs_rq(a[0]), t1) put_prev_entity(..., t1) check_cfs_rq_runtime(group_cfs_rq(a[0])) throttle_cfs_rq(group_cfs_rq(a[0])) Then, before unthrottling occurs, let a[0]->b[0]->t2 wake for the first time: enqueue_task_fair(rq[0], t2) enqueue_entity(group_cfs_rq(b[0]), t2) enqueue_entity_load_avg(group_cfs_rq(b[0]), t2) account_entity_enqueue(group_cfs_ra(b[0]), t2) update_cfs_shares(group_cfs_rq(b[0])) < skipped because b is part of a throttled hierarchy > enqueue_entity(group_cfs_rq(a[0]), b[0]) ... We now have b[0] enqueued, yet group_cfs_rq(a[0])->load.weight == 0 which violates invariants in several code-paths. Eliminate the possibility of this by initializing group entity weight. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar Cc: Chris J Arges Signed-off-by: Greg Kroah-Hartman --- kernel/sched/fair.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 516bc354212d..305ef886219e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6091,7 +6091,8 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq, se->cfs_rq = parent->my_q; se->my_q = cfs_rq; - update_load_set(&se->load, 0); + /* guarantee group entities always have weight */ + update_load_set(&se->load, NICE_0_LOAD); se->parent = parent; } From 1071ea6e68ead40df739b223e9013d99c23c19ab Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 15 Jan 2014 15:29:14 -0800 Subject: [PATCH 63/63] Linux 3.10.27 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ac07707a2f9e..09675a57059c 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 3 PATCHLEVEL = 10 -SUBLEVEL = 26 +SUBLEVEL = 27 EXTRAVERSION = NAME = TOSSUG Baby Fish