[ Based on an original patch by Yuri Tikhonov ]
Implement the state machine for handling the RAID-6 parities check and
repair functionality. Note that the raid6 case does not need to check
for new failures, like raid5, as it will always writeback the correct
disks. The raid5 case can be updated to check zero_sum_result to avoid
getting confused by new failures rather than retrying the entire check
operation.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In the synchronous implementation of stripe dirtying we processed a
degraded stripe with one call to handle_stripe_dirtying6(). I.e.
compute the missing blocks from the other drives, then copy in the new
data and reconstruct the parities.
In the asynchronous case we do not perform stripe operations directly.
Instead, operations are scheduled with flags to be later serviced by
raid_run_ops. So, for the degraded case the final reconstruction step
can only be carried out after all blocks have been brought up to date by
being read, or computed. Like the raid5 case schedule_reconstruction()
sets STRIPE_OP_RECONSTRUCT to request a parity generation pass and
through operation chaining can handle compute and reconstruct in a
single raid_run_ops pass.
[dan.j.williams@intel.com: fixup handle_stripe_dirtying6 gating]
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Modify handle_stripe_fill6 to work asynchronously by introducing
fetch_block6 as the raid6 analog of fetch_block5 (schedule compute
operations for missing/out-of-sync disks).
[dan.j.williams@intel.com: compute D+Q in one pass]
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Extend schedule_reconstruction5 for reuse by the raid6 path. Add
support for generating Q and BUG() if a request is made to perform
'prexor'.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[ Based on an original patch by Yuri Tikhonov ]
The raid_run_ops routine uses the asynchronous offload api and
the stripe_operations member of a stripe_head to carry out xor+pq+copy
operations asynchronously, outside the lock.
The operations performed by RAID-6 are the same as in the RAID-5 case
except for no support of STRIPE_OP_PREXOR operations. All the others
are supported:
STRIPE_OP_BIOFILL
- copy data into request buffers to satisfy a read request
STRIPE_OP_COMPUTE_BLK
- generate missing blocks (1 or 2) in the cache from the other blocks
STRIPE_OP_BIODRAIN
- copy data out of request buffers to satisfy a write request
STRIPE_OP_RECONSTRUCT
- recalculate parity for new data that has entered the cache
STRIPE_OP_CHECK
- verify that the parity is correct
The flow is the same as in the RAID-5 case, and reuses some routines, namely:
1/ ops_complete_postxor (renamed to ops_complete_reconstruct)
2/ ops_complete_compute (updated to set up to 2 targets uptodate)
3/ ops_run_check (renamed to ops_run_check_p for xor parity checks)
[neilb@suse.de: fixes to get it to pass mdadm regression suite]
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ops_complete_compute5 can be reused in the raid6 path if it is updated to
generically handle a second target.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Even though the intent is to extend dmatest with P+Q tests there is
still value in having an always-on sanity check to prevent an
unintentionally broken driver from registering.
This depends on raid6_pq.ko for verification, the side effect being that
PQ capable channels will fail to register when raid6 is disabled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
iop33x support is not included because that engine is a bit more awkward
to handle in that it can either be in xor mode or pq mode. The
dmaengine/async_tx layers currently only comprehend static capabilities.
Note iop13xx does not support hardware PQ continuation so the driver
must handle the DMA_PREP_CONTINUE flag for operations across > 16
sources. From the comment for dma_maxpq:
/* When an engine does not support native continuation we need 3 extra
* source slots to reuse P and Q with the following coefficients:
* 1/ {00} * P : remove P from Q', but use it as a source for P'
* 2/ {01} * Q : use Q to continue Q' calculation
* 3/ {00} * Q : subtract Q from P' to cancel (2)
*/
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
lockdep correctly identifies a potential recursive locking case for
iop_chan->lock, but in the dependency submission case we expect that the same
class will be acquired for both the parent dependency and the child channel.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Port drivers/md/raid6test/test.c to use the async raid6 recovery
routines. This is meant as a unit test for raid6 acceleration drivers. In
addition to the 16-drive test case this implements tests for the 4-disk and
5-disk special cases (dma devices can not generically handle less than 2
sources), and adds a test for the D+Q case.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Test raid6 p+q operations with a simple "always multiply by 1" q
calculation to fit into dmatest's current destination verification
scheme.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
async_raid6_2data_recov() recovers two data disk failures
async_raid6_datap_recov() recovers a data disk and the P disk
These routines are a port of the synchronous versions found in
drivers/md/raid6recov.c. The primary difference is breaking out the xor
operations into separate calls to async_xor. Two helper routines are
introduced to perform scalar multiplication where needed.
async_sum_product() multiplies two sources by scalar coefficients and
then sums (xor) the result. async_mult() simply multiplies a single
source by a scalar.
This implemention also includes, in contrast to the original
synchronous-only code, special case handling for the 4-disk and 5-disk
array cases. In these situations the default N-disk algorithm will
present 0-source or 1-source operations to dma devices. To cover for
dma devices where the minimum source count is 2 we implement 4-disk and
5-disk handling in the recovery code.
[ Impact: asynchronous raid6 recovery routines for 2data and datap cases ]
Cc: Yuri Tikhonov <yur@emcraft.com>
Cc: Ilya Yanok <yanok@emcraft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[ Based on an original patch by Yuri Tikhonov ]
This adds support for doing asynchronous GF multiplication by adding
two additional functions to the async_tx API:
async_gen_syndrome() does simultaneous XOR and Galois field
multiplication of sources.
async_syndrome_val() validates the given source buffers against known P
and Q values.
When a request is made to run async_pq against more than the hardware
maximum number of supported sources we need to reuse the previous
generated P and Q values as sources into the next operation. Care must
be taken to remove Q from P' and P from Q'. For example to perform a 5
source pq op with hardware that only supports 4 sources at a time the
following approach is taken:
p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))
p' = p + q + q + src4 = p + src4
q' = {00}*p + {01}*q + {00}*q + {10}*src4 = q + {10}*src4
Note: 4 is the minimum acceptable maxpq otherwise we punt to
synchronous-software path.
The DMA_PREP_CONTINUE flag indicates to the driver to reuse p and q as
sources (in the above manner) and fill the remaining slots up to maxpq
with the new sources/coefficients.
Note1: Some devices have native support for P+Q continuation and can skip
this extra work. Devices with this capability can advertise it with
dma_set_maxpq. It is up to each driver how to handle the
DMA_PREP_CONTINUE flag.
Note2: The api supports disabling the generation of P when generating Q,
this is ignored by the synchronous path but is implemented by some dma
devices to save unnecessary writes. In this case the continuation
algorithm is simplified to only reuse Q as a source.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We currently walk the parent chain when waiting for a given tx to
complete however this walk may race with the driver cleanup routine.
The routines in async_raid6_recov.c may fall back to the synchronous
path at any point so we need to be prepared to call async_tx_quiesce()
(which calls dma_wait_for_async_tx). To remove the ->parent walk we
guarantee that every time a dependency is attached ->issue_pending() is
invoked, then we can simply poll the initial descriptor until
completion.
This also allows for a lighter weight 'issue pending' implementation as
there is no longer a requirement to iterate through all the channels'
->issue_pending() routines as long as operations have been submitted in
an ordered chain. async_tx_issue_pending() is added for this case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If module_init and module_exit are nops then neither need to be defined.
[ Impact: pure cleanup ]
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Replace the flat zero_sum_result with a collection of flags to contain
the P (xor) zero-sum result, and the soon to be utilized Q (raid6 reed
solomon syndrome) zero-sum result. Use the SUM_CHECK_ namespace instead
of DMA_ since these flags will be used on non-dma-zero-sum enabled
platforms.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use percpu memory rather than stack for storing the buffer lists used in
parity calculations. Include space for dma address conversions and pass
that to async_tx via the async_submit_ctl.scribble pointer.
[ Impact: move memory pressure from stack to heap ]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for asynchronous handling of raid6 operations move the
spare page to a percpu allocation to allow multiple simultaneous
synchronous raid6 recovery operations.
Make this allocation cpu hotplug aware to maximize allocation
efficiency.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In general, EC transaction should complete in less than 1ms, thus it is possible to merge wait for
1ms in poll mode and 1ms of interrupt transaction timeout.
Still, driver will wait 500ms for EC to complete transaction.
This significantly simplifies driver and makes it immune to problematic EC interrupt
implementations.
It also may lessen kernel start-up time by 500ms.
References: http://bugzilla.kernel.org/show_bug.cgi?id=12949
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acer Aspire 8930G laptops (and possibly others) report the battery current
as a 16-bit signed negative when it is charging. It also reports it as
0x10000 when the current is 0. This patch adds a quirk for this which
takes the absolute value of the reported current cast to an s16. This is
a DSDT bug present in the latest BIOS revision (the EC register is 16 bits
signed and the DSDT attempts to take the 16-bit two's complement of this,
which works for discharge but not charge. It also breaks zero values
because a 32-bit register is used and the high bits aren't thrown away).
I've enabled this for all Acer systems which report in mA units. This
should be safe since it won't break compliant systems unless they report a
current above 32A, which is insane. The patch also detects the valid
32-bit value -1, which indicates unknown status, and does not attempt the
fix in that case (note that this does not conflict with 16-bit -1, which
is 65535 as read normally and gets translated to 1mA).
Signed-off-by: Hector Martin <hector@marcansoft.com>
Acked-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Use VBT information to determine which DDC bus to use for CRTDCC.
Fall back to GPIOA if VBT info is not available.
Signed-off-by: David Müller <d.mueller@elsoft.ch>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested on: 855 (David), and 945GM, 965GM, GM45, and G45 (anholt)
In ext4_link we need to check using EXT4_LINK_MAX, and not
EXT4_DIR_LINK_MAX(), since ext4_link() is creating hard links of
regular files, and not directories.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch moves the tx_prod, tx_cons, tx_pending, tx_ring, and
tx_buffers transmit ring device members to a per-interrupt structure.
It also adds a new transmit producer mailbox member (prodmbox) and
converts the code to use it rather than a preprocessor constant.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the rx_rcb, rx_rcb_mapping, and rx_rcb_ptr return ring
device members to a per-interrupt structure. It also adds a new return
ring consumer mailbox register member (consmbox) and converts the code
to use it rather than a preprocessor constant.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the last_tag, last_tag_irq, and hw_status device
members to a per-interrupt structure. It also adds a new interrupt
mailbox member (int_mbox) and converts the code to use it rather than a
direct preprocessor constant.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the napi interrupt handler functions to accept and
use tg3_napi structures.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch migrates the ISR parameter from struct net_device to struct
tg3_napi. Checkpatch complains about the existence of the preexisting
IRQF_SAMPLE_RANDOM flag. I've opted to keep this patch conservative and
let it continue to exist until the flag gets officially purged from the
kernel.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch creates a per-interrupt data structure, moves the napi
member over, and creates a tg3 pointer back to the device structure.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Later patches will be adding MSIX support, which will complicate
interrupt initialization. This patch prepares for the integration by
breaking out the interrupt setup and teardown code into separate
functions and cleaning up the error return paths.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 5717 only uses extended buffer descriptors for the jumbo producer
ring. Extended buffer descriptors are available on all devices that
support a separate jumbo producer ring so make the change universal.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch migrates most of the rx producer ring variables to a new
tg3_rx_prodring_set structure and modifies the code accordingly.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Later patches are going to complicate the ring initialization routines.
This patch breaks out the setup and teardown of the rx producer rings
into separate functions to make the code more readable.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch attempts to document the various rx buffer sizes used by the
driver and how they relate to each other.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves where the jumbo capable and msi support flags are
located. This is prep work for the addition of msix support flags.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch separates the code that sets up the mini producer ring from
the code that sets up the jumbo producer rings. The 5717 asic rev
devices do not have a mini ring, but do have a jumbo frame
implementation similar to the 5704 and previous devices.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes up the NVRAM detection switch statements to conform
to the kernel coding style.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new device ID for those 5785 devices that will only
use 10/100 phys.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device firmware uses the MDIO bus during early setup. If the driver
modifies the MDIO bus configuration while it is in use by the firmware,
any number of bad things can happen. This patch delays MDIO setup until
after the firmware posts its magic signature, signifying initialization
is complete.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
_BQC doesn't return a value listed in _BCL method.
http://bugzilla.kernel.org/show_bug.cgi?id=13511
ingore the buggy _BQC method in this case
Signed-off-by: Vladimir Serbinenko <phcoder@gmail.com>
Signed-off-by: Scott Howard <showard314@gmail.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The 900A provides hotplug notifications on a different ACPI object to
other models.
Reported-by: Trevor <trevor.chart@gmail.com>
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Len Brown <len.brown@intel.com>
rfkill_unregister() should always be followed by rfkill_destroy()
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Len Brown <len.brown@intel.com>
make it use the node from irq_desc.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A95C392.5050903@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
... to return irq_desc node info without #ifdefs at the callsites.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <4A95C350.8060308@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>