RSS had previously been disabled when DCB was enabled because
DCB was single queued per traffic class. Now that DCB implements
multiple Tx/Rx rings per traffic class enable RSS.
Here RSS hashes across the queues in the traffic class.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain.@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This enables multiple {Tx|Rx} rings per traffic class while in DCB
mode. In order to get this working as expected the tc_to_tx net
device mapping is configured as well as the prio_tc_map.
skb priorities are mapped across a range of queue pairs to get
a distribution per traffic class. The maximum number of
queue pairs used while in DCB mode is capped at 64. The hardware
max is actually 128 queues but 64 is sufficient for now and
allocating more seemed a bit excessive. It is easy enough to
increase the cap later if need be.
To get the 802.1Q priority tags inserted correctly ixgbe was
previously using the skb queue_mapping field to directly set
the 802.1Q priority. This no longer works because we have removed
the 1:1 mapping between queues and traffic class. Each ring
is aligned with an 802.1Qaz traffic class so here we add an
extra field to the ring struct to identify the 802.1Q traffic
class. This uses an extra byte of the ixgbe_ring struct
fortunately there was a 2byte hole,
struct ixgbe_ring {
void * desc; /* 0 8 */
struct device * dev; /* 8 8 */
struct net_device * netdev; /* 16 8 */
union {
struct ixgbe_tx_buffer * tx_buffer_info; /* 8 */
struct ixgbe_rx_buffer * rx_buffer_info; /* 8 */
}; /* 24 8 */
long unsigned int state; /* 32 8 */
u8 atr_sample_rate; /* 40 1 */
u8 atr_count; /* 41 1 */
u16 count; /* 42 2 */
u16 rx_buf_len; /* 44 2 */
u16 next_to_use; /* 46 2 */
u16 next_to_clean; /* 48 2 */
u8 queue_index; /* 50 1 */
u8 reg_idx; /* 51 1 */
u16 work_limit; /* 52 2 */
/* XXX 2 bytes hole, try to pack */
u8 * tail; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
Now we can set the VLAN priority directly and it will be
correct. User space can indicate the 802.1Qaz priority
using the SO_PRIORITY setsocket() option and QOS layer will
steer the skb to the correct rings. Additionally using
the multiq qdisc with a queue_mapping action works as
well.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove ixgbe_fcoe_getapp() and use the generic kernel
routine instead. Also add application priority to the
kernel maintained list on setapp so applications and
stacks can query the value.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Implement ieee_setapp dcbnl ops in ixgbe. This is required
to setup FCoE which requires dedicated resources. If the
app data is not for FCoE then no action is taken in ixgbe
except to add it to the dcb_app_list.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This implements dcbnl get and set capabilities ops. The
devices supported by ixgbe can be configured to run in
IEEE or CEE modes but not both.
With the DCBX set capabilities bit we add an explicit
signal that must be used to toggle between these modes.
This patch adds logic to fail the CEE command set_hw_all()
which programs the device with a CEE configuration if
the CEE caps bit is not set. Similarly, IEEE set
commands will fail if the IEEE caps bit is not set. We
allow most CEE config set commands to occur because they
do not touch the hardware until set_hw_all() is called.
The one exception to the above is the {set|get}app routines.
These must always be protected by caps bits to ensure
side effects do not corrupt the current configured mode.
By requiring the caps bit to be set correctly we can
maintain a consistent configuration in the hardware
for CEE or IEEE modes and prevent partial hardware
configurations that may occur if user space does
not send a complete IEEE or CEE configurations.
It is expected that user space will signal a DCBX mode
before programming device.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch add DMA Coalescing which is a power-saving feature that
coalesces DMA writes in order to stay in a low-power state as much
as possible. Feature is disabled by default.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds functions and functions pointers to accommodate
differences between NVM interfaces and options for i350 devices,
82580 devices and the rest.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The kernel version string is off by a major version number since
new silicon was just introduced and also uses the wrong format for
the version postfix.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Josef had changed shrink_delalloc to exit after three shrink
attempts, which wasn't quite enough because new writers could
race in and steal free space.
But it also fixed deadlocks and stalls as we tried to recover
delalloc reservations. The code was tweaked to loop 1024
times, and would reset the counter any time a small amount
of progress was made. This was too drastic, and with a
lot of writers we can end up stuck in shrink_delalloc forever.
The shrink_delalloc loop is fairly complex because the caller is looping
too, and the caller will go ahead and force a transaction commit to make
sure we reclaim space.
This reworks things to exit shrink_delalloc when we've forced some
writeback and the delalloc reservations have gone down. This means
the writeback has not just started but has also finished at
least some of the metadata changes required to reclaim delalloc
space.
If we've got this wrong, we're returning ENOSPC too early, which
is a big improvement over the current behavior of hanging the machine.
Test 224 in xfstests hammers on this nicely, and with 1000 writers
trying to fill a 1GB drive we get our first ENOSPC at 93% full. The
other writers are able to continue until we get 100%.
This is a worst case test for btrfs because the 1000 writers are doing
small IO, and the small FS size means we don't have a lot of room
for metadata chunks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
futex,plist: Pass the real head of the priority list to plist_del()
futex,plist: Remove debug lock assignment from plist_node
plist: Shrink struct plist_head
plist: Add priority list test
The distance transforming in numa_emulation() used to call
numa_set_distance() for all MAX_NUMNODES * MAX_NUMNODES node
combinations regardless of which are enabled. As numa_set_distance()
ignores all out-of-bound distance settings, this doesn't cause any
problem other than looping unnecessarily many times during boot.
However, as MAX_NUMNODES * MAX_NUMNODES can be pretty high, update the
code such that it iterates through only the enabled combinations.
Yinghai Lu identified the issue and provided an initial patch to
address the issue; however, the patch was incorrect in that it didn't
build emulated distance table when there's no physical distance table
and unnecessarily complex.
http://thread.gmane.org/gmane.linux.kernel/1107986/focus=1107988
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
This patch splits the pll configs up on pll versions. This allows
easy adding of other known good pll values. Additionally it made it
possible to remove invalid configurations resulting in better
behaviour for such cases. The resulting clocks are no longer stored
resulting in some computing overhead on each mode change.
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
The clocks can be easily recalculated by the timing and refresh value.
This brings us one step closer to removing VIAs modetable and use
generic ones and being easier extensible.
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Before this patch only clocks that perfectly match were used and if
none existed this was not handled properly. This patch changes this
to always use the closest clock supported. This should behave like
before for clocks that have a perfect match but be much saner for
clocks which are slightly off.
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
This patch removes the direct lookup table for resolution+refresh and
pixclock by calculating this information from the mode table. Removes a
lot of dupllication and error potential by just doing a little more
calculations on each mode change.
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
On suspend we disable all interrupts in the core code, but this does
not mask the interrupt line in the default implementation as we use a
lazy disable approach. That means we mark the interrupt disabled, but
leave the hardware unmasked. That's an optimization because we avoid
the hardware access for the common case where no interrupt happens
after we marked it disabled. If an interrupt happens, then the
interrupt flow handler masks the line at the hardware level and marks
it pending.
Suspend makes use of this delayed disable as it "disables" all
interrupts when preparing the suspend transition. Right before the
system goes into hardware suspend state it checks whether one of the
interrupts which is marked as a wakeup interrupt came in after
disabling it.
Most interrupt chips have a separate register which selects the
interrupts which can wake up the system from suspend, so we don't have
to mask any on the non wakeup interrupts.
But now we have to deal with brilliant designed hardware which lacks
such a wakeup configuration facility. For such hardware it's necessary
to mask all non wakeup interrupts before going into suspend in order
to avoid the wakeup from random interrupts.
Rather than working around this in the affected interrupt chip
implementations we can solve this elegant in the core code itself.
Add a flag IRQCHIP_MASK_ON_SUSPEND which can be set by the irq chip
implementation to indicate, that the interrupts which are not selected
as wakeup sources must be masked in the suspend path. Mask them in the
loop which checks the wakeup interrupts pending flag.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
LKML-Reference: <alpine.LFD.2.00.1103112112310.2787@localhost6.localdomain6>
The function gen_74x164_remove and mc33880_remove are used only wrapped
by __devexit_p so define it using __devexit.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The device table is required to load modules based on modaliases.
After adding MODULE_DEVICE_TABLE, below entries will be added to modules.pcimap:
pch_gpio 0x00008086 0x00008803 0xffffffff 0xffffffff 0x00000000 0x00000000 0x0
ml_ioh_gpio 0x000010db 0x0000802e 0xffffffff 0xffffffff 0x00000000 0x00000000 0x0
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
mpc23s17 is very similar to the mcp23s08, except that registers are 16bit
wide, so extend the interface to work with both variants.
The s17 variant also has an additional address pin, so adjust platform
data structure to support up to 8 devices per SPI chipselect.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The latest binutils (2.21.0.20110302/Ubuntu) breaks the build
yet another time, under CONFIG_XEN=y due to a .size directive that
refers to a slightly differently named (hence, to the now very
strict and unforgiving assembler, non-existent) symbol.
[ mingo:
This unnecessary build breakage caused by new binutils
version 2.21 gets escallated back several kernel releases spanning
several years of Linux history, affecting over 130,000 upstream
kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially
affecting all major Linux distro kernel configs).
Git annotate tells us that this slight debug symbol code mismatch
bug has been introduced in 2008 in commit 3d75e1b8:
3d75e1b8 (Jeremy Fitzhardinge 2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs)
The 'bug' is just a slight assymetry in ENTRY()/END()
debug-symbols sequences, with lots of assembly code between the
ENTRY() and the END():
ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs)
...
END(do_hypervisor_callback)
Human reviewers almost never catch such small mismatches, and binutils
never even warned about it either.
This new binutils version thus breaks the Xen build on all upstream kernels
since v2.6.27, out of the blue.
This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels
impossible on such binutils, for a bisection window of over hundred
thousand historic commits. (!)
This is a major fail on the side of binutils and binutils needs to turn
this show-stopper build failure into a warning ASAP. ]
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The original dma_halt() function set the CA (channel abort) bit on both
the 83xx and 85xx controllers. This is incorrect on the 83xx, where this
bit means TEM (transfer error mask) instead. The 83xx doesn't support
channel abort, so we only do this operation on 85xx.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This merges the fsl_chan_ld_cleanup() function into the dma_do_tasklet()
function to reduce locking overhead. In the best case, we will be able
to keep the DMA controller busy while we are freeing used descriptors.
In all cases, the spinlock is grabbed two times fewer than before on
each transaction.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Previous to this patch, the dma_run_dependencies() function has been
called while holding desc_lock. This function can call tx_submit() for
other descriptors, which may try to re-grab the lock. Avoid this by
moving the descriptors to be cleaned up to a temporary list, and
dropping the lock before cleanup.
At the same time, add support for automatic unmapping of src and dst
buffers, as offered by the DMAEngine API.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Enabling poisoning in the dmapool API quickly showed that the DMA
controller was fetching descriptors that should not have been in use.
This has caused intermittent controller lockups during testing.
I have been unable to figure out the exact set of conditions which cause
this to happen. However, I believe it is related to the driver using the
hardware registers to track whether the controller is busy or not. The
code can incorrectly decide that the hardware is idle due to lag between
register writes and the hardware actually becoming busy.
To fix this, the driver has been reworked to explicitly track the state
of the hardware, rather than try to guess what it is doing based on the
register values.
This has passed dmatest with 10 threads per channel, 100000 iterations
per thread several times without error. Previously, this would fail
within a few seconds.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This fixes some minor violations of the coding style. It also changes
the style of the device_prep_dma_*() function definitions so they are
identical.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This adds better tracking to link descriptor allocations, callbacks, and
frees. This makes it much easier to track errors with link descriptors.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This makes debugging the driver much easier when multiple channels are
running concurrently. In addition, you can see how much descriptor
memory each channel has allocated via the dmapool API in sysfs.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This is a purely cosmetic cleanup. It is nice to have related functions
right next to each other in the code.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The dmatest code relies on the DMAEngine API to automatically call
dma_unmap_single() on src buffers. The flags it passes are incorrect,
fix them.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch updates HRT driver for supporting PM.
The resume function of PWM4 timer which is used clocksource is needed
when kernel is resuming for restarting.
Signed-off-by: Jaecheol Lee <jc.lee@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds support suspend to ram for EXYNOS4210.
As a note, this includes function of outer cache flush
because it is used before entering PM.
Signed-off-by: Jaecheol Lee <jc.lee@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds definitions of PMU and CMU registers for EXYNOS4 PM.
Signed-off-by: Jaecheol Lee <jc.lee@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch updates s5pv210_defconfig and s5p64x0_defconfig for
HRT support and CONFIG_S5P_HRT is used for its configuration.
Signed-off-by: Sangbeom Kim <sbkim73@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds support HRT for machines of S5P64X0 and S5PV210.
Basically, PWM Timer3 is used for clockevent and PWM Timer4 is
used for clocksource. Since PWM Timer3 is used for other purpose,
PWM Timer2 is used on SMDKV210.
Signed-off-by: Sangbeom Kim <sbkim73@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds support HR-Timer(High Resolution Timer) and dynamic
tick system for S5P SoCs. There are many clock sources for HR-Timer
on S5P SoCs. The PWM timer, RTC, System Timer, and MCT can be used
for clock source.
This patch can only support PWM timer for clock source of S5P64X0
and S5PV210.
Signed-off-by: Sangbeom Kim <sbkim73@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
* 'for_dan' of git://git.infradead.org/users/vkoul/slave-dma:
drivers, pch_dma: Fix warning when CONFIG_PM=n.
dmaengine/dw_dmac fix: use readl & writel instead of __raw_readl & __raw_writel
avr32: at32ap700x: Specify DMA Flow Controller, Src and Dst msize
dw_dmac: Setting Default Burst length for transfers as 16.
dw_dmac: Allow src/dst msize & flow controller to be configured at runtime
dw_dmac: Changing type of src_master and dest_master to u8.
dw_dmac: Pass Channel Priority from platform_data
dw_dmac: Pass Channel Allocation Order from platform_data
dw_dmac: Mark all tx_descriptors with DMA_CRTL_ACK after xfer finish
dw_dmac: Change value of DWC_MAX_COUNT to 4095.
dw_dmac: Adding support for 64 bit access width for memcpy xfers
dw_dmac: Calling dwc_scan_descriptors from dwc_tx_status() after taking lock
dw_dmac: Move single descriptor from dwc->queue to dwc->active_list in dwc_complete_all
dw_dmac: Replace module_init() with subsys_initcall()
dw_dmac: Remove compilation dependency from AVR32 and put on HAVE_CLK
dmaengine: mxs-dma: add dma support for i.MX23/28
pch_dma: set the number of array correctly
pch_dma: fix kernel error issue
The new xfs_alert_tag() used a variable named "panic",
and that is to be avoided. Rename it.
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>