Commit Graph

275188 Commits

Author SHA1 Message Date
Archit Taneja
11354dd58d OMAPDSS/OMAP_VOUT: Fix incorrect OMAP3-alpha compatibility setting
On OMAP3, in order to enable alpha blending for LCD and TV managers, we needed
to set LCDALPHABLENDERENABLE/TVALPHABLENDERENABLE bits in DISPC_CONFIG. On
OMAP4, alpha blending is always enabled by default, if the above bits are set,
we switch to an OMAP3 compatibility mode where the zorder values in the pipeline
attribute registers are ignored and a fixed priority is configured.

Rename the manager_info member "alpha_enabled" to "partial_alpha_enabled" for
more clarity. Introduce two dss_features FEAT_ALPHA_FIXED_ZORDER and
FEAT_ALPHA_FREE_ZORDER which represent OMAP3-alpha compatibility mode and OMAP4
alpha mode respectively. Introduce an overlay cap for ZORDER. The DSS2 user is
expected to check for the ZORDER cap, if an overlay doesn't have this cap, the
user is expected to set the parameter partial_alpha_enabled. If the overlay has
ZORDER cap, the DSS2 user can assume that alpha blending is already enabled.

Don't support OMAP3 compatibility mode for now. Trying to read/write to
alpha_blending_enabled sysfs attribute issues a warning for OMAP4 and does not
set the LCDALPHABLENDERENABLE/TVALPHABLENDERENABLE bits.

Change alpha_enabled to partial_alpha_enabled in the omap_vout driver. Use
overlay cap "OMAP_DSS_OVL_CAP_GLOBAL_ALPHA" to check if overlay supports alpha
blending or not. Replace this with checks for VIDEO1 pipeline.

Cc: linux-media@vger.kernel.org
Cc: Lajos Molnar <molnar@ti.com>
Signed-off-by: Archit Taneja <archit@ti.com>
Acked-by: Vaibhav Hiremath <hvaibhav@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2011-10-03 16:51:54 +03:00
Pierre-Louis Bossart
14bc52b8fe ALSA: hda/hdmi: expose ELD control
Applications may want to read ELD information to
understand what codecs are supported on the HDMI
receiver and handle the a-v delay for better lip-sync.

ELD information is exposed in a device-specific
IFACE_PCM kcontrol. Tested both with amixer and
PulseAudio; with a corresponding patch passthrough modes
are enabled automagically.

ELD control size is set to zero in case of errors or
wrong configurations. No notifications are implemented
for now, it is expected that jack detection is used to
reconfigure the audio outputs.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-10-03 15:48:12 +02:00
Marc Zyngier
1e7c5fd294 genirq: percpu: allow interrupt type to be set at enable time
As request_percpu_irq() doesn't allow for a percpu interrupt to have
its type configured (it is generally impossible to configure it on all
CPUs at once), add a 'type' argument to enable_percpu_irq().

This allows some low-level, board specific init code to be switched to
a generic API.

[ tglx: Added WARN_ON argument ]

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-03 15:35:27 +02:00
Marc Zyngier
31d9d9b6d8 genirq: Add support for per-cpu dev_id interrupts
The ARM GIC interrupt controller offers per CPU interrupts (PPIs),
which are usually used to connect local timers to each core. Each CPU
has its own private interface to the GIC, and only sees the PPIs that
are directly connect to it.

While these timers are separate devices and have a separate interrupt
line to a core, they all use the same IRQ number.

For these devices, request_irq() is not the right API as it assumes
that an IRQ number is visible by a number of CPUs (through the
affinity setting), but makes it very awkward to express that an IRQ
number can be handled by all CPUs, and yet be a different interrupt
line on each CPU, requiring a different dev_id cookie to be passed
back to the handler.

The *_percpu_irq() functions is designed to overcome these
limitations, by providing a per-cpu dev_id vector:

int request_percpu_irq(unsigned int irq, irq_handler_t handler,
		   const char *devname, void __percpu *percpu_dev_id);
void free_percpu_irq(unsigned int, void __percpu *);
int setup_percpu_irq(unsigned int irq, struct irqaction *new);
void remove_percpu_irq(unsigned int irq, struct irqaction *act);
void enable_percpu_irq(unsigned int irq);
void disable_percpu_irq(unsigned int irq);

The API has a number of limitations:
- no interrupt sharing
- no threading
- common handler across all the CPUs

Once the interrupt is requested using setup_percpu_irq() or
request_percpu_irq(), it must be enabled by each core that wishes its
local interrupt to be delivered.

Based on an initial patch by Thomas Gleixner.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-03 15:35:26 +02:00
Axel Lin
9b5999b1bc ASoC: Fix setting update bits for WM8741_DACRMSB_ATTENUATION
After checking the code and datasheet, I think what we want in the second
snd_soc_update_bits call is to update WM8741_DACRMSB_ATTENUATION register
instead of WM8741_DACRLSB_ATTENUATION.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-03 14:15:40 +01:00
Axel Lin
f6a74ced46 ASoC: txx9: Add __exit_p at necessary place
We have __exit annotation for txx9aclc_generic_remove(),
thus add __devexit_p to wrap it.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-03 14:15:40 +01:00
Axel Lin
1c6927f872 ASoC: Staticise ep93xx_ac97_dai
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Mika Westerberg <mika.westerberg@iki.fi>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-03 14:15:32 +01:00
Axel Lin
2e4891a5ec ASoC: Staticise simtec_audio_resume()
It is exported via resume callback of struct dev_pm_ops rather than referenced
directly and so should be staticised.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-03 14:15:23 +01:00
Axel Lin
43419b80fa ASoC: Remove needless codec->dapm.bias_level assignment to SND_SOC_BIAS_OFF
This assignment is done by the snd_soc_register_codec so there is no need
to redo it in probe function of a codec driver.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-03 14:15:22 +01:00
Wu Fengguang
b00949aa2d writeback: per-bdi background threshold
One thing puzzled me is that in JBOD case, the per-disk writeout
performance is smaller than the corresponding single-disk case even
when they have comparable bdi_thresh. Tracing shows find that in single
disk case, bdi_writeback is always kept high while in JBOD case, it
could drop low from time to time and correspondingly bdi_reclaimable
could sometimes rush high.

The fix is to watch bdi_reclaimable and kick background writeback as
soon as it goes high. This resembles the global background threshold
but in per-bdi manner. The trick is, as long as bdi_reclaimable does
not go high, bdi_writeback naturally won't go low because
bdi_reclaimable+bdi_writeback ~= bdi_thresh.

With less fluctuated writeback pages, JBOD performance is observed to
increase noticeably in various cases.

vmstat:nr_written values before/after patch:

  3.1.0-rc4-wo-underrun+      3.1.0-rc4-bgthresh3+  
------------------------  ------------------------  
               125596480       +25.9%    158179363  JBOD-10HDD-16G/ext4-100dd-1M-24p-16384M-20:10-X
                61790815      +110.4%    130032231  JBOD-10HDD-16G/ext4-10dd-1M-24p-16384M-20:10-X
                58853546        -0.1%     58823828  JBOD-10HDD-16G/ext4-1dd-1M-24p-16384M-20:10-X
               110159811       +24.7%    137355377  JBOD-10HDD-16G/xfs-100dd-1M-24p-16384M-20:10-X
                69544762       +10.8%     77080047  JBOD-10HDD-16G/xfs-10dd-1M-24p-16384M-20:10-X
                50644862        +0.5%     50890006  JBOD-10HDD-16G/xfs-1dd-1M-24p-16384M-20:10-X
                42677090       +28.0%     54643527  JBOD-10HDD-thresh=100M/ext4-100dd-1M-24p-16384M-100M:10-X
                47491324       +13.3%     53785605  JBOD-10HDD-thresh=100M/ext4-10dd-1M-24p-16384M-100M:10-X
                52548986        +0.9%     53001031  JBOD-10HDD-thresh=100M/ext4-1dd-1M-24p-16384M-100M:10-X
                26783091       +36.8%     36650248  JBOD-10HDD-thresh=100M/xfs-100dd-1M-24p-16384M-100M:10-X
                35526347       +14.0%     40492312  JBOD-10HDD-thresh=100M/xfs-10dd-1M-24p-16384M-100M:10-X
                44670723        -1.1%     44177606  JBOD-10HDD-thresh=100M/xfs-1dd-1M-24p-16384M-100M:10-X
               127996037       +22.4%    156719990  JBOD-10HDD-thresh=2G/ext4-100dd-1M-24p-16384M-2048M:10-X
                57518856        +3.8%     59677625  JBOD-10HDD-thresh=2G/ext4-10dd-1M-24p-16384M-2048M:10-X
                51919909       +12.2%     58269894  JBOD-10HDD-thresh=2G/ext4-1dd-1M-24p-16384M-2048M:10-X
                86410514       +79.0%    154660433  JBOD-10HDD-thresh=2G/xfs-100dd-1M-24p-16384M-2048M:10-X
                40132519       +38.6%     55617893  JBOD-10HDD-thresh=2G/xfs-10dd-1M-24p-16384M-2048M:10-X
                48423248        +7.5%     52042927  JBOD-10HDD-thresh=2G/xfs-1dd-1M-24p-16384M-2048M:10-X
               206041046       +44.1%    296846536  JBOD-10HDD-thresh=4G/xfs-100dd-1M-24p-16384M-4096M:10-X
                72312903       -19.4%     58272885  JBOD-10HDD-thresh=4G/xfs-10dd-1M-24p-16384M-4096M:10-X
                50635672        -0.5%     50384787  JBOD-10HDD-thresh=4G/xfs-1dd-1M-24p-16384M-4096M:10-X
                68308534      +115.7%    147324758  JBOD-10HDD-thresh=800M/ext4-100dd-1M-24p-16384M-800M:10-X
                57882933       +14.5%     66269621  JBOD-10HDD-thresh=800M/ext4-10dd-1M-24p-16384M-800M:10-X
                52183472       +12.8%     58855181  JBOD-10HDD-thresh=800M/ext4-1dd-1M-24p-16384M-800M:10-X
                53788956       +94.2%    104460352  JBOD-10HDD-thresh=800M/xfs-100dd-1M-24p-16384M-800M:10-X
                44493342       +35.5%     60298210  JBOD-10HDD-thresh=800M/xfs-10dd-1M-24p-16384M-800M:10-X
                42641209       +18.9%     50681038  JBOD-10HDD-thresh=800M/xfs-1dd-1M-24p-16384M-800M:10-X

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:58 +08:00
Wu Fengguang
8927f66c4e writeback: dirty position control - bdi reserve area
Keep a minimal pool of dirty pages for each bdi, so that the disk IO
queues won't underrun. Also gently increase a small bdi_thresh to avoid
it stuck in 0 for some light dirtied bdi.

It's particularly useful for JBOD and small memory system.

It may result in (pos_ratio > 1) at the setpoint and push the dirty
pages high. This is more or less intended because the bdi is in the
danger of IO queue underflow.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:58 +08:00
Wu Fengguang
57fc978cfb writeback: control dirty pause time
The dirty pause time shall ultimately be controlled by adjusting
nr_dirtied_pause, since there is relationship

	pause = pages_dirtied / task_ratelimit

Assuming

	pages_dirtied ~= nr_dirtied_pause
	task_ratelimit ~= dirty_ratelimit

We get

	nr_dirtied_pause ~= dirty_ratelimit * desired_pause

Here dirty_ratelimit is preferred over task_ratelimit because it's
more stable.

It's also important to limit possible large transitional errors:

- bw is changing quickly
- pages_dirtied << nr_dirtied_pause on entering dirty exceeded area
- pages_dirtied >> nr_dirtied_pause on btrfs (to be improved by a
  separate fix, but still expect non-trivial errors)

So we end up using the above formula inside clamp_val().

The best test case for this code is to run 100 "dd bs=4M" tasks on
btrfs and check its pause time distribution.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:58 +08:00
Wu Fengguang
c8462cc9de writeback: limit max dirty pause time
Apply two policies to scale down the max pause time for

1) small number of concurrent dirtiers
2) small memory system (comparing to storage bandwidth)

MAX_PAUSE=200ms may only be suitable for high end servers with lots of
concurrent dirtiers, where the large pause time can reduce much overheads.

Otherwise, smaller pause time is desirable whenever possible, so as to
get good responsiveness and smooth user experiences. It's actually
required for good disk utilization in the case when all the dirty pages
can be synced to disk within MAX_PAUSE=200ms.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:57 +08:00
Wu Fengguang
143dfe8611 writeback: IO-less balance_dirty_pages()
As proposed by Chris, Dave and Jan, don't start foreground writeback IO
inside balance_dirty_pages(). Instead, simply let it idle sleep for some
time to throttle the dirtying task. In the mean while, kick off the
per-bdi flusher thread to do background writeback IO.

RATIONALS
=========

- disk seeks on concurrent writeback of multiple inodes (Dave Chinner)

  If every thread doing writes and being throttled start foreground
  writeback, it leads to N IO submitters from at least N different
  inodes at the same time, end up with N different sets of IO being
  issued with potentially zero locality to each other, resulting in
  much lower elevator sort/merge efficiency and hence we seek the disk
  all over the place to service the different sets of IO.
  OTOH, if there is only one submission thread, it doesn't jump between
  inodes in the same way when congestion clears - it keeps writing to
  the same inode, resulting in large related chunks of sequential IOs
  being issued to the disk. This is more efficient than the above
  foreground writeback because the elevator works better and the disk
  seeks less.

- lock contention and cache bouncing on concurrent IO submitters (Dave Chinner)

  With this patchset, the fs_mark benchmark on a 12-drive software RAID0 goes
  from CPU bound to IO bound, freeing "3-4 CPUs worth of spinlock contention".

  * "CPU usage has dropped by ~55%", "it certainly appears that most of
    the CPU time saving comes from the removal of contention on the
    inode_wb_list_lock" (IMHO at least 10% comes from the reduction of
    cacheline bouncing, because the new code is able to call much less
    frequently into balance_dirty_pages() and hence access the global
    page states)

  * the user space "App overhead" is reduced by 20%, by avoiding the
    cacheline pollution by the complex writeback code path

  * "for a ~5% throughput reduction", "the number of write IOs have
    dropped by ~25%", and the elapsed time reduced from 41:42.17 to
    40:53.23.

  * On a simple test of 100 dd, it reduces the CPU %system time from 30% to 3%,
    and improves IO throughput from 38MB/s to 42MB/s.

- IO size too small for fast arrays and too large for slow USB sticks

  The write_chunk used by current balance_dirty_pages() cannot be
  directly set to some large value (eg. 128MB) for better IO efficiency.
  Because it could lead to more than 1 second user perceivable stalls.
  Even the current 4MB write size may be too large for slow USB sticks.
  The fact that balance_dirty_pages() starts IO on itself couples the
  IO size to wait time, which makes it hard to do suitable IO size while
  keeping the wait time under control.

  Now it's possible to increase writeback chunk size proportional to the
  disk bandwidth. In a simple test of 50 dd's on XFS, 1-HDD, 3GB ram,
  the larger writeback size dramatically reduces the seek count to 1/10
  (far beyond my expectation) and improves the write throughput by 24%.

- long block time in balance_dirty_pages() hurts desktop responsiveness

  Many of us may have the experience: it often takes a couple of seconds
  or even long time to stop a heavy writing dd/cp/tar command with
  Ctrl-C or "kill -9".

- IO pipeline broken by bumpy write() progress

  There are a broad class of "loop {read(buf); write(buf);}" applications
  whose read() pipeline will be under-utilized or even come to a stop if
  the write()s have long latencies _or_ don't progress in a constant rate.
  The current threshold based throttling inherently transfers the large
  low level IO completion fluctuations to bumpy application write()s,
  and further deteriorates with increasing number of dirtiers and/or bdi's.

  For example, when doing 50 dd's + 1 remote rsync to an XFS partition,
  the rsync progresses very bumpy in legacy kernel, and throughput is
  improved by 67% by this patchset. (plus the larger write chunk size,
  it will be 93% speedup).

  The new rate based throttling can support 1000+ dd's with excellent
  smoothness, low latency and low overheads.

For the above reasons, it's much better to do IO-less and low latency
pauses in balance_dirty_pages().

Jan Kara, Dave Chinner and me explored the scheme to let
balance_dirty_pages() wait for enough writeback IO completions to
safeguard the dirty limit. However it's found to have two problems:

- in large NUMA systems, the per-cpu counters may have big accounting
  errors, leading to big throttle wait time and jitters.

- NFS may kill large amount of unstable pages with one single COMMIT.
  Because NFS server serves COMMIT with expensive fsync() IOs, it is
  desirable to delay and reduce the number of COMMITs. So it's not
  likely to optimize away such kind of bursty IO completions, and the
  resulted large (and tiny) stall times in IO completion based throttling.

So here is a pause time oriented approach, which tries to control the
pause time in each balance_dirty_pages() invocations, by controlling
the number of pages dirtied before calling balance_dirty_pages(), for
smooth and efficient dirty throttling:

- avoid useless (eg. zero pause time) balance_dirty_pages() calls
- avoid too small pause time (less than   4ms, which burns CPU power)
- avoid too large pause time (more than 200ms, which hurts responsiveness)
- avoid big fluctuations of pause times

It can control pause times at will. The default policy (in a followup
patch) will be to do ~10ms pauses in 1-dd case, and increase to ~100ms
in 1000-dd case.

BEHAVIOR CHANGE
===============

(1) dirty threshold

Users will notice that the applications will get throttled once crossing
the global (background + dirty)/2=15% threshold, and then balanced around
17.5%. Before patch, the behavior is to just throttle it at 20% dirtyable
memory in 1-dd case.

Since the task will be soft throttled earlier than before, it may be
perceived by end users as performance "slow down" if his application
happens to dirty more than 15% dirtyable memory.

(2) smoothness/responsiveness

Users will notice a more responsive system during heavy writeback.
"killall dd" will take effect instantly.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:57 +08:00
Wu Fengguang
9d823e8f6b writeback: per task dirty rate limit
Add two fields to task_struct.

1) account dirtied pages in the individual tasks, for accuracy
2) per-task balance_dirty_pages() call intervals, for flexibility

The balance_dirty_pages() call interval (ie. nr_dirtied_pause) will
scale near-sqrt to the safety gap between dirty pages and threshold.

The main problem of per-task nr_dirtied is, if 1k+ tasks start dirtying
pages at exactly the same time, each task will be assigned a large
initial nr_dirtied_pause, so that the dirty threshold will be exceeded
long before each task reached its nr_dirtied_pause and hence call
balance_dirty_pages().

The solution is to watch for the number of pages dirtied on each CPU in
between the calls into balance_dirty_pages(). If it exceeds ratelimit_pages
(3% dirty threshold), force call balance_dirty_pages() for a chance to
set bdi->dirty_exceeded. In normal situations, this safeguarding
condition is not expected to trigger at all.

On the sqrt in dirty_poll_interval():

It will serve as an initial guess when dirty pages are still in the
freerun area.

When dirty pages are floating inside the dirty control scope [freerun,
limit], a followup patch will use some refined dirty poll interval to
get the desired pause time.

   thresh-dirty (MB)    sqrt
		   1      16
		   2      22
		   4      32
		   8      45
		  16      64
		  32      90
		  64     128
		 128     181
		 256     256
		 512     362
		1024     512

The above table means, given 1MB (or 1GB) gap and the dd tasks polling
balance_dirty_pages() on every 16 (or 512) pages, the dirty limit won't
be exceeded as long as there are less than 16 (or 512) concurrent dd's.

So sqrt naturally leads to less overheads and more safe concurrent tasks
for large memory servers, which have large (thresh-freerun) gaps.

peter: keep the per-CPU ratelimit for safeguarding the 1k+ tasks case

CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Andrea Righi <andrea@betterlinux.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:57 +08:00
Wu Fengguang
7381131cbc writeback: stabilize bdi->dirty_ratelimit
There are some imperfections in balanced_dirty_ratelimit.

1) large fluctuations

The dirty_rate used for computing balanced_dirty_ratelimit is merely
averaged in the past 200ms (very small comparing to the 3s estimation
period for write_bw), which makes rather dispersed distribution of
balanced_dirty_ratelimit.

It's pretty hard to average out the singular points by increasing the
estimation period. Considering that the averaging technique will
introduce very undesirable time lags, I give it up totally. (btw, the 3s
write_bw averaging time lag is much more acceptable because its impact
is one-way and therefore won't lead to oscillations.)

The more practical way is filtering -- most singular
balanced_dirty_ratelimit points can be filtered out by remembering some
prev_balanced_rate and prev_prev_balanced_rate. However the more
reliable way is to guard balanced_dirty_ratelimit with task_ratelimit.

2) due to truncates and fs redirties, the (write_bw <=> dirty_rate)
match could become unbalanced, which may lead to large systematical
errors in balanced_dirty_ratelimit. The truncates, due to its possibly
bumpy nature, can hardly be compensated smoothly. So let's face it. When
some over-estimated balanced_dirty_ratelimit brings dirty_ratelimit
high, dirty pages will go higher than the setpoint. task_ratelimit will
in turn become lower than dirty_ratelimit.  So if we consider both
balanced_dirty_ratelimit and task_ratelimit and update dirty_ratelimit
only when they are on the same side of dirty_ratelimit, the systematical
errors in balanced_dirty_ratelimit won't be able to bring
dirty_ratelimit far away.

The balanced_dirty_ratelimit estimation may also be inaccurate near
@limit or @freerun, however is less an issue.

3) since we ultimately want to

- keep the fluctuations of task ratelimit as small as possible
- keep the dirty pages around the setpoint as long time as possible

the update policy used for (2) also serves the above goals nicely:
if for some reason the dirty pages are high (task_ratelimit < dirty_ratelimit),
and dirty_ratelimit is low (dirty_ratelimit < balanced_dirty_ratelimit),
there is no point to bring up dirty_ratelimit in a hurry only to hurt
both the above two goals.

So, we make use of task_ratelimit to limit the update of dirty_ratelimit
in two ways:

1) avoid changing dirty rate when it's against the position control target
   (the adjusted rate will slow down the progress of dirty pages going
   back to setpoint).

2) limit the step size. task_ratelimit is changing values step by step,
   leaving a consistent trace comparing to the randomly jumping
   balanced_dirty_ratelimit. task_ratelimit also has the nice smaller
   errors in stable state and typically larger errors when there are big
   errors in rate.  So it's a pretty good limiting factor for the step
   size of dirty_ratelimit.

Note that bdi->dirty_ratelimit is always tracking balanced_dirty_ratelimit.
task_ratelimit is merely used as a limiting factor.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:57 +08:00
Wu Fengguang
be3ffa2764 writeback: dirty rate control
It's all about bdi->dirty_ratelimit, which aims to be (write_bw / N)
when there are N dd tasks.

On write() syscall, use bdi->dirty_ratelimit
============================================

    balance_dirty_pages(pages_dirtied)
    {
        task_ratelimit = bdi->dirty_ratelimit * bdi_position_ratio();
        pause = pages_dirtied / task_ratelimit;
        sleep(pause);
    }

On every 200ms, update bdi->dirty_ratelimit
===========================================

    bdi_update_dirty_ratelimit()
    {
        task_ratelimit = bdi->dirty_ratelimit * bdi_position_ratio();
        balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate;
        bdi->dirty_ratelimit = balanced_dirty_ratelimit
    }

Estimation of balanced bdi->dirty_ratelimit
===========================================

balanced task_ratelimit
-----------------------

balance_dirty_pages() needs to throttle tasks dirtying pages such that
the total amount of dirty pages stays below the specified dirty limit in
order to avoid memory deadlocks. Furthermore we desire fairness in that
tasks get throttled proportionally to the amount of pages they dirty.

IOW we want to throttle tasks such that we match the dirty rate to the
writeout bandwidth, this yields a stable amount of dirty pages:

        dirty_rate == write_bw                                          (1)

The fairness requirement gives us:

        task_ratelimit = balanced_dirty_ratelimit
                       == write_bw / N                                  (2)

where N is the number of dd tasks.  We don't know N beforehand, but
still can estimate balanced_dirty_ratelimit within 200ms.

Start by throttling each dd task at rate

        task_ratelimit = task_ratelimit_0                               (3)
                         (any non-zero initial value is OK)

After 200ms, we measured

        dirty_rate = # of pages dirtied by all dd's / 200ms
        write_bw   = # of pages written to the disk / 200ms

For the aggressive dd dirtiers, the equality holds

        dirty_rate == N * task_rate
                   == N * task_ratelimit_0                              (4)
Or
        task_ratelimit_0 == dirty_rate / N                              (5)

Now we conclude that the balanced task ratelimit can be estimated by

                                                      write_bw
        balanced_dirty_ratelimit = task_ratelimit_0 * ----------        (6)
                                                      dirty_rate

Because with (4) and (5) we can get the desired equality (1):

                                                       write_bw
        balanced_dirty_ratelimit == (dirty_rate / N) * ----------
                                                       dirty_rate
                                 == write_bw / N

Then using the balanced task ratelimit we can compute task pause times like:

        task_pause = task->nr_dirtied / task_ratelimit

task_ratelimit with position control
------------------------------------

However, while the above gives us means of matching the dirty rate to
the writeout bandwidth, it at best provides us with a stable dirty page
count (assuming a static system). In order to control the dirty page
count such that it is high enough to provide performance, but does not
exceed the specified limit we need another control.

The dirty position control works by extending (2) to

        task_ratelimit = balanced_dirty_ratelimit * pos_ratio           (7)

where pos_ratio is a negative feedback function that subjects to

1) f(setpoint) = 1.0
2) df/dx < 0

That is, if the dirty pages are ABOVE the setpoint, we throttle each
task a bit more HEAVY than balanced_dirty_ratelimit, so that the dirty
pages are created less fast than they are cleaned, thus DROP to the
setpoints (and the reverse).

Based on (7) and the assumption that both dirty_ratelimit and pos_ratio
remains CONSTANT for the past 200ms, we get

        task_ratelimit_0 = balanced_dirty_ratelimit * pos_ratio         (8)

Putting (8) into (6), we get the formula used in
bdi_update_dirty_ratelimit():

                                                write_bw
        balanced_dirty_ratelimit *= pos_ratio * ----------              (9)
                                                dirty_rate

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:56 +08:00
Wu Fengguang
af6a311384 writeback: add bg_threshold parameter to __bdi_update_bandwidth()
No behavior change.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:56 +08:00
Wu Fengguang
6c14ae1e92 writeback: dirty position control
bdi_position_ratio() provides a scale factor to bdi->dirty_ratelimit, so
that the resulted task rate limit can drive the dirty pages back to the
global/bdi setpoints.

Old scheme is,
                                          |
                           free run area  |  throttle area
  ----------------------------------------+---------------------------->
                                    thresh^                  dirty pages

New scheme is,

  ^ task rate limit
  |
  |            *
  |             *
  |              *
  |[free run]      *      [smooth throttled]
  |                  *
  |                     *
  |                         *
  ..bdi->dirty_ratelimit..........*
  |                               .     *
  |                               .          *
  |                               .              *
  |                               .                 *
  |                               .                    *
  +-------------------------------.-----------------------*------------>
                          setpoint^                  limit^  dirty pages

The slope of the bdi control line should be

1) large enough to pull the dirty pages to setpoint reasonably fast

2) small enough to avoid big fluctuations in the resulted pos_ratio and
   hence task ratelimit

Since the fluctuation range of the bdi dirty pages is typically observed
to be within 1-second worth of data, the bdi control line's slope is
selected to be a linear function of bdi write bandwidth, so that it can
adapt to slow/fast storage devices well.

Assume the bdi control line

	pos_ratio = 1.0 + k * (dirty - bdi_setpoint)

where k is the negative slope.

If targeting for 12.5% fluctuation range in pos_ratio when dirty pages
are fluctuating in range

	[bdi_setpoint - write_bw/2, bdi_setpoint + write_bw/2],

we get slope

	k = - 1 / (8 * write_bw)

Let pos_ratio(x_intercept) = 0, we get the parameter used in code:

	x_intercept = bdi_setpoint + 8 * write_bw

The global/bdi slopes are nicely complementing each other when the
system has only one major bdi (indicated by bdi_thresh ~= thresh):

1) slope of global control line    => scaling to the control scope size
2) slope of main bdi control line  => scaling to the writeout bandwidth

so that

- in memory tight systems, (1) becomes strong enough to squeeze dirty
  pages inside the control scope

- in large memory systems where the "gravity" of (1) for pulling the
  dirty pages to setpoint is too weak, (2) can back (1) up and drive
  dirty pages to bdi_setpoint ~= setpoint reasonably fast.

Unfortunately in JBOD setups, the fluctuation range of bdi threshold
is related to memory size due to the interferences between disks.  In
this case, the bdi slope will be weighted sum of write_bw and bdi_thresh.

Given equations

        span = x_intercept - bdi_setpoint
        k = df/dx = - 1 / span

and the extremum values

        span = bdi_thresh
        dx = bdi_thresh

we get

        df = - dx / span = - 1.0

That means, when bdi_dirty deviates bdi_thresh up, pos_ratio and hence
task ratelimit will fluctuate by -100%.

peter: use 3rd order polynomial for the global control line

CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:56 +08:00
Wu Fengguang
c8e28ce049 writeback: account per-bdi accumulated dirtied pages
Introduce the BDI_DIRTIED counter. It will be used for estimating the
bdi's dirty bandwidth.

CC: Jan Kara <jack@suse.cz>
CC: Michael Rubin <mrubin@google.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:56 +08:00
Nobuhiro Iwamatsu
d762cc290b HID: Add support MacbookAir 4,1 keyboard
Added USB device IDs and keyboard map for MacBookAir 4,1 keyboard.

Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-10-03 14:00:05 +02:00
Kalle Valo
62c83ac4d6 ath6kl: include vmalloc.h in debug.c
Fixes compilation errors when compiling for ARM:

ath6kl/debug.c:312: error: implicit declaration of function 'vmalloc'
ath6kl/debug.c:312: warning: assignment makes pointer from integer without a cast
ath6kl/debug.c:342: error: implicit declaration of function 'vfree'
ath6kl/debug.c:696: warning: assignment makes pointer from integer without a cast
ath6kl/debug.c:871: warning: assignment makes pointer from integer without a cast

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2011-10-03 13:52:46 +03:00
Dimitris Papastamos
0eef6b0415 regmap: Fix doc comment
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-03 11:50:53 +01:00
Dimitris Papastamos
c08604b8ae regmap: Optimize the lookup path to use binary search
Since there are more lookups than insertions in a typical
scenario, optimize the linear search into a binary search.  For
this to work, we need to keep reg_defaults sorted upon
insertions, for now be lazy and use sort().

Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-03 11:50:53 +01:00
Florian Westphal
98d9ae841a netfilter: nf_conntrack: fix event flooding in GRE protocol tracker
GRE connections cause ctnetlink event flood because the ASSURED event
is set for every packet received.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-10-03 12:43:24 +02:00
Arnd Bergmann
28748782b7 video/omap: fix build dependencies
Four of the LCD panel drivers depend on the backlight class,
so add the dependency in Kconfig.
Selecting the BACKLIGHT_CLASS_DEVICE symbol does not generally
work since it has other dependencies.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[tomi.valkeinen@ti.com: changed also N8x0 panel]
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2011-10-03 13:28:14 +03:00
Linus Walleij
b1e3be0647 clocksource: fixup ux500 build problems
Based on a patch from Arnd Bergmann this fixes up the build
problem of assigning a non-existing global when the ux500 PRCMU
timer is not linked in by passing its base address to the init
function. We also add a missing <linux/errno.h> inclusion and
staticize the dummy function.

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2011-10-03 09:34:16 +02:00
Linus Torvalds
9b13776977 Merge branch 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6
* 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6:
  mfd: Fix generic irq chip ack function name for jz4740-adc
2011-10-02 19:23:44 -07:00
Linus Torvalds
4edf5886bb Merge branch 'for-linus' of git://github.com/tiwai/sound
* 'for-linus' of git://github.com/tiwai/sound:
  ALSA: hda - Fix a regression of the position-buffer check
2011-10-02 19:22:44 -07:00
Sachin Kamat
6ca3f8bdf8 ARM: EXYNOS4: Add HDMI support for ORIGEN
This patch adds HDMI (TVout) support for ORIGEN board.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-10-03 08:42:22 +09:00
Sachin Kamat
c86cfdd012 ARM: EXYNOS4: Add keypad support for ORIGEN
This patch adds keypad support for ORIGEN board as GPIO keys.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-10-03 08:42:22 +09:00
Tushar Behera
cf1dad9d7f ARM: EXYNOS4: Add support for secondary MMC port on ORIGEN
Secondary MMC port on ORIGEN is connected to sdhci instance 0.
Support for secondary MMC port is extended by registering
sdhci instance 0.

Since sdhci instance 2 can contain a bootable media, sdhci
instance 0 is registered after instance 2.

Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
[kgene.kim@samsung.com: Added comments in registering sdhci]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-10-03 08:42:22 +09:00
Tushar Behera
92e41efd73 ARM: EXYNOS4: Fix sdhci card detection for ORIGEN
Fix incorrect value of cd_type field in platform data
for sdhci device.

Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-10-03 08:42:22 +09:00
Giridhar Maruthy
9edff0f79f ARM: EXYNOS4: Add PWM backlight support on ORIGEN
This patch adds support for LCD backlight using PWM timer
on ORIGEN board.

Signed-off-by: Giridhar Maruthy <giridhar.maruthy@linaro.org>
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-10-03 08:42:22 +09:00
Sachin Kamat
6f8eb324a3 ARM: EXYNOS4: Add FIMC device on ORIGEN board
This patch adds definitions to enable support for s5p-fimc
driver on ORIGEN board.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
[kgene.kim@samsung.com: re-ordering changes]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-10-03 08:42:21 +09:00
Sachin Kamat
24f9e1f314 ARM: EXYNOS4: Add USB EHCI device to ORIGEN board
Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
[kgene.kim@samsung.com: re-ordering changes in Kconfig]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-10-03 08:42:21 +09:00
Kukjin Kim
218f82e624 Merge branch 'next-samsung-board-v3.1' into next/topic-exynos4-devel-origen 2011-10-03 08:42:01 +09:00
Arnd Bergmann
ca7aceef21 ASoC: sh: use correct __iomem annotations
This removes a few unnecessary type casts and avoids
sparse warnings.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 21:51:27 +01:00
Arnd Bergmann
b381bc8ab8 ASoC: imx: eukrea_tlv320 needs i2c
Add a missing dependency that is required for random configurations.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 21:51:19 +01:00
Mark Brown
6010c4c6c8 Merge branch 'for-3.1' into for-3.2
Conflicts:
	sound/soc/omap/mcpdm.c
	sound/soc/omap/mcpdm.h
2011-10-02 20:20:35 +01:00
Arnd Bergmann
b5c49d49b9 ASoC: omap_mcpdm_remove cannot be __devexit
omap_mcpdm_remove is used from asoc_mcpdm_probe, which is an
initcall, and must not be discarded when HOTPLUG is disabled.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 20:19:59 +01:00
Axel Lin
4b8713fd54 ASoC: Remove unused srate variable in tegra_spdif_hw_params
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 19:58:45 +01:00
Axel Lin
c6add3f0e6 ASoC: Remove unused rate variable in magician_playback_hw_params
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 19:58:45 +01:00
Axel Lin
8fccf46905 ASoC: Staticise sh4_ssi_dai
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 19:58:45 +01:00
Axel Lin
bc6bd90eb7 ASoC: Staticise samsung_spdif_dai
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 19:58:45 +01:00
Axel Lin
019cd3b25a ASoC: tegra: Staticise tegra_i2s_dai and tegra_spdif_dai
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 19:58:44 +01:00
Axel Lin
c4c5839f98 ASoC: samsung: Add __devexit_p at necessary places
According to the comments in include/linux/init.h:

"Pointers to __devexit functions must use __devexit_p(function_name), the
wrapper will insert either the function_name or NULL, depending on the confi
options."

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Jaswinder Singh <jassi.brar@samsung.com>
Cc: Ben Dooks <ben@simtec.co.uk>
Cc: Seungwhan Youn <sw.youn@samsung.com>
Cc: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 19:58:44 +01:00
Axel Lin
48860e409d ASoC: kirkwood-i2s: Add __devexit_p at necessary place
According to the comments in include/linux/init.h:

"Pointers to __devexit functions must use __devexit_p(function_name), the
wrapper will insert either the function_name or NULL, depending on the config
options."

We have __devexit annotation for kirkwood_i2s_dev_remove(), thus add __devexit_p
at necessary place.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 19:58:44 +01:00
Axel Lin
3f0456bfd7 ASoC: wm8782: Add __devexit_p at necessary place
According to the comments in include/linux/init.h:

"Pointers to __devexit functions must use __devexit_p(function_name), the
wrapper will insert either the function_name or NULL, depending on the config
options."

We have __devexit annotation for wm8782_remove(), thus add __devexit_p at
necessary place.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-10-02 19:58:44 +01:00
Mark Brown
ea945ab4d2 Merge branch 'for-3.1' into for-3.2 2011-10-02 19:57:19 +01:00