To support task performance boosting, the usage of a single knob has the
advantage to be a simple solution, both from the implementation and the
usability standpoint. However, on a real system it can be difficult to
identify a single value for the knob which fits the needs of multiple
different tasks. For example, some kernel threads and/or user-space
background services should be better managed the "standard" way while we
still want to be able to boost the performance of specific workloads.
In order to improve the flexibility of the task boosting mechanism this
patch is the first of a small series which extends the previous
implementation to introduce a "per task group" support.
This first patch introduces just the basic CGroups support, a new
"schedtune" CGroups controller is added which allows to configure
different boost value for different groups of tasks.
To keep the implementation simple but still effective for a boosting
strategy, the new controller:
1. allows only a two layer hierarchy
2. supports only a limited number of boost groups
A two layer hierarchy allows to place each task either:
a) in the root control group
thus being subject to a system-wide boosting value
b) in a child of the root group
thus being subject to the specific boost value defined by that
"boost group"
The limited number of "boost groups" supported is mainly motivated by
the observation that in a real system it could be useful to have only
few classes of tasks which deserve different treatment.
For example, background vs foreground or interactive vs low-priority.
As an additional benefit, a limited number of boost groups allows also
to have a simpler implementation especially for the code required to
compute the boost value for CPUs which have runnable tasks belonging to
different boost groups.
cc: Tejun Heo <tj@kernel.org>
cc: Li Zefan <lizefan@huawei.com>
cc: Johannes Weiner <hannes@cmpxchg.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
The current (CFS) scheduler implementation does not allow "to boost"
tasks performance by running them at a higher OPP compared to the
minimum required to meet their workload demands.
To support tasks performance boosting the scheduler should provide a
"knob" which allows to tune how much the system is going to be optimised
for energy efficiency vs performance.
This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
/proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
- 0% boost requires to operate in "standard" mode by scheduling
tasks at the minimum capacities required by the workload demand
- 100% boost requires to push at maximum the task performances,
"regardless" of the incurred energy consumption
A boost value in between these two boundaries is used to bias the
power/performance trade-off, the higher the boost value the more the
scheduler is biased toward performance boosting instead of energy
efficiency.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Scheduler-driven CPU frequency selection hopes to exploit both
per-task and global information in the scheduler to improve frequency
selection policy, achieving lower power consumption, improved
responsiveness/performance, and less reliance on heuristics and
tunables. For further discussion on the motivation of this integration
see [0].
This patch implements a shim layer between the Linux scheduler and the
cpufreq subsystem. The interface accepts capacity requests from the
CFS, RT and deadline sched classes. The requests from each sched class
are summed on each CPU with a margin applied to the CFS and RT
capacity requests to provide some headroom. Deadline requests are
expected to be precise enough given their nature to not require
headroom. The maximum total capacity request for a CPU in a frequency
domain drives the requested frequency for that domain.
Policy is determined by both the sched classes and this shim layer.
Note that this algorithm is event-driven. There is no polling loop to
check cpu idle time nor any other method which is unsynchronized with
the scheduler, aside from a throttling mechanism to ensure frequency
changes are not attempted faster than the hardware can accommodate them.
Thanks to Juri Lelli <juri.lelli@arm.com> for contributing design ideas,
code and test results, and to Ricky Liang <jcliang@chromium.org>
for initialization and static key inc/dec fixes.
[0] http://article.gmane.org/gmane.linux.kernel/1499836
[smuckle@linaro.org: various additions and fixes, revised commit text]
CC: Ricky Liang <jcliang@chromium.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Some architectures and platforms perform CPU frequency transitions
through a non-blocking method, while some might block or sleep. Even
when frequency transitions do not block or sleep they may be very slow.
This distinction is important when trying to change frequency from
a non-interruptible context in a scheduler hot path.
Describe this distinction with a cpufreq driver flag,
CPUFREQ_DRIVER_FAST. The default is to not have this flag set,
thus erring on the side of caution.
cpufreq_driver_is_slow() is also introduced in this patch. Setting
the above flag will allow this function to return false.
[smuckle@linaro.org: change flag/API to include drivers that are too
slow for scheduler hot paths, in addition to those that block/sleep]
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Implements cpufreq_scale_max_freq_capacity() to provide the scheduler
with a maximum frequency scaling correction factor for more accurate
load-tracking and cpu capacity handling by being able to deal with
frequency capping.
This scaling factor describes the influence of running a cpu with a
current maximum frequency lower than the absolute possible maximum
frequency on load tracking and cpu capacity.
The factor is:
current_max_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)
In fact, max_freq_scale should be a struct cpufreq_policy data member.
But this would require that the scheduler hot path (__update_load_avg())
would have to grab the cpufreq lock. This can be avoided by using per-cpu
data initialized to SCHED_CAPACITY_SCALE for max_freq_scale.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
This patch implements support for extracting energy cost data from DT.
The data should conform to the DT bindings for energy cost data needed
by EAS (energy aware scheduling).
Signed-off-by: Robin Randhawa <robin.randhawa@arm.com>
The idle-state of each cpu is currently pointed to by rq->idle_state but
there isn't any information in the struct cpuidle_state that can used to
look up the idle-state energy model data stored in struct
sched_group_energy. For this purpose is necessary to store the idle
state index as well. Ideally, the idle-state data should be unified.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
cpufreq is currently keeping it a secret which cpus are sharing
clock source. The scheduler needs to know about clock domains as well
to become more energy aware. The SD_SHARE_CAP_STATES domain flag
indicates whether cpus belonging to the sched_domain share capacity
states (P-states).
There is no connection with cpufreq (yet). The flag must be set by
the arch specific topology code.
cc: Russell King <linux@arm.linux.org.uk>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
The struct sched_group_energy represents the per sched_group related
data which is needed for energy aware scheduling. It contains:
(1) number of elements of the idle state array
(2) pointer to the idle state array which comprises 'power consumption'
for each idle state
(3) number of elements of the capacity state array
(4) pointer to the capacity state array which comprises 'compute
capacity and power consumption' tuples for each capacity state
The struct sched_group obtains a pointer to a struct sched_group_energy.
The function pointer sched_domain_energy_f is introduced into struct
sched_domain_topology_level which will allow the arch to pass a particular
struct sched_group_energy from the topology shim layer into the scheduler
core.
The function pointer sched_domain_energy_f has an 'int cpu' parameter
since the folding of two adjacent sd levels via sd degenerate doesn't work
for all sd levels. I.e. it is not possible for example to use this feature
to provide per-cpu energy in sd level DIE on ARM's TC2 platform.
It was discussed that the folding of sd levels approach is preferable
over the cpu parameter approach, simply because the user (the arch
specifying the sd topology table) can introduce less errors. But since
it is not working, the 'int cpu' parameter is the only way out. It's
possible to use the folding of sd levels approach for
sched_domain_flags_f and the cpu parameter approach for the
sched_domain_energy_f at the same time though. With the use of the
'int cpu' parameter, an extra check function has to be provided to make
sure that all cpus spanned by a sched group are provisioned with the same
energy data.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Implements cpufreq_scale_freq_capacity() to provide the scheduler with a
frequency scaling correction factor for more accurate load-tracking.
The factor is:
current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)
In fact, freq_scale should be a struct cpufreq_policy data member. But
this would require that the scheduler hot path (__update_load_avg()) would
have to grab the cpufreq lock. This can be avoided by using per-cpu data
initialized to SCHED_CAPACITY_SCALE for freq_scale.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
(from https://lkml.org/lkml/2016/9/1/428)
(cherry pick from android-3.10 commit b58133100b38f2bf83cad2d7097417a3a196ed0b)
Removing a bounce buffer copy operation in the pmsg driver path is
always better. We also gain in overall performance by not requesting
a vmalloc on every write as this can cause precious RT tasks, such
as user facing media operation, to stall while memory is being
reclaimed. Added a write_buf_user to the pstore functions, a backup
platform write_buf_user that uses the small buffer that is part of
the instance, and implemented a ramoops write_buf_user that only
supports PSTORE_TYPE_PMSG.
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 31057326
Change-Id: I4cdee1cd31467aa3e6c605bce2fbd4de5b0f8caa
Just for good measure, make sure that check_object_size() is always
inlined too, as already done for copy_*_user() and __copy_*_user().
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: Ibfdf4790d03fe426e68d9a864c55a0d1bbfb7d61
(cherry picked from commit a85d6b8242)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I348809399c10ffa051251866063be674d064b9ff
(cherry picked from 81409e9e28)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
The IO latency histogram change broke allmodconfig and
allnoconfig builds. This fixes those breakages.
Change-Id: I9cdae655b40ed155468f3cef25cdb74bb56c4d3e
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
SLUB already has a redzone debugging feature. But it is only positioned
at the end of object (aka right redzone) so it cannot catch left oob.
Although current object's right redzone acts as left redzone of next
object, first object in a slab cannot take advantage of this effect.
This patch explicitly adds a left red zone to each object to detect left
oob more precisely.
Background:
Someone complained to me that left OOB doesn't catch even if KASAN is
enabled which does page allocation debugging. That page is out of our
control so it would be allocated when left OOB happens and, in this
case, we can't find OOB. Moreover, SLUB debugging feature can be
enabled without page allocator debugging and, in this case, we will miss
that OOB.
Before trying to implement, I expected that changes would be too
complex, but, it doesn't look that complex to me now. Almost changes
are applied to debug specific functions so I feel okay.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ib893a17ecabd692e6c402e864196bf89cd6781a5
(cherry picked from commit d86bd1bece)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
This patch adds a new sysfs node (latency_hist) and reports
IO (svc time) latency histograms. Disabled by default, can be
enabled by echoing 0 into latency_hist, stats can be cleared
by writing 2 into latency_hist. This commit fixes the 32 bit
build breakage in the previous commit. Tested on both 32 bit
and 64 bit arm devices.
Bug: 30677035
Change-Id: I9a615a16616d80f87e75676ac4d078a5c429dcf9
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
When I initially added the unsafe_[get|put]_user() helpers in commit
5b24a7a2aa ("Add 'unsafe' user access functions for batched
accesses"), I made the mistake of modeling the interface on our
traditional __[get|put]_user() functions, which return zero on success,
or -EFAULT on failure.
That interface is fairly easy to use, but it's actually fairly nasty for
good code generation, since it essentially forces the caller to check
the error value for each access.
In particular, since the error handling is already internally
implemented with an exception handler, and we already use "asm goto" for
various other things, we could fairly easily make the error cases just
jump directly to an error label instead, and avoid the need for explicit
checking after each operation.
So switch the interface to pass in an error label, rather than checking
the error value in the caller. Best do it now before we start growing
more users (the signal handling code in particular would be a good place
to use the new interface).
So rather than
if (unsafe_get_user(x, ptr))
... handle error ..
the interface is now
unsafe_get_user(x, ptr, label);
where an error during the user mode fetch will now just cause a jump to
'label' in the caller.
Right now the actual _implementation_ of this all still ends up being a
"if (err) goto label", and does not take advantage of any exception
label tricks, but for "unsafe_put_user()" in particular it should be
fairly straightforward to convert to using the exception table model.
Note that "unsafe_get_user()" is much harder to convert to a clever
exception table model, because current versions of gcc do not allow the
use of "asm goto" (for the exception) with output values (for the actual
value to be fetched). But that is hopefully not a limitation in the
long term.
[ Also note that it might be a good idea to switch unsafe_get_user() to
actually _return_ the value it fetches from user space, but this
commit only changes the error handling semantics ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ib905a84a04d46984320f6fd1056da4d72f3d6b53
(cherry picked from commit 1bd4403d86)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit bb1fceca22)
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()
Then it attempts to copy user data into this fresh skb.
If the copy fails, we undo the work and remove the fresh skb.
Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)
Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
This bug was found by Marco Grassi thanks to syzkaller.
Fixes: 6859d49475 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I58bb02d6e4e399612e8580b9e02d11e661df82f5
Bug: 31183296
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.
The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android which moves task to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human initiated
operations.
Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem, hopefully it should not be that slow after another commit
80127a3968 ("locking/percpu-rwsem: Optimize readers and reduce global
impact").
We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.
Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
Currently the percpu-rwsem switches to (global) atomic ops while a
writer is waiting; which could be quite a while and slows down
releasing the readers.
This patch cures this problem by ordering the reader-state vs
reader-count (see the comments in __percpu_down_read() and
percpu_down_write()). This changes a global atomic op into a full
memory barrier, which doesn't have the global cacheline contention.
This also enables using the percpu-rwsem with rcu_sync disabled in order
to bias the implementation differently, reducing the writer latency by
adding some cost to readers.
Mailing-list-URL: https://lkml.org/lkml/2016/8/9/181
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[jstultz: Backported to 4.4]
Change-Id: I8ea04b4dca2ec36f1c2469eccafde1423490572f
Signed-off-by: John Stultz <john.stultz@linaro.org>
On upstream uboot, we use ums mode to update firmware.
Add this flag to help enter USB Mass Storage mode.
Change-Id: I0e515bfd8703bd48d950b72787b365226af11ce9
Signed-off-by: Jacob Chen <jacob2.chen@rock-chips.com>
Some SoCs have a single phy-hw-block with multiple phys, this is
modelled by a single phy dts node, so we end up with multiple
controller nodes with a phys property pointing to the phy-node
of the otg-phy.
Only one of these controllers typically is an otg controller, yet we
were checking the first controller who uses a phy from the block and
then end up looking for a dr_mode property in e.g. the ehci controller.
This commit fixes this by adding an arg0 parameter to
of_usb_get_dr_mode_by_phy and make of_usb_get_dr_mode_by_phy
check that this matches the phandle args[0] value when looking for
the otg controller.
Conflicts:
drivers/usb/phy/phy-am335x.c
Change-Id: I0b999a7445399cb2c86060bdf662db8aab96d1cc
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Wu Liang feng <wulf@rock-chips.com>
(cherry picked from commit ce15ed4c5d)
If OF is disabled, we will try to define a stub for
of_usb_get_dr_mode_by_phy(), however that missed a
static inline annotation which made us redefine the
stub over and over again. Fix that.
Change-Id: I0e3594d2beb29273343dacf0af73f159ad30a35d
Fixes: 98bfb39466 ("usb: of: add an api to get
dr_mode by the phy node")
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Wu Liang feng <wulf@rock-chips.com>
(cherry picked from commit be99c84300)
Some USB phy drivers have different handling for the controller in each
dr_mode. But the phy driver does not have visibility to the dr_mode of
the controller.
This adds an api to return the dr_mode of the controller which
associates the given phy node.
Change-Id: I2bdf7ba8fbff03e8dd33a2f4642b4455bcdabd29
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
(cherry picked from commit 98bfb39466)
Signed-off-by: Wu Liang feng <wulf@rock-chips.com>
DEFAULT_MODE: default mode
ONE_VOP_DUAL_MIPI_HOR_SCAN: one vop to dual mipi screen hor scan
ONE_VOP_DUAL_MIPI_VER_SCAN: one vop to dual mipi screen ver scan
TWO_VOP_DUAL_MIPI: two vop to dual mipi screen
Change-Id: I48a96369059afd5615ebf41b4b6178d7206c4f72
Signed-off-by: Huang Jiachai <hjc@rock-chips.com>
commit 3feab13c91 upstream.
When the ACPI_DECLARE_PROBE_ENTRY macro was added in
commit e647b53227 ("ACPI: Add early device probing infrastructure"),
a stub macro adding an unused entry was added for the !CONFIG_ACPI
Kconfig option case to make sure kernel code making use of the
macro did not require to be guarded within CONFIG_ACPI in order to
be compiled.
The stub macro was never used since all kernel code that defines
ACPI_DECLARE_PROBE_ENTRY entries is currently guarded within
CONFIG_ACPI; it contains a typo that should be nonetheless fixed.
Fix the typo in the stub (ie !CONFIG_ACPI) ACPI_DECLARE_PROBE_ENTRY()
macro so that it can actually be used if needed.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Fixes: e647b53227 (ACPI: Add early device probing infrastructure)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4097461897 upstream.
As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we
have a hard load dependency between i8042 and atkbd which prevents
keyboard from working on Gen2 Hyper-V VMs.
> hyperv_keyboard invokes serio_interrupt(), which needs a valid serio
> driver like atkbd.c. atkbd.c depends on libps2.c because it invokes
> ps2_command(). libps2.c depends on i8042.c because it invokes
> i8042_check_port_owner(). As a result, hyperv_keyboard actually
> depends on i8042.c.
>
> For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a
> Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m
> rather than =y, atkbd.ko can't load because i8042.ko can't load(due to
> no i8042 device emulated) and finally hyperv_keyboard can't work and
> the user can't input: https://bugs.archlinux.org/task/39820
> (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y)
To break the dependency we move away from using i8042_check_port_owner()
and instead allow serio port owner specify a mutex that clients should use
to serialize PS/2 command stream.
Reported-by: Mark Laws <mdl@60hz.org>
Tested-by: Mark Laws <mdl@60hz.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a1b2725a6 upstream.
Add a new USB_SPEED_SUPER_PLUS device speed, and make sure usb core can
handle the new speed.
In most cases the behaviour is the same as with USB_SPEED_SUPER SuperSpeed
devices. In a few places we add a "Plus" string to inform the user of the
new speed.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3b0946d62 upstream.
Bharat Kumar Gogada reported issues with the generic MSI code, where the
end-point ended up with garbage in its MSI configuration (both for the vector
and the message).
It turns out that the two MSI paths in the kernel are doing slightly different
things:
generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI
And it turns out that end-points are allowed to latch the content of the MSI
configuration registers as soon as MSIs are enabled. In Bharat's case, the
end-point ends up using whatever was there already, which is not what you
want.
In order to make things converge, we introduce a new MSI domain flag
(MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set,
this flag forces the programming of the end-point as soon as the MSIs are
allocated.
A consequence of this is that we have an extra activate in irq_startup, but
that should be without much consequence.
tglx:
- Several people reported a VMWare regression with PCI/MSI-X passthrough. It
turns out that the patch also cures that issue.
- We need to have a look at the MSI disable interrupt path, where we write
the msg to all zeros without disabling MSI in the PCI device. Is that
correct?
Fixes: 52f518a3a7 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts"
Reported-and-tested-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Reported-and-tested-by: Foster Snowhill <forst@forstwoof.ru>
Reported-by: Matthias Prager <linux@matthiasprager.de>
Reported-by: Jason Taylor <jason.taylor@simplivity.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69874ec233 upstream.
Add the device ID for the PF of the NFP4000. The device ID for the VF,
0x6003, is already present as PCI_DEVICE_ID_NETRONOME_NFP6000_VF.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
H546dlb01 is a 5.46 inch OLED screen with resolution 1080x1920.
Change-Id: I977f355c53c58f6dba46c4581fc8190bfce04cf2
Signed-off-by: bivvy <bivvy.bi@rock-chips.com>
if userspace set car_reversing 1, rk fb will ignore
buffer from hwc.
Change-Id: Ib3bb9a105a8d6b7a2cc0e71c21bf8cc208b4ffd3
Signed-off-by: Huang Jiachai <hjc@rock-chips.com>
Before reboot if the DCDC is disabled,
the DCDC is still disabled after restart.
We have an method to workaround:
Use vsel pin to switch the voltage between value in FAN53555_VSEL0
and FAN53555_VSEL1. If VSEL pin is inactive, the voltage of DCDC
are controlled by FAN53555_VSEL0, when we pull vsel pin to active,
they would be controlled by FAN53555_VSEL1.
In this case, we can set FAN53555_VSEL1 to disable dcdc,
So we can make vsel pin to active to disable dcdc,
VSEL pin is inactive to enable DCDC.
Change-Id: I14c823ed11dc3369044ad2ed0b53a6027acbccd0
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
The naming is meant to discourage random use: the helper functions are
not really any more "unsafe" than the traditional double-underscore
functions (which need the address range checking), but they do need even
more infrastructure around them, and should not be used willy-nilly.
In addition to checking the access range, these user access functions
require that you wrap the user access with a "user_acess_{begin,end}()"
around it.
That allows architectures that implement kernel user access control
(x86: SMAP, arm64: PAN) to do the user access control in the wrapping
user_access_begin/end part, and then batch up the actual user space
accesses using the new interfaces.
The main (and hopefully only) use for these are for core generic access
helpers, initially just the generic user string functions
(strnlen_user() and strncpy_from_user()).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5b24a7a2aa)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
This is the start of porting PAX_USERCOPY into the mainline kernel. This
is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
work is based on code by PaX Team and Brad Spengler, and an earlier port
from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.
This patch contains the logic for validating several conditions when
performing copy_to_user() and copy_from_user() on the kernel object
being copied to/from:
- address range doesn't wrap around
- address range isn't NULL or zero-allocated (with a non-zero copy size)
- if on the slab allocator:
- object size must be less than or equal to copy size (when check is
implemented in the allocator, which appear in subsequent patches)
- otherwise, object must not span page allocations (excepting Reserved
and CMA ranges)
- if on the stack
- object must not extend before/after the current process stack
- object must be contained by a valid stack frame (when there is
arch/build support for identifying stack frames)
- object must not overlap with kernel text
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit f5509cc18d)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
skip debug_page_ref and KCOV_INSTRUMENT in mm/Makefile