Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.
Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deleted the file mode configuration code in unsupported map type and
removed the file mode check in non-existing helper functions.
(cherry-pick from net-next: 6e71b04a82)
Bug: 30950746
Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b
We all should be using (and improving) the schedutil governor now. Get
rid of the non-upstream governor.
Tested on Hikey.
Change-Id: I2104558b03118b0a9c5f099c23c42cd9a6c2a963
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Merge in the EAS r1.4 patches from eas-dev to 4.9 common branch.
There is one patch in android-4.9-eas-dev which is not part of the 1.4
patches
ANDROID: sched/fair: Select correct capacity state for energy_diff
but we have already merged it into android-4.4 so in the interests
of keeping aligned, let's include that in the merge.
Merge Log:
* ack/android-4.9-eas-dev:
sched: EAS: Fix the condition to distinguish energy before/after
sched: EAS: update trg_cpu to backup_cpu if no energy saving for target_cpu
sched/fair: consider task utilization in group_max_util()
sched/fair: consider task utilization in group_norm_util()
sched/fair: enforce EAS mode
sched/fair: ignore backup CPU when not valid
sched/fair: trace energy_diff for non boosted tasks
UPSTREAM: sched/fair: Sync task util before slow-path wakeup
UPSTREAM: sched/core: Add missing update_rq_clock() call in set_user_nice()
UPSTREAM: sched/core: Add missing update_rq_clock() call for task_hot()
UPSTREAM: sched/core: Add missing update_rq_clock() in detach_task_cfs_rq()
UPSTREAM: sched/core: Add missing update_rq_clock() in post_init_entity_util_avg()
UPSTREAM: sched/fair: Fix task group initialization
cpufreq/sched: Consider max cpu capacity when choosing frequencies
cpufreq/sched: Use cpu max freq rather than policy max
sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu()
ANDROID: sched/fair: Select correct capacity state for energy_diff
UPSTREAM: sched/fair: Fix usage of find_idlest_group() when the local group is idlest
UPSTREAM: sched/fair: Fix usage of find_idlest_group() when no groups are allowed
UPSTREAM: sched/fair: Fix find_idlest_group() when local group is not allowed
UPSTREAM: sched/fair: Remove unnecessary comparison with -1
UPSTREAM: sched/fair: Move select_task_rq_fair() slow-path into its own function
UPSTREAM: sched/fair: Force balancing on NOHZ balance if local group has capacity
UPSTREAM: sched: use load_avg for selecting idlest group
UPSTREAM: sched: fix find_idlest_group for fork
Change-Id: I57bc516f9c804bfc7144a6a5bcf70572d82f7321
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Part of the above change was reverted in 240628085effc47e86f51fc3fb37bc0e628f9a85;
this change reverts the rest.
This ARM mmap change breaks AddressSanitizer:
Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
Revert it until ASAN runtime library is updated to handle it.
Bug: 67425063
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
This ARM mmap change breaks AddressSanitizer:
Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
Revert it until ASAN runtime library is updated to handle it.
Bug: 67425063
This reverts commit d2471b5e84.
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
Before commit 5f8b3a757d ("sched/fair: consider task utilization in
group_norm_util()"), eenv->util_delta is used to distinguish energy
before and energy after in sched_group_energy(). After that commit,
eenv->util_delta can not do that any more.
In this commit, use trg_cpu to distinguish energy before/after in
sched_group_energy().
Before apply this commit, cap_before/cap_delta is not correct:
<idle>-0 [001] 147504.608920: sched_energy_diff: pid=7 comm=rcu_preempt
src_cpu=1 dst_cpu=3 usage_delta=7 nrg_before=250 nrg_after=250 nrg_diff=0
cap_before=0 cap_after=528 cap_delta=1056 nrg_delta=0 nrg_payoff=0
After apply this commit, cap_before/cap_delta retrun to normal:
<idle>-0 [001] 220.494011: sched_energy_diff: pid=7 comm=rcu_preempt
src_cpu=1 dst_cpu=2 usage_delta=3 nrg_before=248 nrg_after=248 nrg_diff=0
cap_before=528 cap_after=528 cap_delta=0 nrg_delta=0 nrg_payoff=0
Change-Id: I7b5f7ccce56e93af7ea4e87d8e0ea6e2405f9c27
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
(cherry picked from commit 0da783a605cd20d5a37c2a840e8a1fa641c09768)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
If no energy saving for target_cpu in the calculation of energy_diff(),
backup_cpu will be set as the new dst_cpu for the next calculation. At this
point, we also need update the new trg_cpu as backup_cpu to make sure the
subsequent calculation of energy_diff() is correct.
Change-Id: If3b35b6dc54865f1cb4b1603134102d4422227d5
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
(cherry picked from commit b1923e22f4eca3e537e015d6ea3dce1187edea37)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The group_max_util() function is used to compute the maximum utilization
across the CPUs of a certain energy_env configuration.
Its main client is the energy_diff function when it needs to compute the
SG capacity for one of the before/after scheduling candidates.
Currently, the energy_diff function sets util_delta = 0 when it wants to
compute the energy corresponding to the scheduling candidate where the
task runs in the previous CPU. This implies that, for the task waking up
in the previous CPU we consider only its blocked load tracked by the CPU
RQ. However, in case of a medium-big task which is waking up on a long
time idle CPU, this blocked load can be already completely decayed.
More in general, the current approach is biased towards under-estimating
the capacity requirements for the "before" scheduling candidate.
This patch fixes this by:
- always use the cpu_util_wake() to properly get the utilization of a CPU
without any (partially decayed) contribution of the waking up task
- adding the task utilization to the cpu_util_wake just for the target
cpu
The "target CPU" is defined by the energy_env to be either the src_cpu or
the dst_cpu, depending on which scheduling candidate we are considering.
Finally, since this update removes the last usage of calc_util_delta()
this function is now safely removed.
Change-Id: I20ee1bcf40cee6bf6e265fb2d32ef79061ad6ced
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 52d70152fade678e304683bd1a5842af95e83558)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The group_norm_util() function is used to compute the normalized
utilization of a SG given a certain energy_env configuration.
The main client of this function is the energy_diff function when it
comes to compute the SG energy for one of the before/after scheduling
candidates.
Currently, the energy_diff function sets util_delta = 0 when it wants to
compute the energy corresponding to the scheduling candidate where the
task runs in the previous CPU. This implies that, for the task waking up
in the previous CPU we consider only its blocked load tracked by the CPU
RQ. However, in case of a medium-big task which is waking up on a long
time idle CPU, this blocked load can be already completely decayed.
More in general, the current approach is biased towards under-estimating
the energy consumption for the "before" scheduling candidate.
This patch fixes this by:
- always use the cpu_util_wake() to properly get the utilization of a CPU
without any (partially decayed) contribution of the waking up task
- adding the task utilization to the cpu_util_wake just for the
target cpu
The "target CPU" is defined by the energy_env to be either the src_cpu
or the dst_cpu, depending on which scheduling candidate we are
considering.
This patch update also the definition of __cpu_norm_util(), which is
currently called just by the group_norm_util() function. This allows to
simplify the code by using this function just to normalize a specified
utilization with respect to a given capacity.
This update allows to completely remove any dependency of
group_norm_util() from calc_util_delta().
Change-Id: I3b6ec50ce8decb1521faae660e326ab3319d3c82
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit ef34ea830347ca175ba8a1baf05357ed98d5728c)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
For non latency sensitive tasks the goal is to optimize for energy efficiency.
Thus, we should try our best to avoid moving a task on a CPU which is then
going to be marked as overutilized.
Let's use the capacity_margin metric to verify if a candidate target CPU
should be considered without risking to bail out of EAS mode.
Change-Id: Ib3697106f4073aedf4a6c6ce42bd5d000fa8c007
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit f95753da4ba7e1c1eee0500ffe41b3e5fa68b347)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The find_best_target can sometimes not return a valid backup CPU, either
because it cannot find one or just becasue it returns prev_cpu as a backup.
In these cases we should skip the energy_diff evaluation for the backup CPU.
Change-Id: I3787dbdfe74122348dd7a7485b88c4679051bd32
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit d9bcf5b88594a0225b51878236e49305f272eadc)
[trivial cherry-pick issue]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
In systems where SchedTune is enabled, we do not report energy diff for non
boosted tasks. Let's fix this by always genereting an energy_diff event where
however:
nrg.delta = 0, since we skip energy normalization
payoff = nrg.diff, since the payoff is defined just by the energy difference
Change-Id: I9a11ec19b6f56da04147f5ae5b47daf1dd180445
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 13e2e3c7f7d09f619444631a0962fd8020660b8d)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
We use task_util() in find_idlest_group() via capacity_spare_wake().
This task_util() updated in wake_cap(). However wake_cap() is not the
only reason for ending up in find_idlest_group() - we could have been sent
there by wake_wide(). So explicitly sync the task util with prev_cpu
when we are about to head to find_idlest_group().
We could simply do this at the beginning of
select_task_rq_fair() (i.e. irrespective of whether we're heading to
select_idle_sibling() or find_idlest_group() & co), but I didn't want to
slow down the select_idle_sibling() path more than necessary.
Don't do this during fork balancing, we won't need the task_util and
we'd just clobber the last_update_time, which is supposed to be 0.
Change-Id: I56113d8d67cf338f3fb1a692422289cf659399a3
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andres Oportus <andresoportus@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20170808095519.10077-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit ea16f0ea6c in tip:sched/core)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Add the update_rq_clock() call at the top of the callstack instead of
at the bottom where we find it missing, this to aid later effort to
minimize the number of update_rq_lock() calls.
WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
rq->clock_update_flags < RQCF_ACT_SKIP
Call Trace:
dump_stack()
__warn()
warn_slowpath_fmt()
assert_clock_updated.isra.63.part.64()
can_migrate_task()
load_balance()
pick_next_task_fair()
__schedule()
schedule()
worker_thread()
kthread()
Change-Id: I17716141789f9fac709495951cd2e079cf49d6d8
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3bed5e2166)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Instead of adding the update_rq_clock() all the way at the bottom of
the callstack, add one at the top, this to aid later effort to
minimize update_rq_lock() calls.
WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
rq->clock_update_flags < RQCF_ACT_SKIP
Call Trace:
dump_stack()
__warn()
warn_slowpath_fmt()
detach_task_cfs_rq()
switched_from_fair()
__sched_setscheduler()
_sched_setscheduler()
sched_set_stop_task()
cpu_stop_create()
__smpboot_create_thread.part.2()
smpboot_register_percpu_thread_cpumask()
cpu_stop_init()
do_one_initcall()
? print_cpu_info()
kernel_init_freeable()
? rest_init()
kernel_init()
ret_from_fork()
Change-Id: Iee08c2ed3303ae8f0c527658f13646b02a412cad
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 80f5c1b84b)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
When using schedfreq on cpus with max capacity significantly smaller than
1024, the tick update uses non-normalised capacities - this leads to
selecting an incorrect OPP as we were scaling the frequency as if the
max capacity achievable was 1024 rather than the max for that particular
cpu or group. This could result in a cpu being stuck at the lowest OPP
and unable to generate enough utilisation to climb out if the max
capacity is significantly smaller than 1024.
Instead, normalize the capacity to be in the range 0-1024 in the tick
so that when we later select a frequency, we get the correct one.
Also comments updated to be clearer about what is needed.
Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit bc3b0db024f3d0b323e7d93994655e9e3d9f8d68)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
When we convert capacity into frequency, we used policy->max to get
the max freq of the cpu. Since this can be changed by userspace policy
or thermal events, we are potentially asking for a lower frequency
than the utilization demands.
Change over to using cpuinfo.max which is the max freq supported by
that cpu rather than the currently-chosen max. Frequency granted still
honours the max policy.
Tested by setting a userspace policy and observing the relevant vars
in a trace. In this instance, we ask for around 1ghz instead of 620MHz.
freq_new=1013512
unfixed_freq_new=624487
capacity=546
cpuinfo_max=1900800
policy_max=1171200
Change-Id: I8c5694db42243c6fb78bb9be9046b06ac81295e7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 114c6ceb2e5b0e30442e2ba5f78cfaedf76bf1f9)
[trivial cherry-pick issue]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Changes in 4.9.60
workqueue: replace pool->manager_arb mutex with a flag
ALSA: hda/realtek - Add support for ALC236/ALC3204
ALSA: hda - fix headset mic problem for Dell machines with alc236
ceph: unlock dangling spinlock in try_flush_caps()
usb: xhci: Handle error condition in xhci_stop_device()
KVM: PPC: Fix oops when checking KVM_CAP_PPC_HTM
spi: uapi: spidev: add missing ioctl header
spi: bcm-qspi: Fix use after free in bcm_qspi_probe() in error path
fuse: fix READDIRPLUS skipping an entry
xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()
Input: elan_i2c - add ELAN0611 to the ACPI table
Input: gtco - fix potential out-of-bound access
assoc_array: Fix a buggy node-splitting case
scsi: zfcp: fix erp_action use-before-initialize in REC action trace
scsi: sg: Re-fix off by one in sg_fill_request_table()
drm/amd/powerplay: fix uninitialized variable
can: sun4i: fix loopback mode
can: kvaser_usb: Correct return value in printout
can: kvaser_usb: Ignore CMD_FLUSH_QUEUE_REPLY messages
cfg80211: fix connect/disconnect edge cases
ipsec: Fix aborted xfrm policy dump crash
regulator: fan53555: fix I2C device ids
ecryptfs: fix dereference of NULL user_key_payload
Linux 4.9.60
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit f66665c09a upstream.
In eCryptfs, we failed to verify that the authentication token keys are
not revoked before dereferencing their payloads, which is problematic
because the payload of a revoked key is NULL. request_key() *does* skip
revoked keys, but there is still a window where the key can be revoked
before we acquire the key semaphore.
Fix it by updating ecryptfs_get_key_payload_data() to return
-EKEYREVOKED if the key payload is NULL. For completeness we check this
for "encrypted" keys as well as "user" keys, although encrypted keys
cannot be revoked currently.
Alternatively we could use key_validate(), but since we'll also need to
fix ecryptfs_get_key_payload_data() to validate the payload length, it
seems appropriate to just check the payload pointer.
Fixes: 237fead619 ("[PATCH] ecryptfs: fs/Makefile and fs/Kconfig")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc1111b885 upstream.
The device tree nodes all correctly describe the regulators as
syr827 or syr828, but the I2C device id is currently set to the
wildcard value of syr82x in the driver. This causes udev to fail
to match the driver module with the modalias data from sysfs.
Fix this by replacing the I2C device ids with ones that match the
device tree descriptions, with syr827 and syr828. Tested on
Firefly rk3288 board. The syr82x id was not used anywhere.
Fixes: e80c47bd73 (regulator: fan53555: Export I2C module alias information)
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1137b5e252 upstream.
An independent security researcher, Mohamed Ghannam, has reported
this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
program.
The xfrm_dump_policy_done function expects xfrm_dump_policy to
have been called at least once or it will crash. This can be
triggered if a dump fails because the target socket's receive
buffer is full.
This patch fixes it by using the cb->start mechanism to ensure that
the initialisation is always done regardless of the buffer situation.
Fixes: 12a169e7d8 ("ipsec: Put dumpers on the dump list")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51e13359cd upstream.
If we try to connect while already connected/connecting, but
this fails, we set ssid_len=0 but leave current_bss hanging,
leading to errors.
Check all of this better, first of all ensuring that we can't
try to connect to a different SSID while connected/ing; ensure
that prev_bssid is set for re-association attempts even in the
case of the driver supporting the connect() method, and don't
reset ssid_len in the failure cases.
While at it, also reset ssid_len while disconnecting unless we
were connected and expect a disconnected event, and warn on a
successful connection without ssid_len being set.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1d2d1329a upstream.
To avoid kernel warning "Unhandled message (68)", ignore the
CMD_FLUSH_QUEUE_REPLY message for now.
As of Leaf v2 firmware version v4.1.844 (2017-02-15), flush tx queue is
synchronous. There is a capability bit indicating whether flushing tx
queue is synchronous or asynchronous.
A proper solution would be to query the device for capabilities. If the
synchronous tx flush capability bit is set, we should wait for
CMD_FLUSH_QUEUE_REPLY message, while flushing the tx queue.
Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f65a923e6 upstream.
If the return value from kvaser_usb_send_simple_msg() was non-zero, the
return value from kvaser_usb_flush_queue() was printed in the kernel
warning.
Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 587c3c9f28 upstream.
Commit 109bade9c6 ("scsi: sg: use standard lists for sg_requests")
introduced an off-by-one error in sg_ioctl(), which was fixed by commit
bd46fc406b ("scsi: sg: off by one in sg_ioctl()").
Unfortunately commit 4759df905a ("scsi: sg: factor out
sg_fill_request_table()") moved that code, and reintroduced the
bug (perhaps due to a botched rebase). Fix it again.
Fixes: 4759df905a ("scsi: sg: factor out sg_fill_request_table()")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab31fd0ce6 upstream.
v4.10 commit 6f2ce1c6af ("scsi: zfcp: fix rport unblock race with LUN
recovery") extended accessing parent pointer fields of struct
zfcp_erp_action for tracing. If an erp_action has never been enqueued
before, these parent pointer fields are uninitialized and NULL. Examples
are zfcp objects freshly added to the parent object's children list,
before enqueueing their first recovery subsequently. In
zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action
fields can cause a NULL pointer dereference. Since the kernel can read
from lowcore on s390, it does not immediately cause a kernel page
fault. Instead it can cause hangs on trying to acquire the wrong
erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl()
^bogus^
while holding already other locks with IRQs disabled.
Real life example from attaching lots of LUNs in parallel on many CPUs:
crash> bt 17723
PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800"
LOWCORE INFO:
-psw : 0x0404300180000000 0x000000000038e424
-function : _raw_spin_lock_wait_flags at 38e424
...
#0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp]
#1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp]
#2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp]
#3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp]
#4 [fdde8fe60] kthread at 173550
#5 [fdde8feb8] kernel_thread_starter at 10add2
zfcp_adapter
zfcp_port
zfcp_unit <address>, 0x404040d600000000
scsi_device NULL, returning early!
zfcp_scsi_dev.status = 0x40000000
0x40000000 ZFCP_STATUS_COMMON_RUNNING
crash> zfcp_unit <address>
struct zfcp_unit {
erp_action = {
adapter = 0x0,
port = 0x0,
unit = 0x0,
},
}
zfcp_erp_action is always fully embedded into its container object. Such
container object is never moved in its object tree (only add or delete).
Hence, erp_action parent pointers can never change.
To fix the issue, initialize the erp_action parent pointers before
adding the erp_action container to any list and thus before it becomes
accessible from outside of its initializing function.
In order to also close the time window between zfcp_erp_setup_act()
memsetting the entire erp_action to zero and setting the parent pointers
again, drop the memset and instead explicitly initialize individually
all erp_action fields except for parent pointers. To be extra careful
not to introduce any other unintended side effect, even keep zeroing the
erp_action fields for list and timer. Also double-check with
WARN_ON_ONCE that erp_action parent pointers never change, so we get to
know when we would deviate from previous behavior.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 6f2ce1c6af ("scsi: zfcp: fix rport unblock race with LUN recovery")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea6789980f upstream.
This fixes CVE-2017-12193.
Fix a case in the assoc_array implementation in which a new leaf is
added that needs to go into a node that happens to be full, where the
existing leaves in that node cluster together at that level to the
exclusion of new leaf.
What needs to happen is that the existing leaves get moved out to a new
node, N1, at level + 1 and the existing node needs replacing with one,
N0, that has pointers to the new leaf and to N1.
The code that tries to do this gets this wrong in two ways:
(1) The pointer that should've pointed from N0 to N1 is set to point
recursively to N0 instead.
(2) The backpointer from N0 needs to be set correctly in the case N0 is
either the root node or reached through a shortcut.
Fix this by removing this path and using the split_node path instead,
which achieves the same end, but in a more general way (thanks to Eric
Biggers for spotting the redundancy).
The problem manifests itself as:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: assoc_array_apply_edit+0x59/0xe5
Fixes: 3cb989501c ("Add a generic associative array implementation.")
Reported-and-tested-by: WU Fan <u3536072@connect.hku.hk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a50829479f upstream.
parse_hid_report_descriptor() has a while (i < length) loop, which
only guarantees that there's at least 1 byte in the buffer, but the
loop body can read multiple bytes which causes out-of-bounds access.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 298d275d4d upstream.
In case gntdev_mmap() succeeds only partially in mapping grant pages
it will leave some vital information uninitialized needed later for
cleanup. This will lead to an out of bounds array access when unmapping
the already mapped pages.
So just initialize the data needed for unmapping the pages a little bit
earlier.
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6cdd51404 upstream.
Marios Titas running a Haskell program noticed a problem with fuse's
readdirplus: when it is interrupted by a signal, it skips one directory
entry.
The reason is that fuse erronously updates ctx->pos after a failed
dir_emit().
The issue originates from the patch adding readdirplus support.
Reported-by: Jakob Unterwurzacher <jakobunt@gmail.com>
Tested-by: Marios Titas <redneb@gmx.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 0b05b18381 ("fuse: implement NFS-like readdirplus support")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c0368e4db4 upstream.
There was an inversion in how the error path in bcm_qspi_probe() is done
which would make us trip over a KASAN use-after-free report. Turns out
that qspi->dev_ids does not get allocated until later in the probe
process. Fix this by introducing a new lable: qspi_resource_err which
takes care of cleaning up the SPI master instance.
Fixes: fa236a7ef2 ("spi: bcm-qspi: Add Broadcom MSPI driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2b4a79b88 upstream.
The SPI_IOC_MESSAGE() macro references _IOC_SIZEBITS. Add linux/ioctl.h
to make sure this macro is defined. This fixes the following build
failure of lcdproc with the musl libc:
In file included from .../sysroot/usr/include/sys/ioctl.h:7:0,
from hd44780-spi.c:31:
hd44780-spi.c: In function 'spi_transfer':
hd44780-spi.c:89:24: error: '_IOC_SIZEBITS' undeclared (first use in this function)
status = ioctl(p->fd, SPI_IOC_MESSAGE(1), &xfer);
^
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac64115a66 upstream.
The following program causes a kernel oops:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
main()
{
int fd = open("/dev/kvm", O_RDWR);
ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HTM);
}
This happens because when using the global KVM fd with
KVM_CHECK_EXTENSION, kvm_vm_ioctl_check_extension() gets
called with a NULL kvm argument, which gets dereferenced
in is_kvmppc_hv_enabled(). Spotted while reading the code.
Let's use the hv_enabled fallback variable, like everywhere
else in this function.
Fixes: 23528bb21e ("KVM: PPC: Introduce KVM_CAP_PPC_HTM")
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3207c65df upstream.
xhci_stop_device() calls xhci_queue_stop_endpoint() multiple times
without checking the return value. xhci_queue_stop_endpoint() can
return error if the HC is already halted or unable to queue commands.
This can cause a deadlock condition as xhci_stop_device() would
end up waiting indefinitely for a completion for the command that
didn't get queued. Fix this by checking the return value and bailing
out of xhci_stop_device() in case of error. This patch happens to fix
potential memory leaks of the allocated command structures as well.
Fixes: c311e391a7 ("xhci: rework command timeout and cancellation,")
Signed-off-by: Mayank Rana <mrana@codeaurora.org>
Signed-off-by: Jack Pham <jackp@codeaurora.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c2838fbde upstream.
sparse warns:
fs/ceph/caps.c:2042:9: warning: context imbalance in 'try_flush_caps' - wrong count at exit
We need to exit this function with the lock unlocked, but a couple of
cases leave it locked.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f265788c33 upstream.
We have several Dell laptops which use the codec alc236, the headset
mic can't work on these machines. Following the commit 736f20a70, we
add the pin cfg table to make the headset mic work.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 692b48258d upstream.
Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
lockdep:
[ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
[ 1270.473240] -----------------------------------------------------
[ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 1270.474239] (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
[ 1270.474994]
[ 1270.474994] and this task is already holding:
[ 1270.475440] (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
[ 1270.476046] which would create a new lock dependency:
[ 1270.476436] (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
[ 1270.476949]
[ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 1270.477553] (&pool->lock/1){-.-.}
...
[ 1270.488900] to a HARDIRQ-irq-unsafe lock:
[ 1270.489327] (&(&lock->wait_lock)->rlock){+.+.}
...
[ 1270.494735] Possible interrupt unsafe locking scenario:
[ 1270.494735]
[ 1270.495250] CPU0 CPU1
[ 1270.495600] ---- ----
[ 1270.495947] lock(&(&lock->wait_lock)->rlock);
[ 1270.496295] local_irq_disable();
[ 1270.496753] lock(&pool->lock/1);
[ 1270.497205] lock(&(&lock->wait_lock)->rlock);
[ 1270.497744] <Interrupt>
[ 1270.497948] lock(&pool->lock/1);
, which will cause a irq inversion deadlock if the above lock scenario
happens.
The root cause of this safe -> unsafe lock order is the
mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
held.
Unlocking mutex while holding an irq spinlock was never safe and this
problem has been around forever but it never got noticed because the
only time the mutex is usually trylocked while holding irqlock making
actual failures very unlikely and lockdep annotation missed the
condition until the recent b9c16a0e1f ("locking/mutex: Fix
lockdep_assert_held() fail").
Using mutex for pool->manager_arb has always been a bit of stretch.
It primarily is an mechanism to arbitrate managership between workers
which can easily be done with a pool flag. The only reason it became
a mutex is that pool destruction path wants to exclude parallel
managing operations.
This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
and make the destruction path wait for the current manager on a wait
queue.
v2: Drop unnecessary flag clearing before pool destruction as
suggested by Boqun.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The usx2y driver allocates the stream read/write buffers in continuous
pages depending on the stream setup, and this may spew the kernel
warning messages with a stack trace like:
WARNING: CPU: 1 PID: 1846 at mm/page_alloc.c:3883
__alloc_pages_slowpath+0x1ef2/0x2d70
Modules linked in:
CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
....
It may confuse user as if it were any serious error, although this is
no fatal error and the driver handles the error case gracefully.
Since the driver has already some sanity check of the given size (128
and 256 pages), it can't pass any crazy value. So it's merely page
fragmentation.
This patch adds __GFP_NOWARN to each caller for suppressing such
kernel warnings. The original issue was spotted by syzkaller.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andres Oportus <andresoportus@google.com>
Bug: 64827499
Url: https://lkml.org/lkml/2017/10/2/234
Change-Id: I32a4b702cd5c9cd4b82328ac6bc87c435146d44d
We should avoid using the space character when passing arguments to
clang, because static code analysis check tool such as sparse may
misinterpret the arguments followed by spaces as build targets hence
cause the build to fail.
Signed-off-by: David Lin <dtwlin@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from linux-kbuild commit bb3f38c3c5)
Bug: 66969589
Change-Id: I055719cbc89e7a12d1f46f0a9c2152738a278d2a
[ghackmann@google.com: tweak to preserve AOSP-specific CLANG_TRIPLE]
Signed-off-by: Greg Hackmann <ghackmann@google.com>
In isolation, the original change will break building efistub for ARM64
with gcc. This wasn't an issue upstream due to the earlier change
60f38de7a8 ("efi/libstub: Unify command line param parsing"). That's
now been backported to AOSP too.
This reverts commit 36733457a3.
Change-Id: I99592a19c77821df897bba966b5486cb835745f9
Signed-off-by: Greg Hackmann <ghackmann@google.com>