commit 48eaeb7664 upstream.
We've moved the override and firmware EDID (simply "override EDID" from
now on) handling to the low level drm_do_get_edid() function in order to
transparently use the override throughout the stack. The idea is that
you get the override EDID via the ->get_modes() hook.
Unfortunately, there are scenarios where the DDC probe in drm_get_edid()
called via ->get_modes() fails, although the preceding ->detect()
succeeds.
In the case reported by Paul Wise, the ->detect() hook,
intel_crt_detect(), relies on hotplug detect, bypassing the DDC. In the
case reported by Ilpo Järvinen, there is no ->detect() hook, which is
interpreted as connected. The subsequent DDC probe reached via
->get_modes() fails, and we don't even look at the override EDID,
resulting in no modes being added.
Because drm_get_edid() is used via ->detect() all over the place, we
can't trivially remove the DDC probe, as it leads to override EDID
effectively meaning connector forcing. The goal is that connector
forcing and override EDID remain orthogonal.
Generally, the underlying problem here is the conflation of ->detect()
and ->get_modes() via drm_get_edid(). The former should just detect, and
the latter should just get the modes, typically via reading the EDID. As
long as drm_get_edid() is used in ->detect(), it needs to retain the DDC
probe. Or such users need to have a separate DDC probe step first.
The EDID caching between ->detect() and ->get_modes() done by some
drivers is a further complication that prevents us from making
drm_do_get_edid() adapt to the two cases.
Work around the regression by falling back to a separate attempt at
getting the override EDID at drm_helper_probe_single_connector_modes()
level. With a working DDC and override EDID, it'll never be called; the
override EDID will come via ->get_modes(). There will still be a failing
DDC probe attempt in the cases that require the fallback.
v2:
- Call drm_connector_update_edid_property (Paul)
- Update commit message about EDID caching (Daniel)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107583
Reported-by: Paul Wise <pabs3@bonedaddy.net>
Cc: Paul Wise <pabs3@bonedaddy.net>
References: http://mid.mail-archive.com/alpine.DEB.2.20.1905262211270.24390@whs-18.cs.helsinki.fi
Reported-by: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
Cc: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
References: 15f080f08d ("drm/edid: respect connector force for drm_get_edid ddc probe")
Fixes: 53fd40a90f ("drm: handle override and firmware EDID at drm_do_get_edid() level")
Cc: <stable@vger.kernel.org> # v4.15+ 56a2b7f2a3 drm/edid: abstract override/firmware EDID retrieval
Cc: <stable@vger.kernel.org> # v4.15+
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Tested-by: Paul Wise <pabs3@bonedaddy.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190610093054.28445-1-jani.nikula@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78f4e932f7 upstream.
Adric Blake reported the following warning during suspend-resume:
Enabling non-boot CPUs ...
x86: Booting SMP configuration:
smpboot: Booting Node 0 Processor 1 APIC 0x2
unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
Call Trace:
intel_set_tfa
intel_pmu_cpu_starting
? x86_pmu_dead_cpu
x86_pmu_starting_cpu
cpuhp_invoke_callback
? _raw_spin_lock_irqsave
notify_cpu_starting
start_secondary
secondary_startup_64
microcode: sig=0x806ea, pf=0x80, revision=0x96
microcode: updated to revision 0xb4, date = 2019-04-01
CPU1 is up
The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
by microcode. The log above shows that the microcode loader callback
happens after the PMU restoration, leading to the conjecture that
because the microcode hasn't been updated yet, that MSR is not present
yet, leading to the #GP.
Add a microcode loader-specific hotplug vector which comes before
the PERF vectors and thus executes earlier and makes sure the MSR is
present.
Fixes: 400816f60c ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Adric Blake <promarbler14@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: x86@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18fa84a2db upstream.
A PF_EXITING task can stay associated with an offline css. If such
task calls task_get_css(), it can get stuck indefinitely. This can be
triggered by BSD process accounting which writes to a file with
PF_EXITING set when racing against memcg disable as in the backtrace
at the end.
After this change, task_get_css() may return a css which was already
offline when the function was called. None of the existing users are
affected by this change.
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
...
NMI backtrace for cpu 0
...
Call Trace:
<IRQ>
dump_stack+0x46/0x68
nmi_cpu_backtrace.cold.2+0x13/0x57
nmi_trigger_cpumask_backtrace+0xba/0xca
rcu_dump_cpu_stacks+0x9e/0xce
rcu_check_callbacks.cold.74+0x2af/0x433
update_process_times+0x28/0x60
tick_sched_timer+0x34/0x70
__hrtimer_run_queues+0xee/0x250
hrtimer_interrupt+0xf4/0x210
smp_apic_timer_interrupt+0x56/0x110
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:balance_dirty_pages_ratelimited+0x28f/0x3d0
...
btrfs_file_write_iter+0x31b/0x563
__vfs_write+0xfa/0x140
__kernel_write+0x4f/0x100
do_acct_process+0x495/0x580
acct_process+0xb9/0xdb
do_exit+0x748/0xa00
do_group_exit+0x3a/0xa0
get_signal+0x254/0x560
do_signal+0x23/0x5c0
exit_to_usermode_loop+0x5d/0xa0
prepare_exit_to_usermode+0x53/0x80
retint_user+0x8/0x8
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v4.2+
Fixes: ec438699a9 ("cgroup, block: implement task_get_css() and use it in bio_associate_current()")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f3e2bf008 upstream.
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.
This forces the stack to send packets with a very high network/cpu
overhead.
Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.
In some cases, it can be useful to increase the minimal value
to a saner value.
We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.
Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f1 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.
We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f070ef2ac6 upstream.
Jonathan Looney reported that a malicious peer can force a sender
to fragment its retransmit queue into tiny skbs, inflating memory
usage and/or overflow 32bit counters.
TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non malicious splitting
of retransmit queue.
A new SNMP counter is added to monitor how many times TCP
did not allow to split an skb if the allowance was exceeded.
Note that this counter might increase in the case applications
use SO_SNDBUF socket option to lower sk_sndbuf.
CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
socket is already using more than half the allowed space
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b4929f65b upstream.
Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :
BUG_ON(tcp_skb_pcount(skb) < pcount);
This can happen if the remote peer has advertized the smallest
MSS that linux TCP accepts : 48
An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.
This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.
Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.
CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
Fixes: 832d11c5cd ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 38f092c41c which is
commit d5bb334a8e upstream.
Lots of people have reported issues with this patch, and as there does
not seem to be a fix going into Linus's kernel tree any time soon,
revert the commit in the stable trees so as to get people's machines
working properly again.
Reported-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reported-by: Hans de Goede <hdegoede@redhat.com>
Cc: Jeremy Cline <jeremy@jcline.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89a4aac0ab upstream.
In the case of a normal sync update, the preparation of framebuffers (be
it calling drm_atomic_helper_prepare_planes() or doing setups with
drm_framebuffer_get()) are performed in the new_state and the respective
cleanups are performed in the old_state.
In the case of async updates, the preparation is also done in the
new_state but the cleanups are done in the new_state (because updates
are performed in place, i.e. in the current state).
The current code blocks async udpates when the fb is changed, turning
async updates into sync updates, slowing down cursor updates and
introducing regressions in igt tests with errors of type:
"CRITICAL: completed 97 cursor updated in a period of 30 flips, we
expect to complete approximately 15360 updates, with the threshold set
at 7680"
Fb changes in async updates were prevented to avoid the following scenario:
- Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2 (wrong)
Where we have a single call to prepare fb2 but double cleanup call to fb2.
To solve the above problems, instead of blocking async fb changes, we
place the old framebuffer in the new_state object, so when the code
performs cleanups in the new_state it will cleanup the old_fb and we
will have the following scenario instead:
- Async update, oldfb = NULL, newfb = fb1, prepare fb1, no cleanup
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb1
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
Where calls to prepare/cleanup are balanced.
Cc: <stable@vger.kernel.org> # v4.14+
Fixes: 25dc194b34 ("drm: Block fb changes for async plane updates")
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190603165610.24614-6-helen.koike@collabora.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec527c3180 upstream.
As explained in
0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
we always, no matter what, have to bring up x86 HT siblings during boot at
least once in order to avoid first MCE bringing the system to its knees.
That means that whenever 'nosmt' is supplied on the kernel command-line,
all the HT siblings are as a result sitting in mwait or cpudile after
going through the online-offline cycle at least once.
This causes a serious issue though when a kernel, which saw 'nosmt' on its
commandline, is going to perform resume from hibernation: if the resume
from the hibernated image is successful, cr3 is flipped in order to point
to the address space of the kernel that is being resumed, which in turn
means that all the HT siblings are all of a sudden mwaiting on address
which is no longer valid.
That results in triple fault shortly after cr3 is switched, and machine
reboots.
Fix this by always waking up all the SMT siblings before initiating the
'restore from hibernation' process; this guarantees that all the HT
siblings will be properly carried over to the resumed kernel waiting in
resume_play_dead(), and acted upon accordingly afterwards, based on the
target kernel configuration.
Symmetricaly, the resumed kernel has to push the SMT siblings to mwait
again in case it has SMT disabled; this means it has to online all
the siblings when resuming (so that they come out of hlt) and offline
them again to let them reach mwait.
Cc: 4.19+ <stable@vger.kernel.org> # v4.19+
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea84b580b9 upstream.
Instead of running with interrupts disabled, use a semaphore. This should
make it easier for backends that may need to sleep (e.g. EFI) when
performing a write:
|BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
|in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum
|Preemption disabled at:
|[<ffffffff99d60512>] pstore_dump+0x72/0x330
|CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D 4.20.0-rc3 #45
|Call Trace:
| dump_stack+0x4f/0x6a
| ___might_sleep.cold.91+0xd3/0xe4
| __might_sleep+0x50/0x90
| wait_for_completion+0x32/0x130
| virt_efi_query_variable_info+0x14e/0x160
| efi_query_variable_store+0x51/0x1a0
| efivar_entry_set_safe+0xa3/0x1b0
| efi_pstore_write+0x109/0x140
| pstore_dump+0x11c/0x330
| kmsg_dump+0xa4/0xd0
| oops_exit+0x22/0x30
...
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: 21b3ddd39f ("efi: Don't use spinlocks for efi vars")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66be4e66a7 upstream.
Herbert Xu pointed out that commit bb73c52bad ("rcu: Don't disable
preemption for Tiny and Tree RCU readers") was incorrect in making the
preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT.
If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is
a no-op, but still is a compiler barrier.
And RCU locking still _needs_ that compiler barrier.
It is simply fundamentally not true that RCU locking would be a complete
no-op: we still need to guarantee (for example) that things that can
trap and cause preemption cannot migrate into the RCU locked region.
The way we do that is by making it a barrier.
See for example commit 386afc9114 ("spinlocks and preemption points
need to be at least compiler barriers") from back in 2013 that had
similar issues with spinlocks that become no-ops on UP: they must still
constrain the compiler from moving other operations into the critical
region.
Now, it is true that a lot of RCU operations already use READ_ONCE() and
WRITE_ONCE() (which in practice likely would never be re-ordered wrt
anything remotely interesting), but it is also true that that is not
globally the case, and that it's not even necessarily always possible
(ie bitfields etc).
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: bb73c52bad ("rcu: Don't disable preemption for Tiny and Tree RCU readers")
Cc: stable@kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e52972c11d ]
Commit 38030d7cb7 ("net/tls: avoid NULL-deref on resync during device removal")
tried to fix a potential NULL-dereference by taking the
context rwsem. Unfortunately the RX resync may get called
from soft IRQ, so we can't use the rwsem to protect from
the device disappearing. Because we are guaranteed there
can be only one resync at a time (it's called from strparser)
use a bit to indicate resync is busy and make device
removal wait for the bit to get cleared.
Note that there is a leftover "flags" field in struct
tls_context already.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b7999b0772 ]
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.
It's caused by Commit 93531c6743 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can be got only when
'c1'(see below) for setting dst_cookie whereas rt6_check() is called
when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().
Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.
c1:
(rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f75118800 upstream.
If overlay properties #address-cells or #size-cells are already in
the live devicetree for any given node, then the values in the
overlay must match the values in the live tree.
If the properties are already in the live tree then there is no
need to create a changeset entry to add them since they must
have the same value. This reduces the memory used by the
changeset and eliminates a possible memory leak.
Tested-by: Alan Tull <atull@kernel.org>
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6e60d8498 upstream.
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers for all the init/cleanup_module
aliases in the kernel (defined by the module_init/exit macros),
ending up being very noisy.
These aliases point to the __init/__exit functions of a module,
which are defined as __cold (among other attributes). However,
the aliases themselves do not have the __cold attribute.
Since the compiler behaves differently when compiling a __cold
function as well as when compiling paths leading to calls
to __cold functions, the warning is trying to point out
the possibly-forgotten attribute in the alias.
In order to keep the warning enabled, we decided to silence
this case. Ideally, we would mark the aliases directly
as __init/__exit. However, there are currently around 132 modules
in the kernel which are missing __init/__exit in their init/cleanup
functions (either because they are missing, or for other reasons,
e.g. the functions being called from somewhere else); and
a section mismatch is a hard error.
A conservative alternative was to mark the aliases as __cold only.
However, since we would like to eventually enforce __init/__exit
to be always marked, we chose to use the new __copy function
attribute (introduced by GCC 9 as well to deal with this).
With it, we copy the attributes used by the target functions
into the aliases. This way, functions that were not marked
as __init/__exit won't have their aliases marked either,
and therefore there won't be a section mismatch.
Note that the warning would go away marking either the extern
declaration, the definition, or both. However, we only mark
the definition of the alias, since we do not want callers
(which only see the declaration) to be compiled as if the function
was __cold (and therefore the paths leading to those calls
would be assumed to be unlikely).
Link: https://lore.kernel.org/lkml/20190123173707.GA16603@gmail.com/
Link: https://lore.kernel.org/lkml/20190206175627.GA20399@gmail.com/
Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c0d9782f5b upstream.
From the GCC manual:
copy
copy(function)
The copy attribute applies the set of attributes with which function
has been declared to the declaration of the function to which
the attribute is applied. The attribute is designed for libraries
that define aliases or function resolvers that are expected
to specify the same set of attributes as their targets. The copy
attribute can be used with functions, variables, or types. However,
the kind of symbol to which the attribute is applied (either
function or variable) must match the kind of symbol to which
the argument refers. The copy attribute copies only syntactic and
semantic attributes but not attributes that affect a symbol’s
linkage or visibility such as alias, visibility, or weak.
The deprecated attribute is also not copied.
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target, e.g.:
void __cold f(void) {}
void __alias("f") g(void);
diagnoses:
warning: 'g' specifies less restrictive attribute than
its target 'f': 'cold' [-Wmissing-attributes]
Using __copy(f) we can copy the __cold attribute from f to g:
void __cold f(void) {}
void __copy(f) __alias("f") g(void);
This attribute is most useful to deal with situations where an alias
is declared but we don't know the exact attributes the target has.
For instance, in the kernel, the widely used module_init/exit macros
define the init/cleanup_module aliases, but those cannot be marked
always as __init/__exit since some modules do not have their
functions marked as such.
Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e85899637 upstream.
We have a single node system with node 0 disabled:
Scanning NUMA topology in Northbridge 24
Number of physical nodes 2
Skipping disabled node 0
Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]
This causes crashes in memcg when system boots:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
...
RIP: 0010:list_lru_add+0x94/0x170
...
Call Trace:
d_lru_add+0x44/0x50
dput.part.34+0xfc/0x110
__fput+0x108/0x230
task_work_run+0x9f/0xc0
exit_to_usermode_loop+0xf5/0x100
It is reproducible as far as 4.12. I did not try older kernels. You have
to have a new enough systemd, e.g. 241 (the reason is unknown -- was not
investigated). Cannot be reproduced with systemd 234.
The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.
The root cause are list_lru_memcg_aware checks in the list_lru code. The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but it is not true on some systems as can be seen above.
So fix this by avoiding checks on node 0. Remember the memcg-awareness by
a bool flag in struct list_lru.
Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes: 60d3fd32a7 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef4d6f6b27 upstream.
The ror32 implementation (word >> shift) | (word << (32 - shift) has
undefined behaviour if shift is outside the [1, 31] range. Similarly
for the 64 bit variants. Most callers pass a compile-time constant
(naturally in that range), but there's an UBSAN report that these may
actually be called with a shift count of 0.
Instead of special-casing that, we can make them DTRT for all values of
shift while also avoiding UB. For some reason, this was already partly
done for rol32 (which was well-defined for [0, 31]). gcc 8 recognizes
these patterns as rotates, so for example
__u32 rol32(__u32 word, unsigned int shift)
{
return (word << (shift & 31)) | (word >> ((-shift) & 31));
}
compiles to
0000000000000020 <rol32>:
20: 89 f8 mov %edi,%eax
22: 89 f1 mov %esi,%ecx
24: d3 c0 rol %cl,%eax
26: c3 retq
Older compilers unfortunately do not do as well, but this only affects
the small minority of users that don't pass constants.
Due to integer promotions, ro[lr]8 were already well-defined for shifts
in [0, 8], and ro[lr]16 were mostly well-defined for shifts in [0, 16]
(only mostly - u16 gets promoted to _signed_ int, so if bit 15 is set,
word << 16 is undefined). For consistency, update those as well.
Link: http://lkml.kernel.org/r/20190410211906.2190-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Vadim Pasternak <vadimp@mellanox.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9666d10a5 upstream.
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
[nc: Fix trivial conflicts in 4.19
arch/xtensa/kernel/jump_label.c doesn't exist yet
Ensured CC_HAVE_ASM_GOTO and HAVE_JUMP_LABEL were sufficiently
eliminated]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81b4568348 upstream.
__compiletime_assert_fallback() is supposed to stop building earlier
by using the negative-array-size method in case the compiler does not
support "error" attribute, but has never worked like that.
You can simply try:
BUILD_BUG_ON(1);
GCC immediately terminates the build, but Clang does not report
anything because Clang does not support the "error" attribute now.
It will later fail at link time, but __compiletime_assert_fallback()
is not working at least.
The root cause is commit 1d6a0d19c8 ("bug.h: prevent double evaluation
of `condition' in BUILD_BUG_ON"). Prior to that commit, BUILD_BUG_ON()
was checked by the negative-array-size method *and* the link-time trick.
Since that commit, the negative-array-size is not effective because
'__cond' is no longer constant. As the comment in <linux/build_bug.h>
says, GCC (and Clang as well) only emits the error for obvious cases.
When '__cond' is a variable,
((void)sizeof(char[1 - 2 * __cond]))
... is not obvious for the compiler to know the array size is negative.
Reverting that commit would break BUILD_BUG() because negative-size-array
is evaluated before the code is optimized out.
Let's give up __compiletime_assert_fallback(). This commit does not
change the current behavior since it just rips off the useless code.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
TLV_SET is called with a data pointer and a len parameter that tells us
how many bytes are pointed to by data. When invoking memcpy() we need
to careful to only copy len bytes.
Previously we would copy TLV_LENGTH(len) bytes which would copy an extra
4 bytes past the end of the data pointer which newer GCC versions
complain about.
In file included from test.c:17:
In function 'TLV_SET',
inlined from 'test' at test.c:186:5:
/usr/include/linux/tipc_config.h:317:3:
warning: 'memcpy' forming offset [33, 36] is out of the bounds [0, 32]
of object 'bearer_name' with type 'char[32]' [-Warray-bounds]
memcpy(TLV_DATA(tlv_ptr), data, tlv_len);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test.c: In function 'test':
test.c::161:10: note:
'bearer_name' declared here
char bearer_name[TIPC_MAX_BEARER_NAME];
^~~~~~~~~~~
We still want to ensure any padding bytes at the end are initialised, do
this with a explicit memset() rather than copy bytes past the end of
data. Apply the same logic to TCM_SET.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df453700e8 ]
According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.
Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
having 64bit key and Jenkins hash is risky.
It is time to switch to siphash and its 128bit keys.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7a8e61f847 ]
Several people reported testing failures after setting CLOCK_REALTIME close
to the limits of the kernel internal representation in nanoseconds,
i.e. year 2262.
The failures are exposed in subsequent operations, i.e. when arming timers
or when the advancing CLOCK_MONOTONIC makes the calculation of
CLOCK_REALTIME overflow into negative space.
Now people start to paper over the underlying problem by clamping
calculations to the valid range, but that's just wrong because such
workarounds will prevent detection of real issues as well.
It is reasonable to force an upper bound for the various methods of setting
CLOCK_REALTIME. Year 2262 is the absolute upper bound. Assume a maximum
uptime of 30 years which is plenty enough even for esoteric embedded
systems. That results in an upper bound of year 2232 for setting the time.
Once that limit is reached in reality this limit is only a small part of
the problem space. But until then this stops people from trying to paper
over the problem at the wrong places.
Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reported-by: Hongbo Yao <yaohongbo@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903231125480.2157@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 58e7515500 ]
As seen on some USB wireless keyboards manufactured by Primax, the HID
parser was using some assumptions that are not always true. In this case
it's s the fact that, inside the scope of a main item, an Usage Page
will always precede an Usage.
The spec is not pretty clear as 6.2.2.7 states "Any usage that follows
is interpreted as a Usage ID and concatenated with the Usage Page".
While 6.2.2.8 states "When the parser encounters a main item it
concatenates the last declared Usage Page with a Usage to form a
complete usage value." Being somewhat contradictory it was decided to
match Window's implementation, which follows 6.2.2.8.
In summary, the patch moves the Usage Page concatenation from the local
item parsing function to the main item parsing function.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Terry Junge <terry.junge@poly.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df1d80aee9 ]
For devices from the SigmaDelta family we need to keep CS low when doing a
conversion, since the device will use the MISO line as a interrupt to
indicate that the conversion is complete.
This is why the driver locks the SPI bus and when the SPI bus is locked
keeps as long as a conversion is going on. The current implementation gets
one small detail wrong though. CS is only de-asserted after the SPI bus is
unlocked. This means it is possible for a different SPI device on the same
bus to send a message which would be wrongfully be addressed to the
SigmaDelta device as well. Make sure that the last SPI transfer that is
done while holding the SPI bus lock de-asserts the CS signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <Alexandru.Ardelean@analog.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4dcabece4c ]
The number of descendant cgroups and the number of dying
descendant cgroups are currently synchronized using the cgroup_mutex.
The number of descendant cgroups will be required by the cgroup v2
freezer, which will use it to determine if a cgroup is frozen
(depending on total number of descendants and number of frozen
descendants). It's not always acceptable to grab the cgroup_mutex,
especially from quite hot paths (e.g. exit()).
To avoid this, let's additionally synchronize these counters using
the css_set_lock.
So, it's safe to read these counters with either cgroup_mutex or
css_set_lock locked, and for changing both locks should be acquired.
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c88e3c7ec ]
commit 2da78092dd "block: Fix dev_t minor allocation lifetime"
specifically moved blk_free_devt(dev->devt) call to part_release()
to avoid reallocating device number before the device is fully
shutdown.
However, it can cause use-after-free on gendisk in get_gendisk().
We use md device as example to show the race scenes:
Process1 Worker Process2
md_free
blkdev_open
del_gendisk
add delete_partition_work_fn() to wq
__blkdev_get
get_gendisk
put_disk
disk_release
kfree(disk)
find part from ext_devt_idr
get_disk_and_module(disk)
cause use after free
delete_partition_work_fn
put_device(part)
part_release
remove part from ext_devt_idr
Before <devt, hd_struct pointer> is removed from ext_devt_idr by
delete_partition_work_fn(), we can find the devt and then access
gendisk by hd_struct pointer. But, if we access the gendisk after
it have been freed, it can cause in use-after-freeon gendisk in
get_gendisk().
We fix this by adding a new helper blk_invalidate_devt() in
delete_partition() and del_gendisk(). It replaces hd_struct
pointer in idr with value 'NULL', and deletes the entry from
idr in part_release() as we do now.
Thanks to Jan Kara for providing the solution and more clear comments
for the code.
Fixes: 2da78092dd ("block: Fix dev_t minor allocation lifetime")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4645d30b5 ]
The test robot reported a wrong assignment of a per-CPU variable which
it detected by using sparse and sent a report. The assignment itself is
correct. The annotation for sparse was wrong and hence the report.
The first pointer is a "normal" pointer and points to the per-CPU memory
area. That means that the __percpu annotation has to be moved.
Move the __percpu annotation to pointer which points to the per-CPU
area. This change affects only the sparse tool (and is ignored by the
compiler).
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f97f8f06a4 ("smpboot: Provide infrastructure for percpu hotplug threads")
Link: http://lkml.kernel.org/r/20190424085253.12178-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f80c5dad7b ]
This commit makes the kernel not send the next queued HCI command until
a command complete arrives for the last HCI command sent to the
controller. This change avoids a problem with some buggy controllers
(seen on two SKUs of QCA9377) that send an extra command complete event
for the previous command after the kernel had already sent a new HCI
command to the controller.
The problem was reproduced when starting an active scanning procedure,
where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
command. When this happends the kernel ends up not processing the
command complete for the following commmand, LE_SET_SCAN_PARAM, and
ultimately behaving as if a passive scanning procedure was being
performed, when in fact controller is performing an active scanning
procedure. This makes it impossible to discover BLE devices as no device
found events are sent to userspace.
This problem is reproducible on 100% of the attempts on the affected
controllers. The extra command complete event can be seen at timestamp
27.420131 on the btmon logs bellow.
Bluetooth monitor ver 5.50
= Note: Linux version 5.0.0+ (x86_64) 0.352340
= Note: Bluetooth subsystem version 2.22 0.352343
= New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0) [hci0] 0.352344
= Open Index: 80:C5:F2:8F:87:84 [hci0] 0.352345
= Index Info: 80:C5:F2:8F:87:84 (Qualcomm) [hci0] 0.352346
@ MGMT Open: bluetoothd (privileged) version 1.14 {0x0001} 0.352347
@ MGMT Open: btmon (privileged) version 1.14 {0x0002} 0.352366
@ MGMT Open: btmgmt (privileged) version 1.14 {0x0003} 27.302164
@ MGMT Command: Start Discovery (0x0023) plen 1 {0x0003} [hci0] 27.302310
Address type: 0x06
LE Public
LE Random
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #1 [hci0] 27.302496
Address: 15:60:F2:91:B2:24 (Non-Resolvable)
> HCI Event: Command Complete (0x0e) plen 4 #2 [hci0] 27.419117
LE Set Random Address (0x08|0x0005) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #3 [hci0] 27.419244
Type: Active (0x01)
Interval: 11.250 msec (0x0012)
Window: 11.250 msec (0x0012)
Own address type: Random (0x01)
Filter policy: Accept all advertisement (0x00)
> HCI Event: Command Complete (0x0e) plen 4 #4 [hci0] 27.420131
LE Set Random Address (0x08|0x0005) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #5 [hci0] 27.420259
Scanning: Enabled (0x01)
Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4 #6 [hci0] 27.420969
LE Set Scan Parameters (0x08|0x000b) ncmd 1
Status: Success (0x00)
> HCI Event: Command Complete (0x0e) plen 4 #7 [hci0] 27.421983
LE Set Scan Enable (0x08|0x000c) ncmd 1
Status: Success (0x00)
@ MGMT Event: Command Complete (0x0001) plen 4 {0x0003} [hci0] 27.422059
Start Discovery (0x0023) plen 1
Status: Success (0x00)
Address type: 0x06
LE Public
LE Random
@ MGMT Event: Discovering (0x0013) plen 2 {0x0003} [hci0] 27.422067
Address type: 0x06
LE Public
LE Random
Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2 {0x0002} [hci0] 27.422067
Address type: 0x06
LE Public
LE Random
Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2 {0x0001} [hci0] 27.422067
Address type: 0x06
LE Public
LE Random
Discovery: Enabled (0x01)
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9419a3191d upstream.
What happens there is that we are replacing file->path.mnt of
a file we'd just opened with a clone and we need the write
count contribution to be transferred from original mount to
new one. That's it. We do *NOT* want any kind of freeze
protection for the duration of switchover.
IOW, we should just use __mnt_{want,drop}_write() for that
switchover; no need to bother with mnt_{want,drop}_write()
there.
Tested-by: Amir Goldstein <amir73il@gmail.com>
Reported-by: syzbot+2a73a6ea9507b7112141@syzkaller.appspotmail.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d65842f712 upstream.
Calling VIDIOC_DQBUF can release the core serialization lock pointed to
by vb2_queue->lock if it has to wait for a new buffer to arrive.
However, if userspace dup()ped the video device filehandle, then it is
possible to read or call DQBUF from two filehandles at the same time.
It is also possible to call REQBUFS from one filehandle while the other
is waiting for a buffer. This will remove all the buffers and reallocate
new ones. Removing all the buffers isn't the problem here (that's already
handled correctly by DQBUF), but the reallocating part is: DQBUF isn't
aware that the buffers have changed.
This is fixed by setting a flag whenever the lock is released while waiting
for a buffer to arrive. And checking the flag where needed so we can return
-EBUSY.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Reported-by: Syzbot <syzbot+4180ff9ca6810b06c1e9@syzkaller.appspotmail.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ede95a63b5 upstream.
Rick reported that the BPF JIT could potentially fill the entire module
space with BPF programs from unprivileged users which would prevent later
attempts to load normal kernel modules or privileged BPF programs, for
example. If JIT was enabled but unsuccessful to generate the image, then
before commit 290af86629 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
we would always fall back to the BPF interpreter. Nowadays in the case
where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort
with a failure since the BPF interpreter was compiled out.
Add a global limit and enforce it for unprivileged users such that in case
of BPF interpreter compiled out we fail once the limit has been reached
or we fall back to BPF interpreter earlier w/o using module mem if latter
was compiled in. In a next step, fair share among unprivileged users can
be resolved in particular for the case where we would fail hard once limit
is reached.
Fixes: 290af86629 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
Fixes: 0a14842f5a ("net: filter: Just In Time compiler for x86-64")
Co-Developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6110222c6 upstream.
Add a callback map_lookup_elem_sys_only() that map implementations
could use over map_lookup_elem() from system call side in case the
map implementation needs to handle the latter differently than from
the BPF data path. If map_lookup_elem_sys_only() is set, this will
be preferred pick for map lookups out of user space. This hook is
used in a follow-up fix for LRU map, but once development window
opens, we can convert other map types from map_lookup_elem() (here,
the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use
the callback to simplify and clean up the latter.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0edd6b64d1 ]
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 025c65e119 ]
If an xfrmi is associated to a vrf layer 3 master device,
xfrm_policy_check() fails after traffic decapsulation. The input
interface is replaced by the layer 3 master device, and hence
xfrmi_decode_session() can't match the xfrmi anymore to satisfy
policy checking.
Extend ingress xfrmi lookup to honor the original layer 3 slave
device, allowing xfrm interfaces to operate within a vrf domain.
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dbb2483b2a ]
In commit 6a53b75932 ("xfrm: check id proto in validate_tmpl()")
I introduced a check for xfrm protocol, but according to Herbert
IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
it should be removed from validate_tmpl().
And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific
protocols, this is why xfrm_state_flush() could still miss
IPPROTO_ROUTING, which leads that those entries are left in
net->xfrm.state_all before exit net. Fix this by replacing
IPSEC_PROTO_ANY with zero.
This patch also extracts the check from validate_tmpl() to
xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
With this, no other protocols should be added into xfrm.
Fixes: 6a53b75932 ("xfrm: check id proto in validate_tmpl()")
Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit bbd84f3365 upstream.
Starting from commit 9c225f2655 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client. See commit 10dce8af34 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d0 ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.
To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3Dhttps://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.
This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ec73791a6 upstream.
Due to an erratum in some Pericom PCIe-to-PCI bridges in reverse mode
(conventional PCI on primary side, PCIe on downstream side), the Retrain
Link bit needs to be cleared manually to allow the link training to
complete successfully.
If it is not cleared manually, the link training is continuously restarted
and no devices below the PCI-to-PCIe bridge can be accessed. That means
drivers for devices below the bridge will be loaded but won't work and may
even crash because the driver is only reading 0xffff.
See the Pericom Errata Sheet PI7C9X111SLB_errata_rev1.2_102711.pdf for
details. Devices known as affected so far are: PI7C9X110, PI7C9X111SL,
PI7C9X130.
Add a new flag, clear_retrain_link, in struct pci_dev. Quirks for affected
devices set this bit.
Note that pcie_retrain_link() lives in aspm.c because that's currently the
only place we use it, but this erratum is not specific to ASPM, and we may
retrain links for other reasons in the future.
Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
[bhelgaas: apply regardless of CONFIG_PCIEASPM]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit babc250e27 upstream.
Rendering calls may be done simultaneously from the workqueue,
dlfb_ops_write, dlfb_ops_ioctl, dlfb_ops_set_par and dlfb_dpy_deferred_io.
The code is robust enough so that it won't crash on concurrent rendering.
However, concurrent rendering may cause display corruption if the same
pixel is simultaneously being rendered. In order to avoid this corruption,
this patch adds a mutex around the rendering calls.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Bernie Thompson <bernie@plugable.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Cc: <stable@vger.kernel.org>
[b.zolnierkie: replace "dlfb:" with "uldfb:" in the patch summary]
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b11f9d843 upstream.
If a framebuffer device is used as a console, the rendering calls
(copyarea, fillrect, imageblit) may be done with the console spinlock
held. On udlfb, these function call dlfb_handle_damage that takes a
blocking semaphore before acquiring an URB.
In order to fix the bug, this patch changes the calls copyarea, fillrect
and imageblit to offload USB work to a workqueue.
A side effect of this patch is 3x improvement in console scrolling speed
because the device doesn't have to be updated after each copyarea call.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Bernie Thompson <bernie@plugable.com>
Cc: Ladislav Michl <ladis@linux-mips.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5467a68cbf upstream.
For lockless accesses to dentries we don't have pinned we rely
(among other things) upon having an RCU delay between dropping
the last reference and actually freeing the memory.
On the other hand, for things like pipes and sockets we neither
do that kind of lockless access, nor want to deal with the
overhead of an RCU delay every time a socket gets closed.
So delay was made optional - setting DCACHE_RCUACCESS in ->d_flags
made sure it would happen. We tried to avoid setting it unless
we knew we need it. Unfortunately, that had led to recurring
class of bugs, in which we missed the need to set it.
We only really need it for dentries that are created by
d_alloc_pseudo(), so let's not bother with trying to be smart -
just make having an RCU delay the default. The ones that do
*not* get it set the replacement flag (DCACHE_NORCU) and we'd
better use that sparingly. d_alloc_pseudo() is the only
such user right now.
FWIW, the race that finally prompted that switch had been
between __lock_parent() of immediate subdirectory of what's
currently the root of a disconnected tree (e.g. from
open-by-handle in progress) racing with d_splice_alias()
elsewhere picking another alias for the same inode, either
on outright corrupted fs image, or (in case of open-by-handle
on NFS) that subdirectory having been just moved on server.
It's not easy to hit, so the sky is not falling, but that's
not the first race on similar missed cases and the logics
for settinf DCACHE_RCUACCESS has gotten ridiculously
convoluted.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>