[ Upstream commit 0100e6bbdb ]
In the arch addition of PF_IO_WORKER, I missed parisc and powerpc for
some reason. Fix that up, ensuring they handle PF_IO_WORKER like they do
PF_KTHREAD in copy_thread().
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Fixes: 4727dc20e0 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD")
Change-Id: I3d0289912eb9e4545fc0b680df6890b6b837ebdd
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dd26e2cec7)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 4727dc20e0 ]
PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the
sense that we don't assign ->set_child_tid with our own structure. Just
ensure that every arch sets up the PF_IO_WORKER threads like kthreads
in the arch implementation of copy_thread().
Change-Id: Iec4a3c42a39f016b323476d7238f3d36aaf0e6cf
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 320c8057ec)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 3e684903a8 ]
A livepatch transition may stall indefinitely when a kvm vCPU is heavily
loaded. To the host, the vCPU task is a user thread which is spending a
very long time in the ioctl(KVM_RUN) syscall. During livepatch
transition, set_notify_signal() will be called on such tasks to
interrupt the syscall so that the task can be transitioned. This
interrupts guest execution, but when xfer_to_guest_mode_work() sees that
TIF_NOTIFY_SIGNAL is set but not TIF_SIGPENDING it concludes that an
exit to user mode is unnecessary, and guest execution is resumed without
transitioning the task for the livepatch.
This handling of TIF_NOTIFY_SIGNAL is incorrect, as set_notify_signal()
is expected to break tasks out of interruptible kernel loops and cause
them to return to userspace. Change xfer_to_guest_mode_work() to handle
TIF_NOTIFY_SIGNAL the same as TIF_SIGPENDING, signaling to the vCPU run
loop that an exit to userpsace is needed. Any pending task_work will be
run when get_signal() is called from exit_to_user_mode_loop(), so there
is no longer any need to run task work from xfer_to_guest_mode_work().
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Petr Mladek <pmladek@suse.com>
Change-Id: If14e86a516403671ccb122cea32cc704f774e8ce
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Message-Id: <20220504180840.2907296-1-sforshee@digitalocean.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 000de389ad)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 66ae0d1e2d ]
fork() fails if signal_pending() is true, but there are two conditions
that can lead to that:
1) An actual signal is pending. We want fork to fail for that one, like
we always have.
2) TIF_NOTIFY_SIGNAL is pending, because the task has pending task_work.
We don't need to make it fail for that case.
Allow fork() to proceed if just task_work is pending, by changing the
signal_pending() check to task_sigpending().
Change-Id: Iec007746b42f5d62581a8b5f6cca4006e707b8e3
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0f735cf52b)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 06af867944 ]
Olivier Langlois has been struggling with coredumps being incompletely written in
processes using io_uring.
Olivier Langlois <olivier@trillion01.com> writes:
> io_uring is a big user of task_work and any event that io_uring made a
> task waiting for that occurs during the core dump generation will
> generate a TIF_NOTIFY_SIGNAL.
>
> Here are the detailed steps of the problem:
> 1. io_uring calls vfs_poll() to install a task to a file wait queue
> with io_async_wake() as the wakeup function cb from io_arm_poll_handler()
> 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL
> 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling
> set_notify_signal()
The coredump code deliberately supports being interrupted by SIGKILL,
and depends upon prepare_signal to filter out all other signals. Now
that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL this hack
in dump_emitted by the coredump code no longer works.
Make the coredump code more robust by explicitly testing for all of
the wakeup conditions the coredump code supports. This prevents
new wakeup conditions from breaking the coredump code, as well
as fixing the current issue.
The filesystem code that the coredump code uses already limits
itself to only aborting on fatal_signal_pending. So it should
not develop surprising wake-up reasons either.
v2: Don't remove the now unnecessary code in prepare_signal.
Cc: stable@vger.kernel.org
Fixes: 12db8b6900 ("entry: Add support for TIF_NOTIFY_SIGNAL")
Reported-by: Olivier Langlois <olivier@trillion01.com>
Change-Id: I84870bf0a620a97af50d9b495dd225f9ee2b66b8
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4b4d2c7992)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit e2c7554cc6 ]
it needs to be added to _TIF_WORK_MASK, or we might not reach
do_work_pending() in the first place...
Fixes: 5a9a8897c2 "alpha: add support for TIF_NOTIFY_SIGNAL"
Change-Id: I9f19ad6bc9833ce0cc103163af7904862425badb
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6e2bce21ac)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit bb12433bf5 ]
Linux 5.11.rcX was failing to boot on ARC HSDK board. Turns out we have
a couple of issues, this being the first one, and I'm to blame as I
didn't pay attention during review.
TIF_NOTIFY_SIGNAL support requires checking multiple TIF_* bits in
kernel return code path. Old code only needed to check a single bit so
BBIT0 <TIF_SIGPENDING> worked. New code needs to check multiple bits so
AND <bit-mask> instruction. So needs to use bit mask variant _TIF_SIGPENDING
Cc: Jens Axboe <axboe@kernel.dk>
Fixes: 53855e1258 ("arc: add support for TIF_NOTIFY_SIGNAL")
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/34
Change-Id: I00aa9a6c3118f72e90575fb7ca4ebc08300b542e
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit db911277a2)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit f5f4fc4649 ]
Sergei and John both reported that ia64 failed to boot in 5.11, and it
was related to signals. Turns out the ia64 signal handling is a bit odd,
it doesn't check the return value of get_signal() for whether there's a
signal to deliver or not. With the introduction of TIF_NOTIFY_SIGNAL,
then task_work could trigger it.
Fix it by only calling handle_signal() if we actually have a real signal
to deliver. This brings it in line with all other archs, too.
Fixes: b269c229b0 ("ia64: add support for TIF_NOTIFY_SIGNAL")
Reported-by: Sergei Trofimovich <slyich@gmail.com>
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Sergei Trofimovich <slyich@gmail.com>
Change-Id: Id225cfc12645aa47e37c3107670ba440116c9b04
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a1240cc413)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 114518eb64 ]
If the arch supports TIF_NOTIFY_SIGNAL, then use that for TWA_SIGNAL as
it's more efficient than using the signal delivery method. This is
especially true on threaded applications, where ->sighand is shared across
threads, but it's also lighter weight on non-shared cases.
io_uring is a heavy consumer of TWA_SIGNAL based task_work. A test with
threads shows a nice improvement running an io_uring based echo server.
stock kernel:
0.01% <= 0.1 milliseconds
95.86% <= 0.2 milliseconds
98.27% <= 0.3 milliseconds
99.71% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.0 milliseconds
100.00% <= 1.1 milliseconds
100.00% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
1378930.00 requests per second
~1600% CPU
1.38M requests/second, and all 16 CPUs are maxed out.
patched kernel:
0.01% <= 0.1 milliseconds
98.24% <= 0.2 milliseconds
99.47% <= 0.3 milliseconds
99.99% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.2 milliseconds
1666111.38 requests per second
~1450% CPU
1.67M requests/second, and we're no longer just hammering on the sighand
lock. The original reporter states:
"For 5.7.15 my benchmark achieves 1.6M qps and system cpu is at ~80%.
for 5.7.16 or later it achieves only 1M qps and the system cpu is is
at ~100%"
with the only difference there being that TWA_SIGNAL is used
unconditionally in 5.7.16, since it's required to be able to handle the
inability to run task_work if the application is waiting in the kernel
already on an event that needs task_work run to be satisfied. Also see
commit 0ba9c9edcd.
Reported-by: Roman Gershman <romger@amazon.com>
Change-Id: I80788634e6b91012ebf94e0ab6bf897c99f9f732
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20201026203230.386348-5-axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit eb42e7b304)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 12db8b6900 ]
Add TIF_NOTIFY_SIGNAL handling in the generic entry code, which if set,
will return true if signal_pending() is used in a wait loop. That causes an
exit of the loop so that notify_signal tracehooks can be run. If the wait
loop is currently inside a system call, the system call is restarted once
task_work has been processed.
In preparation for only having arch_do_signal() handle syscall restarts if
_TIF_SIGPENDING isn't set, rename it to arch_do_signal_or_restart(). Pass
in a boolean that tells the architecture specific signal handler if it
should attempt to get a signal, or just process a potential syscall
restart.
For !CONFIG_GENERIC_ENTRY archs, add the TIF_NOTIFY_SIGNAL handling to
get_signal(). This is done to minimize the needed architecture changes to
support this feature.
Change-Id: Iec8202baf6ec6ff5d31c339869d8f34af4182677
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20201026203230.386348-3-axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3c295bd2dd)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 53dec2ea74 ]
Assumes current->files->file_lock is already held on invocation. Helps
the caller check the file before removing the fd, if it needs to.
Change-Id: Idd87700a119403ce3867aa52294a14213b505faa
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d2136fc145)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 9fe83c43e7 ]
The function close_fd_get_file is explicitly a variant of
__close_fd[1]. Now that __close_fd has been renamed close_fd, rename
close_fd_get_file to be consistent with close_fd.
When __alloc_fd, __close_fd and __fd_install were introduced the
double underscore indicated that the function took a struct
files_struct parameter. The function __close_fd_get_file never has so
the naming has always been inconsistent. This just cleans things up
so there are not any lingering mentions or references __close_fd left
in the code.
[1] 80cd795630 ("binder: fix use-after-free due to ksys_close() during fdget()")
Link: https://lkml.kernel.org/r/20201120231441.29911-23-ebiederm@xmission.com
Change-Id: I1c759a36dfa09259eff5b09127fde4e041777a3e
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 57b2053036)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit e886663cfd ]
Pass in the struct filename pointers instead of the user string, and
update the three callers to do the same.
This behaves like do_unlinkat(), which also takes a filename struct and
puts it when it is done. Converting callers is then trivial.
Change-Id: Ie23f87f8c6bb18a61254a0848d861ad6fad14232
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 214f80e251)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 5c251e9dc0 ]
This is in preparation for maintaining signal_pending() as the decider of
whether or not a schedule() loop should be broken, or continue sleeping.
This is different than the core signal use cases, which really need to know
whether an actual signal is pending or not. task_sigpending() returns
non-zero if TIF_SIGPENDING is set.
Only core kernel use cases should care about the distinction between
the two, make sure those use the task_sigpending() helper.
Change-Id: I0d8572e173cc4536673da1682c9537db11f68167
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20201026203230.386348-2-axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 52cfde6bbf)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 1e61463cfc ]
To pick the changes in:
99668f6180 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
That don't result in any change in tooling, only silences this perf
build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/openat2.h' differs from latest version at 'include/uapi/linux/openat2.h'
diff -u tools/include/uapi/linux/openat2.h include/uapi/linux/openat2.h
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Change-Id: I990de5703d50cb6aeceea1bcd7bf631b1f9c4484
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0b8cd5d814)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 99668f6180 ]
Now that we support non-blocking path resolution internally, expose it
via openat2() in the struct open_how ->resolve flags. This allows
applications using openat2() to limit path resolution to the extent that
it is already cached.
If the lookup cannot be satisfied in a non-blocking manner, openat2(2)
will return -1/-EAGAIN.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Iddb58268e0a2b8adfc54e56192da43dda1868d8c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5683caa735)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 7d01ef7585 ]
Initialize them in set_nameidata() and make sure that terminate_walk() clears them
once the pointers become potentially invalid (i.e. we leave RCU mode or drop them
in non-RCU one). Currently we have "path_init() always initializes them and nobody
accesses them outside of path_init()/terminate_walk() segments", which is asking
for trouble.
With that change we would have nd->path.{mnt,dentry}
1) always valid - NULL or pointing to currently allocated objects.
2) non-NULL while we are successfully walking
3) NULL when we are not walking at all
4) contributing to refcounts whenever non-NULL outside of RCU mode.
Fixes: 6c6ec2b0a3 ("fs: add support for LOOKUP_CACHED")
Reported-by: syzbot+c88a7030da47945a3cc3@syzkaller.appspotmail.com
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
Change-Id: I0532db6ea79fb760d50a88f75e2bb0691c24e93c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0cf0ce8fb5)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit eacd9aa8ce ]
After switching to non-RCU mode, we want nd->depth to match the number
of entries in nd->stack[] that need eventual path_put().
legitimize_links() takes care of that on failures; unfortunately,
failure exits added for LOOKUP_CACHED do not.
We could add the logics for that into those failure exits, both in
try_to_unlazy() and in try_to_unlazy_next(), but since both checks
are immediately followed by legitimize_links() and there's no calls
of legitimize_links() other than those two... It's easier to
move the check (and required handling of nd->depth on failure) into
legitimize_links() itself.
[caught by Jens: ... and since we are zeroing ->depth here, we need
to do drop_links() first]
Fixes: 6c6ec2b0a3 "fs: add support for LOOKUP_CACHED"
Tested-by: Jens Axboe <axboe@kernel.dk>
Change-Id: I6cf685bfce81acb4d68c3991b2a936968a39c739
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 146fe79fff)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 6c6ec2b0a3 ]
io_uring always punts opens to async context, since there's no control
over whether the lookup blocks or not. Add LOOKUP_CACHED to support
just doing the fast RCU based lookups, which we know will not block. If
we can do a cached path resolution of the filename, then we don't have
to always punt lookups for a worker.
During path resolution, we always do LOOKUP_RCU first. If that fails and
we terminate LOOKUP_RCU, then fail a LOOKUP_CACHED attempt as well.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: If3c62e8681cd47bfafaa5a4de05a7e0418c1c718
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c1fe7bd3e1)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit ae66db45fd ]
same as for the previous commit - instead of 0/-ECHILD make
it return true/false, rename to try_to_unlazy_child().
Change-Id: Ie949437504bd8db7f22f78bfbe5d5141e9959cf8
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 36ec31201a)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 8fb0f47a9d ]
In an ideal world, when someone is passed an iov_iter and returns X bytes,
then X bytes would have been consumed/advanced from the iov_iter. But we
have use cases that always consume the entire iterator, a few examples
of that are iomap and bdev O_DIRECT. This means we cannot rely on the
state of the iov_iter once we've called ->read_iter() or ->write_iter().
This would be easier if we didn't always have to deal with truncate of
the iov_iter, as rewinding would be trivial without that. We recently
added a commit to track the truncate state, but that grew the iov_iter
by 8 bytes and wasn't the best solution.
Implement a helper to save enough of the iov_iter state to sanely restore
it after we've called the read/write iterator helpers. This currently
only works for IOVEC/BVEC/KVEC as that's all we need, support for other
iterator types are left as an exercise for the reader.
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/
Bug: 268174392
Change-Id: Iab4de49932dea2823db03bcef673f726bcef4a9f
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e86db87191)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit cc440e8738 ]
Provide a generic helper for setting up an io_uring worker. Returns a
task_struct so that the caller can do whatever setup is needed, then call
wake_up_new_task() to kick it into gear.
Add a kernel_clone_args member, io_thread, which tells copy_process() to
mark the task with PF_IO_WORKER.
Change-Id: I670f155fc4ac1b93824391292f4822e32671215b
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1500fed008)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>