Add fiq_debugger_arm64.c that implements the platform-specific
functions.
Change-Id: I4d8b96777bb8503a93d4eb47bbde8e018740a5bf
Signed-off-by: Colin Cross <ccross@android.com>
[AmitP: Remove unused stackframe -> sp to align with upstream commit
31e43ad3b7 ("arm64: unwind: remove sp from struct stackframe")
Also folded following android-4.9 commit changes into this patch
a33d9f9fa3 ("ANDROID: fiq_debugger: Pass task parameter to unwind_frame()")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from https://android.googlesource.com/kernel/common:android-4.14
commit c9d6bc7ab0a4360f79418aa16f3cd68ac2f57a40)
IRQ mode already passes in a struct pt_regs from get_irq_regs().
FIQ mode passes in something similar but not identical to a
struct pt_regs - FIQ mode stores the spsr of the interrupted mode
in slot 17, while pt_regs expects orig_r0.
Replace the existing mixture of void *regs, unsigned *regs, and
struct pt_regs * const with const struct pt_regs *. Modify
dump_regs not to print the spsr since it won't be there in a
struct pt_regs anyways. Modify dump_allregs to highlight the
mode that was interrupted, making spsr easy to find there.
Change-Id: Ibfe1723d702306c7605fd071737d7be9ee9d8c12
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from https://android.googlesource.com/kernel/common:android-4.14
commit bf32f8accce8701e896879fee463e5b07cbeb8b9)
Rename variables and functions in the global namespace to avoid
future collisions.
Change-Id: Ic23a304b0f794efc94cc6d086fddd63231d99c98
Signed-off-by: Colin Cross <ccross@android.com>
[AmitP: Folded following android-4.9 commit changes into this patch
daeea72326 ("ANDROID: fiq_debugger: Build fixes for 4.1")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from https://android.googlesource.com/kernel/common:android-4.14
commit d241f1dfc9a0b7d97c7b2bce43f88b5c74a3ce44)
debug_console_write calls debug_uart_flush, which will usually wait
until the serial port fifo empties. If another thread is continuously
calling fiq_tty_write, the fifo will constantly be refilled and
debug_uart_flush might never return.
Add a spinlock that is locked in debug_console_write and fiq_tty_write
to ensure they can't run at the same time. This has an extra advantage
of preventing lines from the console and tty from being mixed together.
Also reduce the size returned by fiq_tty_write_room to keep the time
spent with the spinlock held to a reasonable value.
In addition, make sure fiq context can't loop forever by never calling
debug_uart_flush when the console is enabled.
Change-Id: I5712b01f740ca0c84f680d2032c9fa16b7656939
Signed-off-by: Colin Cross <ccross@android.com>
[AmitP: Folded following android-4.9 commit changes into this patch
b4ffddec34 ("ANDROID: ARM: fiq_debugger: fix uninitialised spin_lock.")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from https://android.googlesource.com/kernel/common:android-4.14
commit a8be462329b3c4a22097bf07d3b02a1884a8db09)
kernel_restart cannot be called from interrupt context. Add support for
commands called from a work function, and implement the "reboot" command
there. Also rename the existing irq-mode command to "reset" and change
it to use machine_restart instead of kernel_restart.
Change-Id: I3c423147c01db03d89e95a5b99096ca89462079f
Signed-off-by: Colin Cross <ccross@android.com>
[AmitP: Folded following android-4.9 commit changes into this patch
3b61956a41 ("ANDROID: fiq_debugger: Fix minor bug in code")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from https://android.googlesource.com/kernel/common:android-4.14
commit cb5774b50389fc588d532e0864a0c1bc0010ac2a)
Fix setting up consoles on multiple fiq debugger devices by
splitting the tty driver init into the initcall, and initializing
the single tty device during probe. Has the side effect of moving
the tty device node to /dev/ttyFIQx, where x is the platform device
id, which should normally match the serial port.
To avoid having to pass a different console=/dev/ttyFIQx for every
device, make the fiq debugger a preferred console that will be used
by default if no console was passed on the command line.
Change-Id: I6cc2670628a41e84615859bc96adba189966d647
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from https://android.googlesource.com/kernel/common:android-4.14
commit aecfa53f35c667ce33b4af14d996bfe51c3e614e)
Adds polling tty ops to the fiq debugger console tty, which allows
kgdb to run against an fiq debugger console.
Add a check in do_sysrq to prevent enabling kgdb from the fiq
debugger unless a flag (writable only by root) has been set. This
should make it safe to enable KGDB on a production device.
Also add a shortcut to enable the console and kgdb together, to
allow kgdb to be enabled when the shell on the console is not
responding.
Change-Id: Ifc65239ca96c9887431a6a36b9b44a539002f544
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from https://android.googlesource.com/kernel/common:android-4.14
commit 992dbb2a9acf1cde972157b509f6b9f01aae69d2)
Change-Id: Ibb536c88f0dbaf4766d0599296907e35e42cbfd6
Signed-off-by: Iliyan Malchev <malchev@google.com>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
[AmitP: Upstream commit 2a1f062a4a ("sched/headers: Move signal wakeup &
sigpending methods from <linux/sched.h> into <linux/sched/signal.h>")
moved few signal definitions to <linux/sched/signal.h>. Hence include
that instead of <linux/sched.h>.
Also folded following android-4.9 commit changes into this patch
69279c68ee ("ANDROID: ARM: fiq_debugger: fix compiling for v3.3")
d05890d37e ("ANDROID: ARM: fiq_debugger: Fix to compile on 3.7")
9de62e0d3f ("ANDROID: ARM: fiq_debugger: Use kmsg_dumper to dump kernel logs")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
(cherry picked from https://android.googlesource.com/kernel/common:android-4.14
commit 05e1c7f6172c591c3a8bb833eeea3cd0d77a596e)
128+64KB at the beginning of RAM reserved for ATF.
128KB for pstore.
Change-Id: I1306daec44c65258ff6668f6760be4981d7ca932
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
There is not need to build mach-rockchip when PSCI is enabled.
Save about 5K text, 7K data and fix compilation error for THUMB2_KERNEL.
Change-Id: Ieb17867592d7d49a8b983dc5c7e8d1d1df14d864
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
resource.img include logo optional.
boot.img: Image + resource.img
zboot.img: zImage + resource.img
Change-Id: I4da1ecd12200e3270f116c3fadcfa7ecbe30d7f9
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Eddie Cai <eddie.cai.linux@gmail.com>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
zboot.img: Image.lz4 + resouorce.img
boot.img: Image + resource.img
The ramdisk name should be ramdisk.img.
The ramdisk.img should be placed on $(objtree).
Support symbolic link.
Change-Id: Iaf50be16e321b13bbe45b9beb335d542e00507cf
Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
The lz4 Legacy format(which specified by -l) is
not supported by U-Boot.
Change-Id: I6b94881117b59384daca4efd796c933e8dc9e5a6
Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
After e08d6de4e5 ("kbuild: remove kbuild cache"), shell-cached is not
work anymore.
Fixes: d9e69500bc ("Revert "x86: Force asm-goto"")
Change-Id: Id6a4bdaacc8c2f1d5ace566941799d1e352bc028
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
Changes in 4.19.20
Fix "net: ipv4: do not handle duplicate fragments as overlapping"
drm/msm/gpu: fix building without debugfs
ipv6: Consider sk_bound_dev_if when binding a socket to an address
ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
ipvlan, l3mdev: fix broken l3s mode wrt local routes
l2tp: copy 4 more bytes to linear part if necessary
l2tp: fix reading optional fields of L2TPv3
net: ip_gre: always reports o_key to userspace
net: ip_gre: use erspan key field for tunnel lookup
net/mlx4_core: Add masking for a few queries on HCA caps
netrom: switch to sock timer API
net/rose: fix NULL ax25_cb kernel panic
net: set default network namespace in init_dummy_netdev()
ravb: expand rx descriptor data to accommodate hw checksum
sctp: improve the events for sctp stream reset
tun: move the call to tun_set_real_num_queues
ucc_geth: Reset BQL queue when stopping device
vhost: fix OOB in get_rx_bufs()
net: ip6_gre: always reports o_key to userspace
sctp: improve the events for sctp stream adding
net/mlx5e: Allow MAC invalidation while spoofchk is ON
ip6mr: Fix notifiers call on mroute_clean_tables()
Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
sctp: set chunk transport correctly when it's a new asoc
sctp: set flow sport from saddr only when it's 0
virtio_net: Don't enable NAPI when interface is down
virtio_net: Don't call free_old_xmit_skbs for xdp_frames
virtio_net: Fix not restoring real_num_rx_queues
virtio_net: Fix out of bounds access of sq
virtio_net: Don't process redirected XDP frames when XDP is disabled
virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
virtio_net: Differentiate sk_buff and xdp_frame on freeing
CIFS: Do not count -ENODATA as failure for query directory
CIFS: Fix trace command logging for SMB2 reads and writes
CIFS: Do not consider -ENODATA as stat failure for reads
fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
selftests/seccomp: Enhance per-arch ptrace syscall skip tests
NFS: Fix up return value on fatal errors in nfs_page_async_flush()
ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
arm64: Do not issue IPIs for user executable ptes
arm64: hyp-stub: Forbid kprobing of the hyp-stub
arm64: hibernate: Clean the __hyp_text to PoC after resume
gpio: altera-a10sr: Set proper output level for direction_output
gpiolib: fix line event timestamps for nested irqs
gpio: pcf857x: Fix interrupts on multiple instances
gpio: sprd: Fix the incorrect data register
gpio: sprd: Fix incorrect irq type setting for the async EIC
gfs2: Revert "Fix loop in gfs2_rbm_find"
mmc: bcm2835: Fix DMA channel leak on probe error
mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
ALSA: hda/realtek - Fixed hp_pin no value
IB/hfi1: Remove overly conservative VM_EXEC flag check
platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
Btrfs: fix deadlock when allocating tree block during leaf/node split
btrfs: On error always free subvol_name in btrfs_mount
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
oom, oom_reaper: do not enqueue same task twice
mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
mm, oom: fix use-after-free in oom_kill_process
mm: hwpoison: use do_send_sig_info() instead of force_sig()
mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
of: Convert to using %pOFn instead of device_node.name
of: overlay: add tests to validate kfrees from overlay removal
of: overlay: add missing of_node_get() in __of_attach_node_sysfs
of: overlay: use prop add changeset entry for property in new nodes
of: overlay: do not duplicate properties from overlay for new nodes
md/raid5: fix 'out of memory' during raid cache recovery
cifs: Always resolve hostname before reconnecting
Linux 4.19.20
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 483cbbeddd upstream.
This fixes the case when md array assembly fails because of raid cache recovery
unable to allocate a stripe, despite attempts to replay stripes and increase
cache size. This happens because stripes released by r5c_recovery_replay_stripes
and raid5_set_cache_size don't become available for allocation immediately.
Released stripes first are placed on conf->released_stripes list and require
md thread to merge them on conf->inactive_list before they can be allocated.
Patch allows final allocation attempt during cache recovery to wait for
new stripes to become availabe for allocation.
Cc: linux-raid@vger.kernel.org
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-stable <stable@vger.kernel.org> # 4.10+
Fixes: b4c625c673 ("md/r5cache: r5cache recovery: part 1")
Signed-off-by: Alexei Naberezhnov <anaberezhnov@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8814dc46bd upstream.
When allocating a new node, add_changeset_node() was duplicating the
properties from the respective node in the overlay instead of
allocating a node with no properties.
When this patch is applied the errors reported by the devictree
unittest from patch "of: overlay: add tests to validate kfrees from
overlay removal" will no longer occur. These error messages are of
the form:
"OF: ERROR: ..."
and the unittest results will change from:
### dt-test ### end of unittest - 203 passed, 7 failed
to
### dt-test ### end of unittest - 210 passed, 0 failed
Tested-by: Alan Tull <atull@kernel.org>
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b4955ba7b upstream.
The changeset entry 'update property' was used for new properties in
an overlay instead of 'add property'.
The decision of whether to use 'update property' was based on whether
the property already exists in the subtree where the node is being
spliced into. At the top level of creating a changeset describing the
overlay, the target node is in the live devicetree, so checking whether
the property exists in the target node returns the correct result.
As soon as the changeset creation algorithm recurses into a new node,
the target is no longer in the live devicetree, but is instead in the
detached overlay tree, thus all properties are incorrectly found to
already exist in the target.
This fix will expose another devicetree bug that will be fixed
in the following patch in the series.
When this patch is applied the errors reported by the devictree
unittest will change, and the unittest results will change from:
### dt-test ### end of unittest - 210 passed, 0 failed
to
### dt-test ### end of unittest - 203 passed, 7 failed
Tested-by: Alan Tull <atull@kernel.org>
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b2c2f5a0e upstream.
There is a matching of_node_put() in __of_detach_node_sysfs()
Remove misleading comment from function header comment for
of_detach_node().
This patch may result in memory leaks from code that directly calls
the dynamic node add and delete functions directly instead of
using changesets.
This commit should result in powerpc systems that dynamically
allocate a node, then later deallocate the node to have a
memory leak when the node is deallocated.
The next commit will fix the leak.
Tested-by: Alan Tull <atull@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 144552c786 upstream.
Add checks:
- attempted kfree due to refcount reaching zero before overlay
is removed
- properties linked to an overlay node when the node is removed
- node refcount > one during node removal in a changeset destroy,
if the node was created by the changeset
After applying this patch, several validation warnings will be
reported from the devicetree unittest during boot due to
pre-existing devicetree bugs. The warnings will be similar to:
OF: ERROR: of_node_release(), unexpected properties in /testcase-data/overlay-node/test-bus/test-unittest11
OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /testcase-data-2/substation@100/
hvac-medium-2
Tested-by: Alan Tull <atull@kernel.org>
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0a352fabc upstream.
We had a race in the old balloon compaction code before b1123ea6d3
("mm: balloon: use general non-lru movable page feature") refactored it
that became visible after backporting 195a8c43e9 ("virtio-balloon:
deflate via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f ("mm/balloon_compaction:
redesign ballooned pages management") till b1123ea6d3 ("mm: balloon:
use general non-lru movable page feature"). d6d86c0a7f
("mm/balloon_compaction: redesign ballooned pages management") was
backported to 3.12, so the broken kernels are stable kernels [3.12 -
4.7].
There was a subtle race between dropping the page lock of the newpage in
__unmap_and_move() and checking for __is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With 195a8c43e9 ("virtio-balloon: deflate via a page list")
backported, one would suddenly get corrupted lists in
release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable(). So the new check (__PageMovable(newpage)) will still
hold even after newpage was dequeued by virtio-balloon.
If anybody would ever change that special handling, the BUG would be
introduced again. So instead, make it explicit and use the information
of the original isolated page before migration.
This patch can be backported fairly easy to stable kernels (in contrast
to the refactoring).
Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com
Fixes: d6d86c0a7f ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vratislav Bendel <vbendel@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org> [3.12 - 4.7]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6376360ecb upstream.
Currently memory_failure() is racy against process's exiting, which
results in kernel crash by null pointer dereference.
The root cause is that memory_failure() uses force_sig() to forcibly
kill asynchronous (meaning not in the current context) processes. As
discussed in thread https://lkml.org/lkml/2010/6/8/236 years ago for OOM
fixes, this is not a right thing to do. OOM solves this issue by using
do_send_sig_info() as done in commit d2d393099d ("signal:
oom_kill_task: use SEND_SIG_FORCED instead of force_sig()"), so this
patch is suggesting to do the same for hwpoison. do_send_sig_info()
properly accesses to siglock with lock_task_sighand(), so is free from
the reported race.
I confirmed that the reported bug reproduces with inserting some delay
in kill_procs(), and it never reproduces with this patch.
Note that memory_failure() can send another type of signal using
force_sig_mceerr(), and the reported race shouldn't happen on it because
force_sig_mceerr() is called only for synchronous processes (i.e.
BUS_MCEERR_AR happens only when some process accesses to the corrupted
memory.)
Link: http://lkml.kernel.org/r/20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cefc7ef3c8 upstream.
Syzbot instance running on upstream kernel found a use-after-free bug in
oom_kill_process. On further inspection it seems like the process
selected to be oom-killed has exited even before reaching
read_lock(&tasklist_lock) in oom_kill_process(). More specifically the
tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task()
and the put_task_struct within for_each_thread() frees the tsk and
for_each_thread() tries to access the tsk. The easiest fix is to do
get/put across the for_each_thread() on the selected task.
Now the next question is should we continue with the oom-kill as the
previously selected task has exited? However before adding more
complexity and heuristics, let's answer why we even look at the children
of oom-kill selected task? The select_bad_process() has already selected
the worst process in the system/memcg. Due to race, the selected
process might not be the worst at the kill time but does that matter?
The userspace can use the oom_score_adj interface to prefer children to
be killed before the parent. I looked at the history but it seems like
this is there before git history.
Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com
Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com
Fixes: 6b0c81b3be ("mm, oom: reduce dependency on tasklist_lock")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9bcdeb51bd upstream.
Arkadiusz reported that enabling memcg's group oom killing causes
strange memcg statistics where there is no task in a memcg despite the
number of tasks in that memcg is not 0. It turned out that there is a
bug in wake_oom_reaper() which allows enqueuing same task twice which
makes impossible to decrease the number of tasks in that memcg due to a
refcount leak.
This bug existed since the OOM reaper became invokable from
task_will_free_mem(current) path in out_of_memory() in Linux 4.7,
T1@P1 |T2@P1 |T3@P1 |OOM reaper
----------+----------+----------+------------
# Processing an OOM victim in a different memcg domain.
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
out_of_memory()
oom_kill_process(P1)
do_send_sig_info(SIGKILL, @P1)
mark_oom_victim(T1@P1)
wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
mutex_unlock(&oom_lock)
out_of_memory()
mark_oom_victim(T2@P1)
wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
mutex_unlock(&oom_lock)
out_of_memory()
mark_oom_victim(T1@P1)
wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
mutex_unlock(&oom_lock)
# Completed processing an OOM victim in a different memcg domain.
spin_lock(&oom_reaper_lock)
# T1P1 is dequeued.
spin_unlock(&oom_reaper_lock)
but memcg's group oom killing made it easier to trigger this bug by
calling wake_oom_reaper() on the same task from one out_of_memory()
request.
Fix this bug using an approach used by commit 855b018325 ("oom,
oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a
side effect of this patch, this patch also avoids enqueuing multiple
threads sharing memory via task_will_free_mem(current) path.
Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
Fixes: af8e15cc85 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: Jay Kamat <jgkamat@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8fb335e078 upstream.
Currently, exit_ptrace() adds all ptraced tasks in a dead list, then
zap_pid_ns_processes() waits on all tasks in a current pidns, and only
then are tasks from the dead list released.
zap_pid_ns_processes() can get stuck on waiting tasks from the dead
list. In this case, we will have one unkillable process with one or
more dead children.
Thanks to Oleg for the advice to release tasks in find_child_reaper().
Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes: 7c8bd2322c ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>