in x86 fault handler, only attempt spf if the vma is anonymous.
In do_handle_mm_fault(), let speculative page faults proceed as long
as they fall into anonymous vmas. This enables the speculative
handling code in __handle_mm_fault() and do_anonymous_page().
In handle_pte_fault(), if vmf->pte is set (the original pte was not
pte_none), catch speculative faults and return VM_FAULT_RETRY as
those cases are not implemented yet. Also assert that do_fault()
is not reached in the speculative case.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-20-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I875106fcfa1084f570c2bf8f24a129bdce55316b
Change do_anonymous_page() to handle the speculative case.
This involves aborting speculative faults if they have to allocate a new
anon_vma, and using pte_map_lock() instead of pte_offset_map_lock()
to complete the page fault.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-19-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I5ad955323faabc142c21f62415db039ac889066a
pte_map_lock() and pte_spinlock() are used by fault handlers to ensure
the pte is mapped and locked before they commit the faulted page to the
mm's address space at the end of the fault.
The functions differ in their preconditions; pte_map_lock() expects
the pte to be unmapped prior to the call, while pte_spinlock() expects
it to be already mapped.
In the speculative fault case, the functions verify, after locking the pte,
that the mmap sequence count has not changed since the start of the fault,
and thus that no mmap lock writers have been running concurrently with
the fault. After that point the page table lock serializes any further
races with concurrent mmap lock writers.
If the mmap sequence count check fails, both functions will return false
with the pte being left unmapped and unlocked.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-18-michel@lespinasse.org/
Conflicts:
include/linux/mm.h
1. Fixed pte_map_lock and pte_spinlock macros not to fail when
CONFIG_SPECULATIVE_PAGE_FAULT=n
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ibd7ccc2ead4fdf29f28c7657b312b2f677ac8836
The speculative path calls speculative_page_walk_begin() before walking
the page table tree to prevent page table reclamation. The logic is
otherwise similar to the non-speculative path, but with additional
restrictions: in the speculative path, we do not handle huge pages or
wiring new pages tables.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-17-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If099534da8b0ac105bbaa5ea4714a6654032592a
Move the code that initializes vmf->pte and vmf->orig_pte from
handle_pte_fault() to its single call site in __handle_mm_fault().
This ensures vmf->pte is now initialized together with the higher levels
of the page table hierarchy. This also prepares for speculative page fault
handling, where the entire page table walk (higher levels down to ptes)
needs special care in the speculative case.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-16-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Id550086fe568331aa71c91468f8314faad993b20
Speculative page faults will use these to protect against races with
page table reclamation.
This could always be handled by disabling local IRQs as the fast GUP
code does; however speculative page faults do not need to protect
against races with THP page splitting, so a weaker rcu read lock is
sufficient in the MMU_GATHER_RCU_TABLE_FREE case.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-15-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I3efe5fc6a5a49d537cf33e8093daeea42550077a
Attempt speculative mm fault handling first, and fall back to the
existing (non-speculative) code if that fails.
The speculative handling closely mirrors the non-speculative logic.
This includes some x86 specific bits such as the access_error() call.
This is why we chose to implement the speculative handling in arch/x86
rather than in common code.
The vma is first looked up and copied, under protection of the rcu
read lock. The mmap lock sequence count is used to verify the
integrity of the copied vma, and passed to do_handle_mm_fault() to
allow checking against races with mmap writers when finalizing the fault.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-14-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I2c078a173ee39f35af16daeee8c6a1466d10c3e8
This prepares for speculative page faults looking up and copying vmas
under protection of an rcu read lock, instead of the usual mmap read lock.
Note - it might also be feasible to just use SLAB_TYPESAFE_BY_RCU when
creating the vm_area_cachep, but that's probably too subtle to consider here.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-12-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I992fddb7c32c61bb4ab10b387f91c4e54c2250ef
The counter's write side is hooked into the existing mmap locking API:
mmap_write_lock() increments the counter to the next (odd) value, and
mmap_write_unlock() increments it again to the next (even) value.
The counter's speculative read side is supposed to be used as follows:
seq = mmap_seq_read_start(mm);
if (seq & 1)
goto fail;
.... speculative handling here ....
if (!mmap_seq_read_check(mm, seq)
goto fail;
This API guarantees that, if none of the "fail" tests abort
speculative execution, the speculative code section did not run
concurrently with any mmap writer.
This is very similar to a seqlock, but both the writer and speculative
readers are allowed to block. In the fail case, the speculative reader
does not spin on the sequence counter; instead it should fall back to
a different mechanism such as grabbing the mmap lock read side.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-11-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I60ba909e789371217cd77c39a562a66e156b68bb
Add a new do_handle_mm_fault function, which extends the existing
handle_mm_fault() API by adding an mmap sequence count, to be used
in the FAULT_FLAG_SPECULATIVE case.
In the initial implementation, FAULT_FLAG_SPECULATIVE always fails
(by returning VM_FAULT_RETRY).
The existing handle_mm_fault() API is kept as a wrapper around
do_handle_mm_fault() so that we do not have to immediately update
every handle_mm_fault() call site.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Conflicts:
mm/memory.c
1. Trivial merge conflict due to folios.
Link: https://lore.kernel.org/all/20220128131006.67712-10-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic07b6d84af3e5d1fcc856e0968f1a6dd1544fa88
Define the new FAULT_FLAG_SPECULATIVE flag, which indicates when we are
attempting speculative fault handling (without holding the mmap lock).
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Conflicts:
include/linux/mm_types.h
1. Merge conflict due to enum fault_flag being defined in mm.h instead of
mm_types.h
Link: https://lore.kernel.org/all/20220128131006.67712-9-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I48ab427dfa4d7bdbe9932588bec7ae99e9e80ae9
This configuration variable will be used to build the code needed to
handle speculative page fault.
This is enabled by default on supported architectures with SMP and MMU set.
The architecture support is needed since the speculative page fault handler
is called from the architecture's page faulting code, and some code has to
be added there to try speculative fault handling first.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-7-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ie1dc3af30bf3949173b126e6469f372c4505ec8e
In do_anonymous_page(), we have separate cases for the zero page vs
allocating new anonymous pages. However, once the pte entry has been
computed, the rest of the handling (mapping and locking the page table,
checking that we didn't lose a race with another page fault handler, etc)
is identical between the two cases.
This change reduces the code duplication between the two cases.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Conflicts:
mm/memory.c
1. Trivial merge conflict caused by folios in mem_cgroup_charge call.
Link: https://lore.kernel.org/all/20220128131006.67712-6-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic19579571925878d632e43aa40b9f50cdf473ee6
In the mmap locking API, the *_killable() functions return an error
(or 0 on success), and the *_trylock() functions return a boolean
(true on success).
Rename the return values "int error" and "bool ok", respectively,
rather than using "ret" for both cases which I find less readable.
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-4-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I19473932c2692833dca89db5b805dbb46970dc66
Change mmap_lock_is_contended to return a bool value, rather than an
int which the callers are then supposed to interpret as a bool. This
is to ensure consistency with other mmap lock API functions (such as
the trylock functions).
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Link: https://lore.kernel.org/all/20220128131006.67712-3-michel@lespinasse.org/
Bug: 161210518
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I7a11ff25a493adc58480b1fe8e3f14e44ad46fb3
When a shadow VM is torn down, its VMID can be reallocated as soon as
the shadow table entry is cleared to NULL. Since tearing down the
stage-2 page-table does not imply TLB invalidation, the TLB could still
contain stale entries from the old VM and the new user of the VMID could
end up seeing erroneous translations.
Invalidate the TLB for the VMID of the VM being torn down prior to
clearing its entry in the shadow table.
Bug: 226312378
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ice44d030bf01a1b7612413ee32440f3f38cb3e4e
The old and new mount user name spaces need to be populated
before calling vfs_rename(). Otherwise vfs_rename will try
to dereference a null ptr and segfault.
Bug: 211066171
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: I3656073581218107fc3b1a52ebe7bcfd81a10fc2
The FROMLIST patches merged in aosp/1974918 that add vmalloc support to
KASAN now have a few fixes staged in linux-next/akpm. Sync the changes.
Bug: 217222520
Bug: 222221793
Change-Id: I33dd30e3834a4d1bb8eac611b350004afdb08a74
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
By default, thermal power throttle is always enable, but sometimes it
need to be disabled for a period of time, so add it to meet platform
thermal requirement.
Bug: 209386157
Signed-off-by: Jeson Gao <jeson.gao@unisoc.com>
Change-Id: If9c53a9669eec8e2821d837cfa3c660a9cfbf934
(cherry picked from commit 64999249d5)
Changes in 5.15.30
Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"
arm64: dts: rockchip: fix rk3399-puma-haikou USB OTG mode
xfrm: Check if_id in xfrm_migrate
xfrm: Fix xfrm migrate issues when address family changes
arm64: dts: rockchip: fix rk3399-puma eMMC HS400 signal integrity
arm64: dts: rockchip: align pl330 node name with dtschema
arm64: dts: rockchip: reorder rk3399 hdmi clocks
arm64: dts: agilex: use the compatible "intel,socfpga-agilex-hsotg"
ARM: dts: rockchip: reorder rk322x hmdi clocks
ARM: dts: rockchip: fix a typo on rk3288 crypto-controller
mac80211: refuse aggregations sessions before authorized
MIPS: smp: fill in sibling and core maps earlier
ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE
Bluetooth: hci_core: Fix leaking sent_cmd skb
can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready
atm: firestream: check the return value of ioremap() in fs_init()
iwlwifi: don't advertise TWT support
drm/vrr: Set VRR capable prop only if it is attached to connector
nl80211: Update bss channel on channel switch for P2P_CLIENT
tcp: make tcp_read_sock() more robust
sfc: extend the locking on mcdi->seqno
bnx2: Fix an error message
kselftest/vm: fix tests build with old libc
x86/module: Fix the paravirt vs alternative order
ice: Fix race condition during interface enslave
Linux 5.15.30
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Icf3c6ca9fb4bb75435d3964e12c0fcb42397b50b
commit 5cb1ebdbc4 upstream.
Commit 5dbbbd01cb ("ice: Avoid RTNL lock when re-creating
auxiliary device") changes a process of re-creation of aux device
so ice_plug_aux_dev() is called from ice_service_task() context.
This unfortunately opens a race window that can result in dead-lock
when interface has left LAG and immediately enters LAG again.
Reproducer:
```
#!/bin/sh
ip link add lag0 type bond mode 1 miimon 100
ip link set lag0
for n in {1..10}; do
echo Cycle: $n
ip link set ens7f0 master lag0
sleep 1
ip link set ens7f0 nomaster
done
```
This results in:
[20976.208697] Workqueue: ice ice_service_task [ice]
[20976.213422] Call Trace:
[20976.215871] __schedule+0x2d1/0x830
[20976.219364] schedule+0x35/0xa0
[20976.222510] schedule_preempt_disabled+0xa/0x10
[20976.227043] __mutex_lock.isra.7+0x310/0x420
[20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
[20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
[20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core]
[20976.261079] ib_register_device+0x40d/0x580 [ib_core]
[20976.266139] irdma_ib_register_device+0x129/0x250 [irdma]
[20976.281409] irdma_probe+0x2c1/0x360 [irdma]
[20976.285691] auxiliary_bus_probe+0x45/0x70
[20976.289790] really_probe+0x1f2/0x480
[20976.298509] driver_probe_device+0x49/0xc0
[20976.302609] bus_for_each_drv+0x79/0xc0
[20976.306448] __device_attach+0xdc/0x160
[20976.310286] bus_probe_device+0x9d/0xb0
[20976.314128] device_add+0x43c/0x890
[20976.321287] __auxiliary_device_add+0x43/0x60
[20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice]
[20976.330109] ice_service_task+0xd0c/0xed0 [ice]
[20976.342591] process_one_work+0x1a7/0x360
[20976.350536] worker_thread+0x30/0x390
[20976.358128] kthread+0x10a/0x120
[20976.365547] ret_from_fork+0x1f/0x40
...
[20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084
[20976.446469] Call Trace:
[20976.448921] __schedule+0x2d1/0x830
[20976.452414] schedule+0x35/0xa0
[20976.455559] schedule_preempt_disabled+0xa/0x10
[20976.460090] __mutex_lock.isra.7+0x310/0x420
[20976.464364] device_del+0x36/0x3c0
[20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice]
[20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice]
[20976.477288] notifier_call_chain+0x47/0x70
[20976.481386] __netdev_upper_dev_link+0x18b/0x280
[20976.489845] bond_enslave+0xe05/0x1790 [bonding]
[20976.494475] do_setlink+0x336/0xf50
[20976.502517] __rtnl_newlink+0x529/0x8b0
[20976.543441] rtnl_newlink+0x43/0x60
[20976.546934] rtnetlink_rcv_msg+0x2b1/0x360
[20976.559238] netlink_rcv_skb+0x4c/0x120
[20976.563079] netlink_unicast+0x196/0x230
[20976.567005] netlink_sendmsg+0x204/0x3d0
[20976.570930] sock_sendmsg+0x4c/0x50
[20976.574423] ____sys_sendmsg+0x1eb/0x250
[20976.586807] ___sys_sendmsg+0x7c/0xc0
[20976.606353] __sys_sendmsg+0x57/0xa0
[20976.609930] do_syscall_64+0x5b/0x1a0
[20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca
1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev()
is called from ice_service_task() context, aux device is created
and associated device->lock is taken.
2. Command 'ip link ... set master...' calls ice's notifier under
RTNL lock and that notifier calls ice_unplug_aux_dev(). That
function tries to take aux device->lock but this is already taken
by ice_plug_aux_dev() in step 1
3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
taken in step 2
4. Dead-lock
The patch fixes this issue by following changes:
- Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev()
call in ice_service_task()
- The bit is checked in ice_clear_rdma_cap() and only if it is not set
then ice_unplug_aux_dev() is called. If it is set (in other words
plugging of aux device was requested and ice_plug_aux_dev() is
potentially running) then the function only clears the bit
- Once ice_plug_aux_dev() call (in ice_service_task) is finished
the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked
whether it was already cleared by ice_clear_rdma_cap(). If so then
aux device is unplugged.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b773827e36 ]
The error message when I build vm tests on debian10 (GLIBC 2.28):
userfaultfd.c: In function `userfaultfd_pagemap_test':
userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
in this function); did you mean `MADV_RANDOM'?
if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
^~~~~~~~~~~~
MADV_RANDOM
This patch includes these newer definitions from UAPI linux/mman.h, is
useful to fix tests build on systems without these definitions in glibc
sys/mman.h.
Link: https://lkml.kernel.org/r/20220227055330.43087-2-zhouchengming@bytedance.com
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f1fb205efb ]
seqno could be read as a stale value outside of the lock. The lock is
already acquired to protect the modification of seqno against a possible
race condition. Place the reading of this value also inside this locking
to protect it against a possible race condition.
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e50b88c4f0 ]
The wdev channel information is updated post channel switch only for
the station mode and not for the other modes. Due to this, the P2P client
still points to the old value though it moved to the new channel
when the channel change is induced from the P2P GO.
Update the bss channel after CSA channel switch completion for P2P client
interface as well.
Signed-off-by: Sreeramya Soratkal <quic_ssramya@quicinc.com>
Link: https://lore.kernel.org/r/1646114600-31479-1-git-send-email-quic_ssramya@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11c57c3ba9 ]
Resending this to properly add it to the patch tracker - thanks for letting
me know, Arnd :)
When ARM is enabled, and BITREVERSE is disabled,
Kbuild gives the following warning:
WARNING: unmet direct dependencies detected for HAVE_ARCH_BITREVERSE
Depends on [n]: BITREVERSE [=n]
Selected by [y]:
- ARM [=y] && (CPU_32v7M [=n] || CPU_32v7 [=y]) && !CPU_32v6 [=n]
This is because ARM selects HAVE_ARCH_BITREVERSE
without selecting BITREVERSE, despite
HAVE_ARCH_BITREVERSE depending on BITREVERSE.
This unmet dependency bug was found by Kismet,
a static analysis tool for Kconfig. Please advise if this
is not the appropriate solution.
Signed-off-by: Julian Braha <julianbraha@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2703def33 ]
After enabling CONFIG_SCHED_CORE (landed during 5.14 cycle),
2-core 2-thread-per-core interAptiv (CPS-driven) started emitting
the following:
[ 0.025698] CPU1 revision is: 0001a120 (MIPS interAptiv (multi))
[ 0.048183] ------------[ cut here ]------------
[ 0.048187] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:6025 sched_core_cpu_starting+0x198/0x240
[ 0.048220] Modules linked in:
[ 0.048233] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc3+ #35 b7b319f24073fd9a3c2aa7ad15fb7993eec0b26f
[ 0.048247] Stack : 817f0000 00000004 327804c8 810eb050 00000000 00000004 00000000 c314fdd1
[ 0.048278] 830cbd64 819c0000 81800000 817f0000 83070bf4 00000001 830cbd08 00000000
[ 0.048307] 00000000 00000000 815fcbc4 00000000 00000000 00000000 00000000 00000000
[ 0.048334] 00000000 00000000 00000000 00000000 817f0000 00000000 00000000 817f6f34
[ 0.048361] 817f0000 818a3c00 817f0000 00000004 00000000 00000000 4dc33260 0018c933
[ 0.048389] ...
[ 0.048396] Call Trace:
[ 0.048399] [<8105a7bc>] show_stack+0x3c/0x140
[ 0.048424] [<8131c2a0>] dump_stack_lvl+0x60/0x80
[ 0.048440] [<8108b5c0>] __warn+0xc0/0xf4
[ 0.048454] [<8108b658>] warn_slowpath_fmt+0x64/0x10c
[ 0.048467] [<810bd418>] sched_core_cpu_starting+0x198/0x240
[ 0.048483] [<810c6514>] sched_cpu_starting+0x14/0x80
[ 0.048497] [<8108c0f8>] cpuhp_invoke_callback_range+0x78/0x140
[ 0.048510] [<8108d914>] notify_cpu_starting+0x94/0x140
[ 0.048523] [<8106593c>] start_secondary+0xbc/0x280
[ 0.048539]
[ 0.048543] ---[ end trace 0000000000000000 ]---
[ 0.048636] Synchronize counters for CPU 1: done.
...for each but CPU 0/boot.
Basic debug printks right before the mentioned line say:
[ 0.048170] CPU: 1, smt_mask:
So smt_mask, which is sibling mask obviously, is empty when entering
the function.
This is critical, as sched_core_cpu_starting() calculates
core-scheduling parameters only once per CPU start, and it's crucial
to have all the parameters filled in at that moment (at least it
uses cpu_smt_mask() which in fact is `&cpu_sibling_map[cpu]` on
MIPS).
A bit of debugging led me to that set_cpu_sibling_map() performing
the actual map calculation, was being invocated after
notify_cpu_start(), and exactly the latter function starts CPU HP
callback round (sched_core_cpu_starting() is basically a CPU HP
callback).
While the flow is same on ARM64 (maps after the notifier, although
before calling set_cpu_online()), x86 started calculating sibling
maps earlier than starting the CPU HP callbacks in Linux 4.14 (see
[0] for the reference). Neither me nor my brief tests couldn't find
any potential caveats in calculating the maps right after performing
delay calibration, but the WARN splat is now gone.
The very same debug prints now yield exactly what I expected from
them:
[ 0.048433] CPU: 1, smt_mask: 0-1
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git/commit/?id=76ce7cfe35ef
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 268a491aeb ]
The DWC2 USB controller on the Agilex platform does not support clock
gating, so use the chip specific "intel,socfpga-agilex-hsotg"
compatible.
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>