When handling speculative page fault, the vma->vm_flags and
vma->vm_page_prot fields are read once the page table lock is released. So
there is no more guarantee that these fields would not change in our back.
They will be saved in the vm_fault structure before the VMA is checked for
changes.
This patch also set the fields in hugetlb_no_page() and
__collapse_huge_page_swapin even if it is not need for the callee.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Change-Id: I9821f02ea32ef220b57b8bfd817992bbf71bbb1d
Link: https://lore.kernel.org/lkml/1523975611-15978-13-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
The speculative page fault handler must be protected against anon_vma
changes. This is because page_add_new_anon_rmap() is called during the
speculative path.
In addition, don't try speculative page fault if the VMA don't have an
anon_vma structure allocated because its allocation should be
protected by the mmap_sem.
In __vma_adjust() when importer->anon_vma is set, there is no need to
protect against speculative page faults since speculative page fault
is aborted if the vma->anon_vma is not set.
When calling page_add_new_anon_rmap() vma->anon_vma is necessarily
valid since we checked for it when locking the pte and the anon_vma is
removed once the pte is unlocked. So even if the speculative page
fault handler is running concurrently with do_unmap(), as the pte is
locked in unmap_region() - through unmap_vmas() - and the anon_vma
unlinked later, because we check for the vma sequence counter which is
updated in unmap_page_range() before locking the pte, and then in
free_pgtables() so when locking the pte the change will be detected.
Change-Id: I6c1f3b5c811d1ddd7b3f769082e8bbd40f5b52a0
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-12-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
If a thread is remapping an area while another one is faulting on the
destination area, the SPF handler may fetch the vma from the RB tree before
the pte has been moved by the other thread. This means that the moved ptes
will overwrite those create by the page fault handler leading to page
leaked.
CPU 1 CPU2
enter mremap()
unmap the dest area
copy_vma() Enter speculative page fault handler
>> at this time the dest area is present in the RB tree
fetch the vma matching dest area
create a pte as the VMA matched
Exit the SPF handler
<data written in the new page>
move_ptes()
> it is assumed that the dest area is empty,
> the move ptes overwrite the page mapped by the CPU2.
To prevent that, when the VMA matching the dest area is extended or created
by copy_vma(), it should be marked as non available to the SPF handler.
The usual way to so is to rely on vm_write_begin()/end().
This is already in __vma_adjust() called by copy_vma() (through
vma_merge()). But __vma_adjust() is calling vm_write_end() before returning
which create a window for another thread.
This patch adds a new parameter to vma_merge() which is passed down to
vma_adjust().
The assumption is that copy_vma() is returning a vma which should be
released by calling vm_raw_write_end() by the callee once the ptes have
been moved.
Change-Id: Icd338ad6e9b3c97b7334d3b8d30a8badfa2a4efa
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-11-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
The VMA sequence count has been introduced to allow fast detection of
VMA modification when running a page fault handler without holding
the mmap_sem.
This patch provides protection against the VMA modification done in :
- madvise()
- mpol_rebind_policy()
- vma_replace_policy()
- change_prot_numa()
- mlock(), munlock()
- mprotect()
- mmap_region()
- collapse_huge_page()
- userfaultd registering services
In addition, VMA fields which will be read during the speculative fault
path needs to be written using WRITE_ONCE to prevent write to be split
and intermediate values to be pushed to other CPUs.
Change-Id: Ic36046b7254e538b6baf7144c50ae577ee7f2074
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-10-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
counts such that we can easily test if a VMA is changed.
The unmap_page_range() one allows us to make assumptions about
page-tables; when we find the seqcount hasn't changed we can assume
page-tables are still valid.
The flip side is that we cannot distinguish between a vma_adjust() and
the unmap_page_range() -- where with the former we could have
re-checked the vma bounds against the address.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Port to 4.12 kernel]
[Build depends on CONFIG_SPECULATIVE_PAGE_FAULT]
[Introduce vm_write_* inline function depending on
CONFIG_SPECULATIVE_PAGE_FAULT]
[Fix lock dependency between mapping->i_mmap_rwsem and vma->vm_sequence by
using vm_raw_write* functions]
[Fix a lock dependency warning in mmap_region() when entering the error
path]
[move sequence initialisation INIT_VMA()]
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-9-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Change-Id: Ibc23ef3b9dbb80323c0f24cb06da34b4c3a8fa71
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Some VMA struct fields need to be initialized once the VMA structure is
allocated.
Currently this only concerns anon_vma_chain field but some other will be
added to support the speculative page fault.
Instead of spreading the initialization calls all over the code, let's
introduce a dedicated inline function.
Change-Id: I9f6b29dc74055354318b548e2b6b22c37d4c61bb
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-8-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
pte_unmap_same() is making the assumption that the page table are still
around because the mmap_sem is held.
This is no more the case when running a speculative page fault and
additional check must be made to ensure that the final page table are still
there.
This is now done by calling pte_spinlock() to check for the VMA's
consistency while locking for the page tables.
This is requiring passing a vm_fault structure to pte_unmap_same() which is
containing all the needed parameters.
As pte_spinlock() may fail in the case of a speculative page fault, if the
VMA has been touched in our back, pte_unmap_same() should now return 3
cases :
1. pte are the same (0)
2. pte are different (VM_FAULT_PTNOTSAME)
3. a VMA's changes has been detected (VM_FAULT_RETRY)
The case 2 is handled by the introduction of a new VM_FAULT flag named
VM_FAULT_PTNOTSAME which is then trapped in cow_user_page().
If VM_FAULT_RETRY is returned, it is passed up to the callers to retry the
page fault while holding the mmap_sem.
Change-Id: Iaccfa0d877334f4343f8b0ec3400af5070ff5864
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Link: https://lore.kernel.org/lkml/1523975611-15978-7-git-send-email-ldufour@linux.vnet.ibm.com/
Bug: 161210518
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
When speculating faults (without holding mmap_sem) we need to validate
that the vma against which we loaded pages is still valid when we're
ready to install the new PTE.
Therefore, replace the pte_offset_map_lock() calls that (re)take the
PTL with pte_map_lock() which can fail in case we find the VMA changed
since we started the fault.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Port to 4.12 kernel]
[Remove the comment about the fault_env structure which has been
implemented as the vm_fault structure in the kernel]
[move pte_map_lock()'s definition upper in the file]
[move the define of FAULT_FLAG_SPECULATIVE later in the series]
[review error path in do_swap_page(), do_anonymous_page() and
wp_page_copy()]
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Bug: 161210518
Link: https://lore.kernel.org/lkml/1523975611-15978-5-git-send-email-ldufour@linux.vnet.ibm.com/
Change-Id: Id6dfae130fbfdd4bb92aa6415d6f1db7ef833266
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
rmqueue internal functions to allocate cma memory should use
alloc_flags instead of gfp_flags because it's more restricted flags
to be considered allocation context. Otherwise, we could allocate
page from CMA area even though current context already disable
cma memory allocation. For example, current allocation context is
limited not to allocate the page from CMA area by PF_MEMALLOC_NOCMA
to prevent longterm pin but it's ignored so the longterm pin page
could be allocated from CMA area.
Bug: 178019362
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I61bd4642c91ecd9153f6c59f89e296e8b515f1ad
The current implementation of fw_devlink is very inefficient because it
tries to get away without creating fwnode links in the name of saving
memory usage. Past attempts to optimize runtime at the cost of memory
usage were blocked with request for data showing that the optimization
made significant improvement for real world scenarios.
We have those scenarios now. There have been several reports of boot
time increase in the order of seconds in this thread [1]. Several OEMs
and SoC manufacturers have also privately reported significant
(350-400ms) increase in boot time due to all the parsing done by
fw_devlink.
So this patch uses all the setup done by the previous patches in this
series to refactor fw_devlink to be more efficient. Most of the code has
been moved out of firmware specific (DT mostly) code into driver core.
This brings the following benefits:
- Instead of parsing the device tree multiple times during bootup,
fw_devlink parses each fwnode node/property only once and creates
fwnode links. The rest of the fw_devlink code then just looks at these
fwnode links to do rest of the work.
- Makes it much easier to debug probe issue due to fw_devlink in the
future. fw_devlink=on blocks the probing of devices if they depend on
a device that hasn't been added yet. With this refactor, it'll be very
easy to tell what that device is because we now have a reference to
the fwnode of the device.
- Much easier to add fw_devlink support to ACPI and other firmware
types. A refactor to move the common bits from DT specific code to
driver core was in my TODO list as a prerequisite to adding ACPI
support to fw_devlink. This series gets that done.
[1] - https://lore.kernel.org/linux-omap/ea02f57e-871d-cd16-4418-c1da4bbc4696@ti.com/
Tested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-17-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f9aa460672)
Bug: 178143855
Bug: 169667932
Change-Id: Iabaf0f1793d1d211286218f010c8a80d76edf060
The semantics of add_links() has changed from creating device link
between devices to creating fwnode links between fwnodes. So, update the
implementation of add_links() to match the new semantics.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-16-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e82a840cb1)
Bug: 178143855
Bug: 169667932
Change-Id: Ieac21cf68ba1cc5289eee69c0891225b82878e74
To check if a device is still waiting for its supplier devices to be
added, we used to check if the devices is in a global
waiting_for_suppliers list. Since the global list will be deleted in
subsequent patches, this patch stops using this check.
Instead, this patch uses a more device specific check. It checks if the
device's fwnode has any fwnode links that haven't been converted to
device links yet.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-14-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 25ac86c6db)
Bug: 178143855
Bug: 169667932
Change-Id: I1ad987cbde98bf0fef9a850942d1910a43b54d11
This function is a wrapper around fwnode_operations.add_links().
This function parses each node in a fwnode tree and create fwnode links
for each of those nodes. The information for creating the fwnode links
(the supplier and consumer fwnode) is obtained by parsing the properties
in each of the fwnodes.
This function also ensures that no fwnode is parsed more than once by
marking the fwnodes as parsed.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-13-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c2c724c868)
Bug: 178143855
Bug: 169667932
Change-Id: I5e934d6a508d96a697016f1451cc595ef671d839
Change the meaning of fwnode_operations.add_links() to just create
fwnode links by parsing the properties of a given fwnode.
This patch doesn't actually make any code changes. To keeps things more
digestable, the actual functional changes come in later patches in this
series.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-12-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 04f63c213b)
Bug: 178143855
Bug: 169667932
Change-Id: Ifca0afb02969667e2395f25047f96d427f2b813b
Add fwnode_is_ancestor_of() helper function to check if a fwnode is an
ancestor of another fwnode.
Add fwnode_get_next_parent_dev() helper function that take as input a
fwnode and finds the closest ancestor fwnode that has a corresponding
struct device and returns that struct device.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-11-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b5d3e2fbcb)
Bug: 178143855
Bug: 169667932
Change-Id: Ia49bc3f92c67ed437825a043794b3edc2689e503
SYNC_STATE_ONLY device links only affect the behavior of sync_state()
callbacks. Specifically, they prevent sync_state() only callbacks from
being called on a device if one or more of its consumers haven't probed.
So, creating a SYNC_STATE_ONLY device link from an already probed
consumer is useless. So, don't allow creating such device links.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-10-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ac66c5bbb4)
Bug: 178143855
Bug: 169667932
Change-Id: Ife356503237b1378f80a84579b932417f257d5d8
Add support for creating supplier-consumer links between fwnodes. It is
intended for internal use the driver core and generic firmware support
code (eg. Device Tree, ACPI), so it is simple by design and the API
provided is limited.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-9-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7b337cb3eb)
Bug: 178143855
Bug: 169667932
Change-Id: I648e8f779b9f64f2544412854c084a38eb7f3c2c
This reverts commit 716a7a2596.
The fw_devlink_pause/resume() APIs added by the commit being reverted
were a first cut attempt at optimizing boot time. But these APIs don't
fully solve the problem and are very fragile (can only be used for the
top level devices being added). This series replaces them with a much
better optimization that works for all device additions and also has the
benefit of reducing the complexity of the firmware (DT, EFI) specific
code and abstracting out common code to driver core.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-7-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c84b90909e)
Bug: 178143855
Bug: 169667932
Change-Id: I6b998bff78a4d3a2939b7db607f5eb174f5d0e69
This reverts commit cec72f3efc.
Commit cec72f3efc ("driver core: Don't do deferred probe in parallel
with kernel_init thread") was fixing a commit 716a7a2596 ("driver
core: fw_devlink: Add support for batching fwnode parsing"). Since the
commit being fixed itself is going to be reverted, the fix can also be
reverted.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-4-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 96d8a9168e)
Bug: 178143855
Bug: 169667932
Change-Id: I66ce5ef3089e61e4c92ddfff785c43ad8c3f379e
This reverts commit ec7bd78498.
This field rename was done to reuse defer_syc list head for multiple
lists. That's not needed anymore and this list head will only be used
for defer sync. So revert this patch to avoid conflicts with the other
reverts coming after this.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3b052a3e30)
Bug: 178143855
Bug: 169667932
Change-Id: Ic5747a9d7e6f659fe29e09bb0f9d7f15efc52746
This reverts commit 2451e74647.
fw_devlink_pause/resume() was an incomplete attempt at boot time
optimization. That's going to get replaced by a much better optimization
at the end of the series. Since fw_devlink_pause/resume() is going away,
changes made for that can also go away.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201121020232.908850-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c95d64012a)
Bug: 178143855
Bug: 169667932
Change-Id: Id9c7a6cc8d6dea4172a88a05fa9853de33500f12
We add a vendor hook for util to freq calculation in schedutil,
so we need to do corresponding change for energy calculation.
android_vh_em_cpu_energy
adjust energy calculation
Bug: 178047619
Signed-off-by: Yun Hsiang <yun.hsiang@mediatek.com>
Change-Id: Iae772cf07881602eea3f27aeb75fba753e7c2635
PD3.0 Spec 6.8.1 describes how to handle Protocol Error. There are
general rules defined in Table 6-61 which regulate incoming Message
handling. If the incoming Message is unexpected, unsupported, or
unrecognized, Protocol Error occurs. Follow the rules to handle these
situations. Also consider PD2.0 connection (PD2.0 Spec Table 6-36) for
backward compatibilities.
To know the types of AMS in all the recipient's states, identify those
AMS who are initiated by the port partner but not yet recorded in the
current code.
Besides, introduce a new state CHUNK_NOT_SUPP to delay the NOT_SUPPORTED
message after receiving a chunked message.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Kyle Tso <kyletso@google.com>
Link: https://lore.kernel.org/r/20210114145053.1952756-3-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8dea75e113https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-next)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia0744de43bb709fcdece70848b72b8edfba8ac2e
Add hook to gather data of bug trap and summarize it with other
information.
Bug: 177483057
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: Ic44fd1c3c4e43f04510a871a8dbeb25aafc45e95
I found one case as follows and the current flush
location doesn't guarantee disabling BKOPS in the
case of requsting device power off.
1) The exceptional event handler is queued.
2) ufs suspend starts with a request of device power off
3) BKOPS is disabled in ufs suspend
4) The queued work for the handler is done and BKOPS
is enabled again.
Bug: 177395905
Change-Id: Ib69b8ebfb97469ff37ef8a70bdbc1ac2e79adcb0
(cherry picked from commit 6948a96a0dhttps://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git 5.11/scsi-fixes)
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Some SoCs could require one scatterlist entry for smaller than
page size, i.e. 4KB. For the cases of dispatching commands
with more than one scatterlist entry and under 4KB size,
They could behave as follows:
Given that a command to read something
from device is dispatched with two scatterlist entries that
are named AAA and BBB. After dispatching, host builds two PRDT
entries and during transmission, device sends just one DATA IN
because device doesn't care on host dma. The host then tranfers
the whole data from start address of the area named AAA.
In consequence, the area that follows AAA would be corrupted.
|<------------->|
+-------+------------ +-------+
+ AAA + (corrupted) ... + BBB +
+-------+------------ +-------+
In this case, you need to force an alignment with page size for
sg entries.
Bug: 177399609
Change-Id: I4a930e35efeed233d09ff935431c3db120c13e4d
(cherry picked from commit 2b2bfc8aa5https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git 5.12/scsi-staging)
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Set optimized values for those timeouts
- FC0_PROTECTION_TIMER
- TC0_REPLAY_TIMER
- AFC0_REQUEST_TIMER
Exynos doesn't yet use traffic class #1.
Bug: 163478167
Change-Id: I66b489f1676c795548a6a2c9f4be1192fc494490
(cherry picked from commit b1d0d2eb89https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git 5.12/scsi-queue)
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Unipro specification says attribute IDs of the following
thing are vendor-specific values, so some SoCs could have
no regions at the defined addresses
- DME_LocalFC0ProtectionTimeOutVal
- DME_LocalTC0ReplayTimeOutVal
- DME_LocalAFC0ReqTimeOutVal
The following things should be set considering the compatibility
between host and device, so those values must not be fixed and
you could use reset values or vendor specific values
- PA_PWRMODEUSERDATA0
- PA_PWRMODEUSERDATA1
- PA_PWRMODEUSERDATA2
- PA_PWRMODEUSERDATA3
- PA_PWRMODEUSERDATA4
- PA_PWRMODEUSERDATA5
Bug: 163478167
Change-Id: Ic56ea3d8182bc27899d17fb45370bd60ce900120
(cherry picked from commit b1d0d2eb89https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git 5.12/scsi-queue)
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>