Export register/unregister kernel/user break hook APIs
for modules to reference.
Bug: 169899018
Signed-off-by: Jonglin Lee <jonglin@google.com>
Change-Id: I0e58f72f97f2952485dd4318f36b68e8afa0e454
Export dump_backtrace symbol for modules to reference.
Bug: 169899018
Signed-off-by: Jonglin Lee <jonglin@google.com>
Change-Id: Ibc6caaabe212cb918d1c0c1e154e3244b28aa73e
Add menu control for VP9 codec levels. A total of 14 levels are
defined for Profile 0 (8bit) and Profile 2 (10bit). Each level
is a set of constrained bitstreams coded with targeted resolutions,
frame rates, and bitrates.
The definitions have been taken from webm project [1].
[1] https://www.webmproject.org/vp9/levels/
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Bug: 169874657
Change-Id: I6d147c6b5ced7562da934399d777bca419421109
(cherry picked from commit 5887f075922547908e9a17c50e383426d72c59b6
https://git.linuxtv.org/svarbanov/media_tree.git venus-for-next-v5.10-part2)
Signed-off-by: Maheshwar Ajja <majja@codeaurora.org>
Add a standard v4l2 encoder control for frame-skip. The control
is a copy of a custom encoder control so that other v4l2 encoder
drivers can use it.
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Bug: 151414336
Change-Id: Ie6b116d9a3c4954bb21f26e82a5a27370cfa1c8d
(cherry picked from commit 1edcd843ef8b198d391ade1828c0c84acc3b9c4c
https://git.linuxtv.org/svarbanov/media_tree.git venus-for-next-v5.10)
Signed-off-by: Maheshwar Ajja <majja@codeaurora.org>
When V4L2_CID_MPEG_VIDEO_BITRATE_MODE value is
V4L2_MPEG_VIDEO_BITRATE_MODE_CQ, encoder will produce
constant quality output indicated by
V4L2_CID_MPEG_VIDEO_CONSTANT_QUALITY control value.
Encoder will choose appropriate quantization parameter
and bitrate to produce requested frame quality level.
Signed-off-by: Maheshwar Ajja <majja@codeaurora.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Bug: 151414336
Change-Id: Ifc29387620ad0c164acf81e0320f5d28072dcd61
(cherry picked from commit ecc6e684128db22b89c0cc955e10de24c4163f6e
https://git.linuxtv.org/svarbanov/media_tree.git venus-for-next-v5.10)
Signed-off-by: Maheshwar Ajja <majja@codeaurora.org>
Enable HVC_DCC driver on gki_defconfig to support earlyconsole over
JTAG.
Bug: 169129589
Change-Id: I959c2a29d1ad38f936e170524eb11d4e7675498c
Signed-off-by: Elliot Berman <eberman@codeaurora.org>
Some debuggers, such as Trace32 from Lauterbach GmbH, do not handle
reads/writes from/to DCC on secondary cores. Each core has its
own DCC device registers, so when a core reads or writes from/to DCC,
it only accesses its own DCC device. Since kernel code can run on
any core, every time the kernel wants to write to the console, it
might write to a different DCC.
In SMP mode, Trace32 only uses the DCC on core 0. In AMP mode, it
creates multiple windows, and each window shows the DCC output
only from that core's DCC. The result is that console output is
either lost or scattered across windows.
Selecting this option will enable code that serializes all console
input and output to core 0. The DCC driver will create input and
output FIFOs that all cores will use. Reads and writes from/to DCC
are handled by a workqueue that runs only on core 0.
Bug: 169129589
Link: https://lore.kernel.org/lkml/1435344756-20901-1-git-send-email-timur@codeaurora.org/
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Acked-by: Adam Wallis <awallis@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
[eberman@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Elliot Berman <eberman@codeaurora.org>
Change-Id: Ie5e3f25ad74a0f07edb862b8258ad875ea308d30
Add earlycon support to the GENI based UART hardware controller for
the Qualcomm SOCs.
Reason: Bringup
Bug: 144074026
Test: Earlyconsole logs are fine and switch to kernel console is fine.
Change-Id: I34c9910cc8aa9586f842362fae62bc7127bcee5e
Signed-off-by: Mukesh Kumar Savaliya <msavaliy@codeaurora.org>
Signed-off-by: Akash Asthana <akashast@codeaurora.org>
CMA allocations will fail if 'pinned' pages are in a CMA area, since
we cannot migrate pinned pages. The _refcount of a struct page being
greater than _mapcount for that page can cause pinning for anonymous
pages. This is because try_to_unmap(), which (1) is called in the CMA
allocation path, and (2) decrements both _refcount and _mapcount for a
page, will stop unmapping a page from VMAs once the _mapcount for a
page reaches 0. This implies that after try_to_unmap() has finished
successfully for a page where _refcount > _mapcount, that _refcount
will be greater than 0. Later in the CMA allocation path in
migrate_page_move_mapping(), we will have one more reference count
than intended for anonymous pages, meaning the allocation will fail
for that page.
One example of where _refcount can be greater than _mapcount for a
page we would not expect to be pinned is inside of copy_one_pte(),
which is called during a fork. For ptes for which pte_present(pte) ==
true, copy_one_pte() will increment the _refcount field followed by
the _mapcount field of a page. If the process doing copy_one_pte() is
context switched out after incrementing _refcount but before
incrementing _mapcount, then the page will be temporarily pinned.
So, inside of cma_alloc(), instead of giving up when
alloc_contig_range() returns -EBUSY after having scanned a whole
CMA-region bitmap, perform retries with sleeps to give the system an
opportunity to unpin any pinned pages.
Additionally, based on feedback from Minchan Kim, add the ability to
exit early if a fatal signal is pending (this is a delta from the
mailing-list version of this patch).
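The retry behaviour described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the code from the patch: `sim_state`, `sim_scan_region()` and `sim_cma_alloc()` are hypothetical names, the retry limit is a made-up parameter, and returning -EINTR on a fatal signal is an assumption about the early-exit path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; names are illustrative only. */
struct sim_state {
	int busy_scans_left;	/* full scans that will still report -EBUSY */
	bool fatal_signal;	/* models a pending fatal signal */
	int sleeps;		/* number of sleeps taken between retries */
};

/* Models one scan of the whole CMA-region bitmap. */
static int sim_scan_region(struct sim_state *s)
{
	if (s->busy_scans_left > 0) {
		s->busy_scans_left--;
		return -EBUSY;	/* some page in the range is still pinned */
	}
	return 0;
}

/* Retry the whole-region scan instead of giving up on the first -EBUSY. */
static int sim_cma_alloc(struct sim_state *s, int max_retries)
{
	int ret = sim_scan_region(s);
	int tries = 0;

	while (ret == -EBUSY && tries < max_retries) {
		if (s->fatal_signal)
			return -EINTR;	/* assumed error code for early exit */
		s->sleeps++;		/* the kernel would sleep here */
		tries++;
		ret = sim_scan_region(s);
	}
	return ret;
}
```

In this model a transient pin (a few busy scans) is absorbed by the sleeps, while a persistent pin still ends in -EBUSY once the retry budget is exhausted.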
Bug: 168521646
Link: https://lore.kernel.org/lkml/1596682582-29139-2-git-send-email-cgoldswo@codeaurora.org/
Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Co-developed-by: Susheel Khiani <skhiani@codeaurora.org>
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
Co-developed-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Change-Id: I2f0c8388f9163e0decd631d9ae07bb6ad9ab79c8
The GIC irqchips can now use a HW resend when a retrigger is invoked by
check_irq_resend(). However, should the HW resend fail, check_irq_resend()
will still attempt to trigger a SW resend, which is still a bad idea for
the GICs.
Prevent this from happening by setting IRQD_HANDLE_ENFORCE_IRQCTX on all
GIC IRQs. Technically per-cpu IRQs do not need this, as their flow handlers
never set IRQS_PENDING, but this aligns all IRQs wrt context enforcement:
this also forces all GIC IRQ handling to happen in IRQ context (as defined
by in_irq()).
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200730170321.31228-3-valentin.schneider@arm.com
Bug: 140053385
(cherry picked from commit 1b57d91b96
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/gic-retrigger)
(resolved trivial merge conflict in drivers/irqchip/irq-gic.c)
Change-Id: I26d068ff58660627b4fd02f2d0483f81f0cd2094
Signed-off-by: Eric Biggers <ebiggers@google.com>
It is pretty easy to provide a retrigger callback for the ITS,
as we already have the required support in terms of
irq_set_irqchip_state().
Note that this only works for device-generated LPIs, and not
the GICv4 doorbells, which should never have to be retriggered
anyway.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit 5f774f5e12
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/gic-retrigger)
Change-Id: I08b9f838e90aa56d04f3e65310e54e4d17e2ec9b
Signed-off-by: Eric Biggers <ebiggers@google.com>
While digging around IRQCHIP_EOI_IF_HANDLED and irq/resend.c, it has come
to my attention that the IRQ resend situation seems a bit precarious for
the GIC(s).
When marking an IRQ with IRQS_PENDING, handle_fasteoi_irq() will bail out
and issue an irq_eoi(). Should the IRQ in question be re-enabled,
check_irq_resend() will trigger a SW resend, which will go through the flow
handler again and issue *another* irq_eoi() on the *same* IRQ
activation. This is something the GIC spec clearly describes as a bad idea:
any EOI must match a previous ACK.
Implement irq_chip.irq_retrigger() for the GIC chips by setting the GIC
pending bit of the relevant IRQ. After being called by check_irq_resend(),
this will eventually trigger a *new* interrupt which we will handle as usual.
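The EOI/ACK mismatch can be illustrated with a toy model. This is a sketch for intuition only: `toy_irq` and the `toy_*` functions are hypothetical and do not reflect the real GIC register interface or the kernel's flow handlers.

```c
#include <stdbool.h>

/* Toy model of one interrupt line; not the real GIC interface. */
struct toy_irq {
	bool hw_pending;	/* modelled pending bit in the controller */
	int acks;		/* acknowledges seen so far */
	int eois;		/* end-of-interrupt writes seen so far */
};

/* Normal flow: a pending interrupt is acknowledged once and EOI'd once. */
static void toy_take_interrupt(struct toy_irq *irq)
{
	if (!irq->hw_pending)
		return;
	irq->hw_pending = false;
	irq->acks++;
	irq->eois++;
}

/* SW resend: the flow handler runs again with no new acknowledge. */
static void toy_sw_resend(struct toy_irq *irq)
{
	irq->eois++;	/* second EOI for the same activation */
}

/* HW retrigger: only set the pending bit and let a new IRQ fire. */
static void toy_hw_retrigger(struct toy_irq *irq)
{
	irq->hw_pending = true;
}
```

With the SW resend the model ends with more EOIs than ACKs, while the HW retrigger keeps the two counts equal because the replayed interrupt goes through a fresh acknowledge.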
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200730170321.31228-2-valentin.schneider@arm.com
Bug: 140053385
(cherry picked from commit 17f644e949
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/gic-retrigger)
(resolved trivial merge conflict in drivers/irqchip/irq-gic.c)
Change-Id: I1c5e81c7cf33155e81ba31615e5eaf3f578350c2
Signed-off-by: Eric Biggers <ebiggers@google.com>
On resending an interrupt, we only check the outermost irqchip for
an irq_retrigger callback. However, this callback could be implemented
at an inner level. Use irq_chip_retrigger_hierarchy() in this case.
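The idea of looking past the outermost irqchip can be sketched with a simplified parent chain. The types and functions below (`toy_chip`, `toy_find_retrigger()`, `toy_try_retrigger()`) are hypothetical and only mimic the shape of a hierarchical irqchip; they are not the kernel's irq_data structures or the actual irq_chip_retrigger_hierarchy() implementation.

```c
#include <stddef.h>

/* Simplified stand-in for one level of a hierarchical irqchip. */
struct toy_chip {
	const char *name;
	int (*retrigger)(void);		/* NULL if this level has no callback */
	const struct toy_chip *parent;	/* next inner level, or NULL */
};

/* Example callback that reports a successful hardware retrigger. */
static int toy_hw_retrigger_ok(void)
{
	return 1;
}

/* Walk from the outermost chip inwards until a callback is found. */
static const struct toy_chip *toy_find_retrigger(const struct toy_chip *chip)
{
	while (chip && !chip->retrigger)
		chip = chip->parent;
	return chip;
}

/* Returns the callback's result, or 0 if no level can retrigger. */
static int toy_try_retrigger(const struct toy_chip *outer)
{
	const struct toy_chip *chip = toy_find_retrigger(outer);

	return chip ? chip->retrigger() : 0;
}
```

A return value of 0 in this model corresponds to the case where the caller would have to fall back to another resend mechanism.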
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit cd1752d34e
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/gic-retrigger)
Change-Id: Ie102eb40db715b94c88c7a58d5aaade0a507b598
Signed-off-by: Eric Biggers <ebiggers@google.com>
Introduce a static key identifying Samsung's unique creation, allowing
us to replace the indirect call that computes the base addresses with
a simple test on the static key.
Faster, cheaper, negative diffstat.
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit 8594c3b851
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Change-Id: I380d7bd5dac454ec7f9407512589e252c5504113
Signed-off-by: Eric Biggers <ebiggers@google.com>
To introduce IPIs as standard interrupts to the Armada 370-XP
driver, let's allocate a completely separate irqdomain and
irqchip combo that lives parallel to the "standard" one.
This effectively should be modelled as a chained interrupt
controller, but the code is in such a state that it is
pretty hard to shoehorn, as it would require the rewrite
of the MSI layer as well.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit f02147dd02
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Change-Id: I9034297efc5a513cb55cedc488e39331e5253010
Signed-off-by: Eric Biggers <ebiggers@google.com>
In order to switch the hip04 driver to provide standard interrupts
for IPIs, rework the way interrupts are allocated, making sure
the irqdomain covers the SGIs as well as the rest of the interrupt
range.
The driver is otherwise so old-school that it creates all interrupts
upfront (duh!), so there is hardly anything else to change, apart
from communicating the IPIs to the arch code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit a2df12c589
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Change-Id: I70ed8100824880900ca941ebafd41a8f3e8c8b92
Signed-off-by: Eric Biggers <ebiggers@google.com>
In order to switch the bcm2836 driver to provide standard interrupts
for IPIs, it first needs to stop lying about the way things work.
The mailbox interrupt is actually a multiplexer, with enough
bits to store 32 pending interrupts per CPU. So let's turn it
into a chained irqchip.
Once this is done, we can instantiate the corresponding IPIs,
and pass them to the architecture code.
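The multiplexer view of the mailbox can be sketched as follows. This is an illustrative model only: `toy_demux_mailbox()` is a hypothetical helper, and the real driver dispatches each set bit through its irqdomain rather than filling an array.

```c
#include <stdint.h>

/*
 * Split a 32-bit pending word into individual interrupt numbers,
 * lowest bit first. 'out' must have room for 32 entries.
 * Returns the number of pending interrupts found.
 */
static int toy_demux_mailbox(uint32_t pending, int *out)
{
	int count = 0;
	int bit;

	for (bit = 0; bit < 32; bit++) {
		if (pending & (UINT32_C(1) << bit))
			out[count++] = bit;
	}
	return count;
}
```

Each entry written to `out` stands for one per-CPU interrupt that a chained irqchip would hand to its own handler.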
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit 0809ae7249
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Change-Id: I4ed1db6cdf042e7e1ae381fee1049d9167256b48
Signed-off-by: Eric Biggers <ebiggers@google.com>
Change the way we deal with GICv3 SGIs by turning them into proper
IRQs, and calling into the arch code to register the interrupt range
instead of a callback.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit 64b499d8df
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Conflicts:
drivers/irqchip/irq-gic-v3.c
(Resolved conflict with
"ANDROID: power: wakeup_reason: wake reason enhancements".)
Change-Id: I37db47982e6fe3a86504f4cbacf3ba1a8344db73
Signed-off-by: Eric Biggers <ebiggers@google.com>
In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.
set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if they were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.
This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.
On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.
One of the major differences is that, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), we end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit 56afcd3dbd
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Change-Id: Ib7187b49af45f4b5a51d2daf7466a06dbbb72fb1
Signed-off-by: Eric Biggers <ebiggers@google.com>
In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.
set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if they were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.
This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.
On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.
One of the major differences is that, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), we end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit d3afc7f129
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Change-Id: I0e0d44bc0e50f9b9155f3f45c1a809531143654e
Signed-off-by: Eric Biggers <ebiggers@google.com>
A number of architectures implement IPI statistics directly,
duplicating the core kstat_irqs accounting. As we move IPIs to
being actual IRQs, we would end up with a confusing display
in /proc/interrupts (where the IPIs would appear twice).
In order to solve this, allow interrupts to be flagged as
"hidden", which excludes them from /proc/interrupts.
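The effect of such a flag on a listing can be sketched with a minimal model. The structure and function names here (`toy_irq_desc`, `toy_count_visible()`) are hypothetical; the real flag lives in the kernel's interrupt descriptor settings and is checked by the code that generates /proc/interrupts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for an interrupt descriptor. */
struct toy_irq_desc {
	int irq;	/* interrupt number */
	bool hidden;	/* true if the arch code reports it separately */
};

/* Count the descriptors that a listing would actually show. */
static size_t toy_count_visible(const struct toy_irq_desc *descs, size_t n)
{
	size_t visible = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!descs[i].hidden)
			visible++;
	}
	return visible;
}
```

Hidden entries are still ordinary interrupts in this model; they are simply skipped by the listing so that the architecture's own IPI statistics are not duplicated.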
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit 83cfac95c0
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Change-Id: I46da38d255406a158b5c03e6b02cab80b47180aa
Signed-off-by: Eric Biggers <ebiggers@google.com>
For irqchips using the fasteoi flow, IPIs are a bit special.
They need to be EOI'd early (before calling the handler), as
funny things may happen in the handler (they do not necessarily
behave like a normal interrupt).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 140053385
(cherry picked from commit c5e5ec033c
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/ipi-as-irq)
Change-Id: I2755205f7ddfedf60c5542fdf438d9bff168ee64
Signed-off-by: Eric Biggers <ebiggers@google.com>
Export the device_pm_callback_start/end tracepoint so it can be used
in loadable modules.
Bug: 158528747
Change-Id: Icfe12f496f25d3b73dbd70394dc85d197709671a
Signed-off-by: Changki Kim <changki.kim@samsung.com>
(cherry picked from commit d16111a8cf537426853300149b2ce65790fb650c)
Signed-off-by: Jonglin Lee <jonglin@google.com>
https://lkml.org/lkml/2019/2/12/701 removed
tcpm_update_sink_capabilities. However, Pixel started using this
at a later point in time. The client code is not in upstream
though.
Bug: 169695061
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Change-Id: Icb206c902187c90b38dd14924987725e9977e47c
RPMSG unifies various transports that provide IPC to a remote proc.
Some of these transports require some set of side band signalling in
order to meet the specifications of the protocol they implement.
The GLINK native transport supports the tty serial signals to start
communication with modems that expect to receive the DTR serial signal.
Extend the rpmsg core with an interface to send and receive sideband
signals for the transports that need it.
Bug: 161128971
Link: https://lore.kernel.org/lkml/1593182819-30747-2-git-send-email-deesin@codeaurora.org/
Change-Id: I54539d8ddce1bfaec9016c2bec9b5a1372601995
Signed-off-by: Chris Lew <clew@codeaurora.org>
(cherry picked from commit 6839fc80fe1f6564eb6a0fc0fd081d459ec6c61b)
Used to track the constant voltage phase of charging and implement
tier transition for multi-step charging.
Bug: 168244640
Signed-off-by: AleX Pelosi <apelosi@google.com>
Change-Id: I49d3033eec671156ffd113d8d0e3972d2cdad982
Found by sparse:
fs/incfs/format.c:416:21: warning: incorrect type in assignment (different base types)
fs/incfs/format.c:416:21: expected restricted __le32 [assigned] [usertype] fh_flags
fs/incfs/format.c:416:21: got int
fs/incfs/pseudo_files.c:925:25: warning: incorrect type in argument 4 (different base types)
fs/incfs/pseudo_files.c:925:25: expected unsigned long long [usertype] size
fs/incfs/pseudo_files.c:925:25: got restricted __le64 [addressable] [assigned] [usertype] size_attr_value
fs/incfs/pseudo_files.c:925:42: warning: incorrect type in argument 5 (different base types)
fs/incfs/pseudo_files.c:925:42: expected unsigned long long [usertype] offset
fs/incfs/pseudo_files.c:925:42: got restricted __le64 [usertype]
fs/incfs/pseudo_files.c:1111:24: warning: incorrect type in return expression (different base types)
fs/incfs/pseudo_files.c:1111:24: expected restricted __poll_t
fs/incfs/pseudo_files.c:1111:24: got int
Bug: 169258814
Fixes: Sparse errors introduced by 3f4938108a, 8334d69e65 and cb776f4576
Test: incfs_test passes, sparse shows no errors
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I48596e9521069fc77bf38c345a568529d61c77dc
Get the generic casefolding code in sync with the patches that are
queued in f2fs.git#dev for 5.10.
Equivalently, this reverts the patch
"ANDROID-fs-adjust-casefolding-support-to-match-android-mainline.patch"
from the android-mainline quilt series, with the following conflicts:
Conflicts:
fs/ext4/hash.c # due to "ANDROID: ext4: Handle casefolding with encryption"
fs/ext4/namei.c # due to "ANDROID: ext4: Handle casefolding with encryption"
fs/f2fs/dir.c # due to "ANDROID: f2fs: Handle casefolding with Encryption"
Bug: 161184936
Cc: Daniel Rosenberg <drosen@google.com>
Cc: Paul Lawrence <paullawrence@google.com>
Cc: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I0ae169f0f5f413fb21e4be7a163213aef3fa6756
Signed-off-by: Eric Biggers <ebiggers@google.com>
Pull Kbuild fixes from Masahiro Yamada:
- ignore compiler stubs for PPC to fix builds
- fix the usage of --target mentioned in the LLVM document
* tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Documentation/llvm: Fix clang target examples
scripts/kallsyms: skip ppc compiler stub *.long_branch.* / *.plt_branch.*
Pull x86 fixes from Thomas Gleixner:
"Two fixes for the x86 interrupt code:
- Unbreak the magic 'search the timer interrupt' logic in IO/APIC
code which got wrecked when the core interrupt code made the
state tracking logic stricter.
That caused the interrupt line to stay masked after switching from
IO/APIC to PIC delivery mode, which obviously prevents interrupts
from being delivered.
- Make run_on_irqstack_cond() typesafe. The function argument is a
void pointer which is then cast to 'void (*fun)(void *)'.
This breaks Control Flow Integrity checking in clang. Use proper
helper functions for the three variants required"
* tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Unbreak check_timer()
x86/irq: Make run_on_irqstack_cond() typesafe
Pull timer updates from Thomas Gleixner:
"A set of clocksource/clockevents updates:
- Reset the TI/DM timer before enabling it instead of doing it the
other way round.
- Initialize the reload value for the GX6605s timer correctly so the
hardware counter starts at 0 again after overrun.
- Make error return value negative in the h8300 timer init function"
* tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/timer-gx6605s: Fixup counter reload
clocksource/drivers/timer-ti-dm: Do reset before enable
clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
Pinned pages shouldn't be write-protected when fork() happens, because
follow up copy-on-write on these pages could cause the pinned pages to
be replaced by random newly allocated pages.
For huge PMDs, we split the huge pmd if pinning is detected, so that
future handling will be done at the PTE level (with our latest changes,
each of the small pages will be copied). We can achieve this by letting
copy_huge_pmd() return -EAGAIN for pinned pages, so that we'll
fall through in copy_pmd_range() and finally land in the next
copy_pte_range() call.
Huge PUDs are even more special: so far they do not support
anonymous pages. But they can actually be handled the same way as huge
PMDs, even if splitting a huge PUD means erasing the PUD entries. This
guarantees that follow-up faults will remap the same pages in either
parent or child later.
This might not be the most efficient way, but it should be easy and
clean enough. It should be fine, since we're tackling a very rare
case just to make sure userspaces that pinned some THPs will still work
even without MADV_DONTFORK and after they fork()ed.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows copy_pte_range() to do early cow if the pages were pinned on
the source mm.
Currently we don't have an accurate way to know whether a page is pinned
or not. The only thing we have is page_maybe_dma_pinned(). However,
that's good enough for now, especially with the newly added
mm->has_pinned flag to make sure we won't affect processes that never
pinned any pages.
It would be easier if we could do GFP_KERNEL allocation within
copy_one_pte(). Unfortunately, we can't because the page table
locks are held for both the parent and child processes. So the page
allocation needs to be done outside copy_one_pte().
There is some trickery in copy_present_pte(), mainly the wrprotect trick
to block concurrent fast-gup. The comments in the function explain it
in place.
Oleg Nesterov reported a (probably harmless) bug during review: we
didn't reset entry.val properly in copy_pte_range(), so there is
potentially a chance to call add_swap_count_continuation() multiple
times on the same swp entry. However, that should be harmless since even
if it happens, the same function (add_swap_count_continuation()) will
return directly after noticing that there is enough space for the swp
counter. So instead of a standalone stable patch, it is touched up in
this patch directly.
Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This prepares for the future work to trigger early cow on pinned pages
during fork().
No functional change intended.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>