read_user_stack_slow is called with interrupts soft disabled and it copies contents
from the page which we find mapped to a specific address. To convert
userspace address to pfn, the kernel now uses lockless page table walk.
The kernel needs to make sure the pfn value read remains stable and is not released
and reused for another process while the contents are read from the page. This
can only be achieved by holding a page reference.
One of the first approaches I tried was to check the pte value after the kernel
copies the contents from the page. But as shown below we can still get it wrong
CPU0 CPU1
pte = READ_ONCE(*ptep);
pte_clear(pte);
put_page(page);
page = alloc_page();
memcpy(page_address(page), "secret password", nr);
memcpy(buf, kaddr + offset, nb);
put_page(page);
handle_mm_fault()
page = alloc_page();
set_pte(pte, page);
if (pte_val(pte) != pte_val(*ptep))
Hence switch to __get_user_pages_fast.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-8-aneesh.kumar@linux.ibm.com
A lockless page table walk should be safe against parallel THP collapse, THP
split and madvise(MADV_DONTNEED)/parallel fault. This patch makes sure kernel
won't reload the pteval when checking for different conditions. The patch also added
a check for pte_present to make sure the kernel is indeed operating
on a PTE and not a pointer to level 0 table page.
The pfn value we find here can be different from the actual pfn on which
machine check happened. This can happen if we raced with a parallel update
of the page table. In such a scenario we end up isolating a wrong pfn. But that
doesn't have any other side effect.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-7-aneesh.kumar@linux.ibm.com
This makes the pte_present check stricter by checking for additional _PAGE_PTE
bit. A level 1 pte pointer (THP pte) can be switched to a pointer to level 0 pte
page table page by following two operations.
1) THP split.
2) madvise(MADV_DONTNEED) in parallel to page fault.
A lockless page table walk need to make sure we can handle such changes
gracefully.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-4-aneesh.kumar@linux.ibm.com
If multiple threads in userspace keep changing the protection keys
mapping a range, there can be a scenario where kernel takes a key fault
but the pkey value found in the siginfo struct is a permissive one.
This can confuse the userspace as shown in the below test case.
/* use this to control the number of test iterations */
static void pkeyreg_set(int pkey, unsigned long rights)
{
unsigned long reg, shift;
shift = (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY;
asm volatile("mfspr %0, 0xd" : "=r"(reg));
reg &= ~(((unsigned long) PKEY_BITS_MASK) << shift);
reg |= (rights & PKEY_BITS_MASK) << shift;
asm volatile("mtspr 0xd, %0" : : "r"(reg));
}
static unsigned long pkeyreg_get(void)
{
unsigned long reg;
asm volatile("mfspr %0, 0xd" : "=r"(reg));
return reg;
}
static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
{
return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
}
static int sys_pkey_alloc(unsigned long flags, unsigned long access_rights)
{
return syscall(SYS_pkey_alloc, flags, access_rights);
}
static int sys_pkey_free(int pkey)
{
return syscall(SYS_pkey_free, pkey);
}
static int faulting_pkey;
static int permissive_pkey;
static pthread_barrier_t pkey_set_barrier;
static pthread_barrier_t mprotect_barrier;
static void pkey_handle_fault(int signum, siginfo_t *sinfo, void *ctx)
{
unsigned long pkeyreg;
/* FIXME: printf is not signal-safe but for the current purpose,
it gets the job done. */
printf("pkey: exp = %d, got = %d\n", faulting_pkey, sinfo->si_pkey);
fflush(stdout);
assert(sinfo->si_code == SEGV_PKUERR);
assert(sinfo->si_pkey == faulting_pkey);
/* clear pkey permissions to let the faulting instruction continue */
pkeyreg_set(faulting_pkey, 0x0);
}
static void *do_mprotect_fault(void *p)
{
unsigned long rights, pkeyreg, pgsize;
unsigned int i;
void *region;
int pkey;
srand(time(NULL));
pgsize = sysconf(_SC_PAGESIZE);
rights = PKEY_DISABLE_WRITE;
region = p;
/* allocate key, no permissions */
assert((pkey = sys_pkey_alloc(0, PKEY_DISABLE_ACCESS)) > 0);
pkeyreg_set(4, 0x0);
/* cache the pkey here as the faulting pkey for future reference
in the signal handler */
faulting_pkey = pkey;
printf("%s: faulting pkey = %d\n", __func__, faulting_pkey);
/* try to allocate, mprotect and free pkeys repeatedly */
for (i = 0; i < NUM_ITERATIONS; i++) {
/* sync up with the other thread here */
pthread_barrier_wait(&pkey_set_barrier);
/* make sure that the pkey used by the non-faulting thread
is made permissive for this thread's context too so that
no faults are triggered because it still might have been
set to a restrictive value */
// pkeyreg_set(permissive_pkey, 0x0);
/* sync up with the other thread here */
pthread_barrier_wait(&mprotect_barrier);
/* perform mprotect */
assert(!sys_pkey_mprotect(region, pgsize, PROT_READ | PROT_WRITE, pkey));
/* choose a random byte from the protected region and
attempt to write to it, this will generate a fault */
*((char *) region + (rand() % pgsize)) = rand();
/* restore pkey permissions as the signal handler may have
cleared the bit out for the sake of continuing */
pkeyreg_set(pkey, PKEY_DISABLE_WRITE);
}
/* free pkey */
sys_pkey_free(pkey);
return NULL;
}
static void *do_mprotect_nofault(void *p)
{
unsigned long pgsize;
unsigned int i, j;
void *region;
int pkey;
pgsize = sysconf(_SC_PAGESIZE);
region = p;
/* try to allocate, mprotect and free pkeys repeatedly */
for (i = 0; i < NUM_ITERATIONS; i++) {
/* allocate pkey, all permissions */
assert((pkey = sys_pkey_alloc(0, 0)) > 0);
permissive_pkey = pkey;
/* sync up with the other thread here */
pthread_barrier_wait(&pkey_set_barrier);
pthread_barrier_wait(&mprotect_barrier);
/* perform mprotect on the common page, no faults will
be triggered as this is most permissive */
assert(!sys_pkey_mprotect(region, pgsize, PROT_READ | PROT_WRITE, pkey));
/* free pkey */
assert(!sys_pkey_free(pkey));
}
return NULL;
}
int main(int argc, char **argv)
{
pthread_t fault_thread, nofault_thread;
unsigned long pgsize;
struct sigaction act;
pthread_attr_t attr;
cpu_set_t fault_cpuset, nofault_cpuset;
unsigned int i;
void *region;
/* allocate memory region to protect */
pgsize = sysconf(_SC_PAGESIZE);
assert(region = memalign(pgsize, pgsize));
CPU_ZERO(&fault_cpuset);
CPU_SET(0, &fault_cpuset);
CPU_ZERO(&nofault_cpuset);
CPU_SET(8, &nofault_cpuset);
assert(!pthread_attr_init(&attr));
/* setup sigsegv signal handler */
act.sa_handler = 0;
act.sa_sigaction = pkey_handle_fault;
assert(!sigprocmask(SIG_SETMASK, 0, &act.sa_mask));
act.sa_flags = SA_SIGINFO;
act.sa_restorer = 0;
assert(!sigaction(SIGSEGV, &act, NULL));
/* setup barrier for the two threads */
pthread_barrier_init(&pkey_set_barrier, NULL, 2);
pthread_barrier_init(&mprotect_barrier, NULL, 2);
/* setup and start threads */
assert(!pthread_create(&fault_thread, &attr, &do_mprotect_fault, region));
assert(!pthread_setaffinity_np(fault_thread, sizeof(cpu_set_t), &fault_cpuset));
assert(!pthread_create(&nofault_thread, &attr, &do_mprotect_nofault, region));
assert(!pthread_setaffinity_np(nofault_thread, sizeof(cpu_set_t), &nofault_cpuset));
/* cleanup */
assert(!pthread_attr_destroy(&attr));
assert(!pthread_join(fault_thread, NULL));
assert(!pthread_join(nofault_thread, NULL));
assert(!pthread_barrier_destroy(&pkey_set_barrier));
assert(!pthread_barrier_destroy(&mprotect_barrier));
free(region);
puts("PASS");
return EXIT_SUCCESS;
}
The above test can result the below failure without this patch.
pkey: exp = 3, got = 3
pkey: exp = 3, got = 4
a.out: pkey-siginfo-race.c💯 pkey_handle_fault: Assertion `sinfo->si_pkey == faulting_pkey' failed.
Aborted
Check for vma access before considering this a key fault. If vma pkey allow
access retry the acess again.
Test case is written by Sandipan Das <sandipan@linux.ibm.com> hence added SOB
from him.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-3-aneesh.kumar@linux.ibm.com
This brings in a fix from the kvm-ppc tree that was merged to mainline
after rc2, and so isn't in the base of our topic branch. We'd like it
in the topic branch because it interacts with patches we plan to carry
in this branch.
When a feature register field is omitted from the description of the
register, the corresponding bits are treated as STRICT RES0, including
for KVM guests. This is subtly different to declaring the field as
HIDDEN/STRICT/EXACT/0, so update the comment to call this out.
Signed-off-by: Will Deacon <will@kernel.org>
Sort configuration options for vendor-specific errata by vendor, to
increase uniformity.
Move ARM64_WORKAROUND_REPEAT_TLBI up, as it is also selected by
ARM64_ERRATUM_1286807.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Will Deacon <will@kernel.org>
Commit
22090f84bc ("efi/libstub: unify EFI call wrappers for non-x86")
refactored the macros that are used to provide wrappers for mixed-mode
calls on x86, allowing us to boot a 64-bit kernel on 32-bit firmware.
Unfortunately, this broke mixed mode boot due to the fact that
efi_is_native() is not a macro on x86.
All of these macros should go together, so rather than testing each one
to see if it is defined, condition the generic macro definitions on a
new ARCH_HAS_EFISTUB_WRAPPERS, and remove the wrapper definitions on x86
as well if CONFIG_EFI_MIXED is not enabled.
Fixes: 22090f84bc ("efi/libstub: unify EFI call wrappers for non-x86")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200504150248.62482-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Fixes the following warnings for both g5 and g6 SoCs:
arch/arm/boot/dts/aspeed-g5.dtsi:376.19-381.8: Warning
(unit_address_vs_reg): /ahb/apb/lpc@1e789000/lpc-bmc@0/kcs1@0: node
has a unit name, but no reg property
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
The AST2600 XDMA engine requires the PCI-E root control reset be cleared
as well, so add a phandle to that syscon reset.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Add a node for the XDMA engine with all the necessary information.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Add a node for the XDMA engine with all the necessary information.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
The Yosemite V2 is a facebook multi-node server
platform that host four OCP server. The BMC
in the Yosemite V2 platform based on AST2500 SoC.
This patch adds linux device tree entry related to
Yosemite V2 specific devices connected to BMC SoC.
Signed-off-by: Manikandan Elumalai <manikandan.hcl.ers.epl@gmail.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Vijay Khemka <vkhemka@fb.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Nicole is an OpenPower machine with an Aspeed 2500 BMC SoC manufactured
by YADRO.
Signed-off-by: Alexander Filippov <a.filippov@yadro.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Patrick Williams <patrick@stwcx.xyz>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Set the bus id for each mux channel to avoid switching channels
multiple times
Signed-off-by: Ben Pai <Ben_Pai@wistron.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Connect the BMP280 and DPS310 to the hwmon subsystem with iio-hwmon
nodes.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
The BMC uses reserves the top 16MB of memory for the host to use for VGA
or PCIe communication.
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Name the GPIOs to help userspace work with them. The names describe the
functionality the lines provide, not the net or ball name. This makes it
easier to share userspace code across different systems and makes the
use of the lines more obvious.
Signed-off-by: Andrew Geissler <geisonator@yahoo.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Name the GPIOs to help userspace work with them. The names describe the
functionality the lines provide, not the net or ball name. This makes it
easier to share userspace code across different systems and makes the
use of the lines more obvious.
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Name the GPIOs to help userspace work with them. The names describe the
functionality the lines provide, not the net or ball name. This makes it
easier to share userspace code across different systems and makes the
use of the lines more obvious.
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Name the GPIOs to help userspace work with them. The names describe the
functionality the lines provide, not the net or ball name. This makes it
easier to share userspace code across different systems and makes the
use of the lines more obvious.
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
According to ASPEED, FTTMR010 is not intended to be used in the AST2600.
The arch timer should be used, but Linux doesn't enable high-res timers
without being assured that the arch timer is always on, so set that
property in the devicetree.
The FTTMR010 device is described by set to disabled.
This fixes highres timer support for AST2600.
Fixes: 2ca5646b5c ("ARM: dts: aspeed: Add AST2600 and EVB")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
GPIO Q7 is no longer used for air/water. It is repurposed on Tacoma to
indicate internal FSI (low) vs cabled (high).
GPIO B0 controls the muxing of FSI to the cable (low) or internal pins
(high).
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Change the name of power, fault and rear-id.
Remove the two leds.
Signed-off-by: Ben Pai <Ben_Pai@wistron.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
This adds the description of the Power CPUs that are attached to the
BMC.
Without this userspace will see the '/dev/scom66' style layout.
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
All 16 pins of the PCA9552 at 7-bit address 0x61 should be set as type
GPIO.
Signed-off-by: Matthew Barth <msbarth@linux.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
The second VUART is used to expose multiplexed, non-hypervisor consoles.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Add a node to describe the video engine on AST2400.
These changes were copied from aspeed-g5.dtsi
Signed-off-by: Alexander Filippov <a.filippov@yadro.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Enabling emmc without enabling its controller doesn't do any good.
Enable its controller as well to make it work.
Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Added GPIO line names for all gpio used in tiogapass platform,
these line names will be used by libgpiod to control GPIOs
Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Adding IPMB devices for facebook tiogapass platform.
Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>