From 1d50e5d0c5052446cb85a3bf11fe8ba4e8d770ca Mon Sep 17 00:00:00 2001 From: Bhupesh Sharma Date: Thu, 14 May 2020 00:22:36 +0530 Subject: [PATCH 01/40] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo Right now user-space tools like 'makedumpfile' and 'crash' need to rely on a best-guess method of determining value of 'MAX_PHYSMEM_BITS' supported by underlying kernel. This value is used in user-space code to calculate the bit-space required to store a section for SPARESMEM (similar to the existing calculation method used in the kernel implementation): #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) Now, regressions have been reported in user-space utilities like 'makedumpfile' and 'crash' on arm64, with the recently added kernel support for 52-bit physical address space, as there is no clear method of determining this value in user-space (other than reading kernel CONFIG flags). As per suggestion from makedumpfile maintainer (Kazu), it makes more sense to append 'MAX_PHYSMEM_BITS' to vmcoreinfo in the core code itself rather than in arch-specific code, so that the user-space code for other archs can also benefit from this addition to the vmcoreinfo and use it as a standard way of determining 'SECTIONS_SHIFT' value in user-land. A reference 'makedumpfile' implementation which reads the 'MAX_PHYSMEM_BITS' value from vmcoreinfo in a arch-independent fashion is available here: While at it also update vmcoreinfo documentation for 'MAX_PHYSMEM_BITS' variable being added to vmcoreinfo. 'MAX_PHYSMEM_BITS' defines the maximum supported physical address space memory. Signed-off-by: Bhupesh Sharma Tested-by: John Donnelly Acked-by: Dave Young Cc: Boris Petkov Cc: Ingo Molnar Cc: Thomas Gleixner Cc: James Morse Cc: Mark Rutland Cc: Will Deacon Cc: Michael Ellerman Cc: Paul Mackerras Cc: Benjamin Herrenschmidt Cc: Dave Anderson Cc: Kazuhito Hagio Cc: x86@kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: kexec@lists.infradead.org Link: https://lore.kernel.org/r/1589395957-24628-2-git-send-email-bhsharma@redhat.com Signed-off-by: Catalin Marinas --- Documentation/admin-guide/kdump/vmcoreinfo.rst | 5 +++++ kernel/crash_core.c | 1 + 2 files changed, 6 insertions(+) diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst index e4ee8b2db604..2a632020f809 100644 --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst @@ -93,6 +93,11 @@ It exists in the sparse memory mapping model, and it is also somewhat similar to the mem_map variable, both of them are used to translate an address. +MAX_PHYSMEM_BITS +---------------- + +Defines the maximum supported physical address space memory. + page ---- diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 9f1557b98468..18175687133a 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -413,6 +413,7 @@ static int __init crash_save_vmcoreinfo_init(void) VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS); VMCOREINFO_STRUCT_SIZE(mem_section); VMCOREINFO_OFFSET(mem_section, section_mem_map); + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); #endif VMCOREINFO_STRUCT_SIZE(page); VMCOREINFO_STRUCT_SIZE(pglist_data); From bbdbc11804ff0b4130e7550113b452e96a74d16e Mon Sep 17 00:00:00 2001 From: Bhupesh Sharma Date: Thu, 14 May 2020 00:22:37 +0530 Subject: [PATCH 02/40] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo TCR_EL1.TxSZ, which controls the VA space size, is configured by a single kernel image to support either 48-bit or 52-bit VA space. If the ARMv8.2-LVA optional feature is present and we are running with a 64KB page size, then it is possible to use 52-bits of address space for both userspace and kernel addresses. However, any kernel binary that supports 52-bit must also be able to fall back to 48-bit at early boot time if the hardware feature is not present. Since TCR_EL1.T1SZ indicates the size of the memory region addressed by TTBR1_EL1, export the same in vmcoreinfo. User-space utilities like makedumpfile and crash-utility need to read this value from vmcoreinfo for determining if a virtual address lies in the linear map range. While at it also add documentation for TCR_EL1.T1SZ variable being added to vmcoreinfo. It indicates the size offset of the memory region addressed by TTBR1_EL1. Signed-off-by: Bhupesh Sharma Tested-by: John Donnelly Tested-by: Kamlakant Patel Tested-by: Amit Daniel Kachhap Reviewed-by: James Morse Reviewed-by: Amit Daniel Kachhap Cc: James Morse Cc: Mark Rutland Cc: Will Deacon Cc: Steve Capper Cc: Ard Biesheuvel Cc: Dave Anderson Cc: Kazuhito Hagio Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: kexec@lists.infradead.org Link: https://lore.kernel.org/r/1589395957-24628-3-git-send-email-bhsharma@redhat.com [catalin.marinas@arm.com: removed vabits_actual from the commit log] Signed-off-by: Catalin Marinas --- Documentation/admin-guide/kdump/vmcoreinfo.rst | 11 +++++++++++ arch/arm64/include/asm/pgtable-hwdef.h | 1 + arch/arm64/kernel/crash_core.c | 10 ++++++++++ 3 files changed, 22 insertions(+) diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst index 2a632020f809..2baad0bfb09d 100644 --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst @@ -404,6 +404,17 @@ KERNELPACMASK The mask to extract the Pointer Authentication Code from a kernel virtual address. +TCR_EL1.T1SZ +------------ + +Indicates the size offset of the memory region addressed by TTBR1_EL1. +The region size is 2^(64-T1SZ) bytes. + +TTBR1_EL1 is the table base address register specified by ARMv8-A +architecture which is used to lookup the page-tables for the Virtual +addresses in the higher VA range (refer to ARMv8 ARM document for +more details). + arm === diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 9c91a8f93a0e..9a757d724974 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -216,6 +216,7 @@ #define TCR_TxSZ(x) (TCR_T0SZ(x) | TCR_T1SZ(x)) #define TCR_TxSZ_WIDTH 6 #define TCR_T0SZ_MASK (((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T0SZ_OFFSET) +#define TCR_T1SZ_MASK (((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T1SZ_OFFSET) #define TCR_EPD0_SHIFT 7 #define TCR_EPD0_MASK (UL(1) << TCR_EPD0_SHIFT) diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c index 1f646b07e3e9..314391a156ee 100644 --- a/arch/arm64/kernel/crash_core.c +++ b/arch/arm64/kernel/crash_core.c @@ -7,6 +7,14 @@ #include #include #include +#include + +static inline u64 get_tcr_el1_t1sz(void); + +static inline u64 get_tcr_el1_t1sz(void) +{ + return (read_sysreg(tcr_el1) & TCR_T1SZ_MASK) >> TCR_T1SZ_OFFSET; +} void arch_crash_save_vmcoreinfo(void) { @@ -16,6 +24,8 @@ void arch_crash_save_vmcoreinfo(void) kimage_voffset); vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n", PHYS_OFFSET); + vmcoreinfo_append_str("NUMBER(TCR_EL1_T1SZ)=0x%llx\n", + get_tcr_el1_t1sz()); vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset()); vmcoreinfo_append_str("NUMBER(KERNELPACMASK)=0x%llx\n", system_supports_address_auth() ? From bc67f10ad1d76a30e01c539c0043417fa34648d7 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Fri, 3 Jul 2020 09:21:34 +0530 Subject: [PATCH 03/40] arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register Enable EVC, FGT, EXS features bits in ID_AA64MMFR0 register as per ARM DDI 0487F.a specification. Suggested-by: Will Deacon Signed-off-by: Anshuman Khandual Reviewed-by: Suzuki K Poulose Cc: Will Deacon Cc: Mark Rutland Cc: Suzuki K Poulose Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1593748297-1965-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/sysreg.h | 3 +++ arch/arm64/kernel/cpufeature.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 463175f80341..2e36dfde2570 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -706,6 +706,9 @@ #define ID_AA64ZFR0_SVEVER_SVE2 0x1 /* id_aa64mmfr0 */ +#define ID_AA64MMFR0_ECV_SHIFT 60 +#define ID_AA64MMFR0_FGT_SHIFT 56 +#define ID_AA64MMFR0_EXS_SHIFT 44 #define ID_AA64MMFR0_TGRAN4_2_SHIFT 40 #define ID_AA64MMFR0_TGRAN64_2_SHIFT 36 #define ID_AA64MMFR0_TGRAN16_2_SHIFT 32 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9f63053a63a9..7a84f5f31527 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -269,6 +269,9 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = { }; static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = { + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_ECV_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_FGT_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_EXS_SHIFT, 4, 0), /* * Page size not being supported at Stage-2 is not fatal. You * just give up KVM if PAGE_SIZE isn't supported there. Go fix From 853772ba8023c25b1caae56b6426ca76dae1eaff Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Fri, 3 Jul 2020 09:21:35 +0530 Subject: [PATCH 04/40] arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register Enable ETS, TWED, XNX and SPECSEI features bits in ID_AA64MMFR1 register as per ARM DDI 0487F.a specification. Suggested-by: Will Deacon Signed-off-by: Anshuman Khandual Reviewed-by: Suzuki K Poulose Cc: Will Deacon Cc: Mark Rutland Cc: Suzuki K Poulose Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1593748297-1965-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/sysreg.h | 4 ++++ arch/arm64/kernel/cpufeature.c | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 2e36dfde2570..889fa7729719 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -737,6 +737,10 @@ #endif /* id_aa64mmfr1 */ +#define ID_AA64MMFR1_ETS_SHIFT 36 +#define ID_AA64MMFR1_TWED_SHIFT 32 +#define ID_AA64MMFR1_XNX_SHIFT 28 +#define ID_AA64MMFR1_SPECSEI_SHIFT 24 #define ID_AA64MMFR1_PAN_SHIFT 20 #define ID_AA64MMFR1_LOR_SHIFT 16 #define ID_AA64MMFR1_HPD_SHIFT 12 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 7a84f5f31527..764793c4a188 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -315,6 +315,10 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = { }; static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = { + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_ETS_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_TWED_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_XNX_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_HIGHER_SAFE, ID_AA64MMFR1_SPECSEI_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_PAN_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_LOR_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_HPD_SHIFT, 4, 0), From 356fdfbe8761da55c4100bd543259f349fc1ca3a Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Fri, 3 Jul 2020 09:21:36 +0530 Subject: [PATCH 05/40] arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register Enable EVT, BBM, TTL, IDS, ST, NV and CCIDX features bits in ID_AA64MMFR2 register as per ARM DDI 0487F.a specification. Suggested-by: Will Deacon Signed-off-by: Anshuman Khandual Reviewed-by: Suzuki K Poulose Cc: Will Deacon Cc: Mark Rutland Cc: Suzuki K Poulose Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1593748297-1965-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/sysreg.h | 7 +++++++ arch/arm64/kernel/cpufeature.c | 7 +++++++ 2 files changed, 14 insertions(+) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 889fa7729719..9ee324936ea2 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -753,8 +753,15 @@ /* id_aa64mmfr2 */ #define ID_AA64MMFR2_E0PD_SHIFT 60 +#define ID_AA64MMFR2_EVT_SHIFT 56 +#define ID_AA64MMFR2_BBM_SHIFT 52 +#define ID_AA64MMFR2_TTL_SHIFT 48 #define ID_AA64MMFR2_FWB_SHIFT 40 +#define ID_AA64MMFR2_IDS_SHIFT 36 #define ID_AA64MMFR2_AT_SHIFT 32 +#define ID_AA64MMFR2_ST_SHIFT 28 +#define ID_AA64MMFR2_NV_SHIFT 24 +#define ID_AA64MMFR2_CCIDX_SHIFT 20 #define ID_AA64MMFR2_LVA_SHIFT 16 #define ID_AA64MMFR2_IESB_SHIFT 12 #define ID_AA64MMFR2_LSM_SHIFT 8 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 764793c4a188..93797d9bb931 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -330,8 +330,15 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = { static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_E0PD_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_EVT_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_BBM_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_TTL_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_FWB_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_IDS_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_AT_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_ST_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_NV_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_CCIDX_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LVA_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_IESB_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LSM_SHIFT, 4, 0), From 8d3154afc10dd474265b62752cd169f66f40ae0d Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Fri, 3 Jul 2020 09:21:37 +0530 Subject: [PATCH 06/40] arm64/cpufeature: Replace all open bits shift encodings with macros There are many open bits shift encodings for various CPU ID registers that are scattered across cpufeature. This replaces them with register specific sensible macro definitions. This should not have any functional change. Signed-off-by: Anshuman Khandual Reviewed-by: Suzuki K Poulose Cc: Will Deacon Cc: Marc Zyngier Cc: Mark Rutland Cc: James Morse Cc: Suzuki K Poulose Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1593748297-1965-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/sysreg.h | 28 +++++++++++++++++ arch/arm64/kernel/cpufeature.c | 53 +++++++++++++++++---------------- 2 files changed, 55 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 9ee324936ea2..b74c727c3bcd 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -769,6 +769,7 @@ #define ID_AA64MMFR2_CNP_SHIFT 0 /* id_aa64dfr0 */ +#define ID_AA64DFR0_DOUBLELOCK_SHIFT 36 #define ID_AA64DFR0_PMSVER_SHIFT 32 #define ID_AA64DFR0_CTX_CMPS_SHIFT 28 #define ID_AA64DFR0_WRPS_SHIFT 20 @@ -821,18 +822,40 @@ #define ID_ISAR6_DP_SHIFT 4 #define ID_ISAR6_JSCVT_SHIFT 0 +#define ID_MMFR0_INNERSHR_SHIFT 28 +#define ID_MMFR0_FCSE_SHIFT 24 +#define ID_MMFR0_AUXREG_SHIFT 20 +#define ID_MMFR0_TCM_SHIFT 16 +#define ID_MMFR0_SHARELVL_SHIFT 12 +#define ID_MMFR0_OUTERSHR_SHIFT 8 +#define ID_MMFR0_PMSA_SHIFT 4 +#define ID_MMFR0_VMSA_SHIFT 0 + #define ID_MMFR4_EVT_SHIFT 28 #define ID_MMFR4_CCIDX_SHIFT 24 #define ID_MMFR4_LSM_SHIFT 20 #define ID_MMFR4_HPDS_SHIFT 16 #define ID_MMFR4_CNP_SHIFT 12 #define ID_MMFR4_XNX_SHIFT 8 +#define ID_MMFR4_AC2_SHIFT 4 #define ID_MMFR4_SPECSEI_SHIFT 0 #define ID_MMFR5_ETS_SHIFT 0 #define ID_PFR0_DIT_SHIFT 24 #define ID_PFR0_CSV2_SHIFT 16 +#define ID_PFR0_STATE3_SHIFT 12 +#define ID_PFR0_STATE2_SHIFT 8 +#define ID_PFR0_STATE1_SHIFT 4 +#define ID_PFR0_STATE0_SHIFT 0 + +#define ID_DFR0_PERFMON_SHIFT 24 +#define ID_DFR0_MPROFDBG_SHIFT 20 +#define ID_DFR0_MMAPTRC_SHIFT 16 +#define ID_DFR0_COPTRC_SHIFT 12 +#define ID_DFR0_MMAPDBG_SHIFT 8 +#define ID_DFR0_COPSDBG_SHIFT 4 +#define ID_DFR0_COPDBG_SHIFT 0 #define ID_PFR2_SSBS_SHIFT 4 #define ID_PFR2_CSV3_SHIFT 0 @@ -875,6 +898,11 @@ #define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN64_SUPPORTED #endif +#define MVFR2_FPMISC_SHIFT 4 +#define MVFR2_SIMDMISC_SHIFT 0 + +#define DCZID_DZP_SHIFT 4 +#define DCZID_BS_SHIFT 0 /* * The ZCR_ELx_LEN_* definitions intentionally include bits [8:4] which diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 93797d9bb931..19146bd338b4 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -359,7 +359,7 @@ static const struct arm64_ftr_bits ftr_ctr[] = { * make use of *minLine. * If we have differing I-cache policies, report it as the weakest - VIPT. */ - ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_VIPT), /* L1Ip */ + ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_EXACT, CTR_L1IP_SHIFT, 2, ICACHE_POLICY_VIPT), /* L1Ip */ ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_IMINLINE_SHIFT, 4, 0), ARM64_FTR_END, }; @@ -370,19 +370,19 @@ struct arm64_ftr_reg arm64_ftr_reg_ctrel0 = { }; static const struct arm64_ftr_bits ftr_id_mmfr0[] = { - S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 28, 4, 0xf), /* InnerShr */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 24, 4, 0), /* FCSE */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, 20, 4, 0), /* AuxReg */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 0), /* TCM */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0), /* ShareLvl */ - S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0xf), /* OuterShr */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0), /* PMSA */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0), /* VMSA */ + S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR0_INNERSHR_SHIFT, 4, 0xf), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR0_FCSE_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_MMFR0_AUXREG_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR0_TCM_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR0_SHARELVL_SHIFT, 4, 0), + S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR0_OUTERSHR_SHIFT, 4, 0xf), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR0_PMSA_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR0_VMSA_SHIFT, 4, 0), ARM64_FTR_END, }; static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = { - S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 36, 4, 0), + S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_DOUBLELOCK_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_PMSVER_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_CTX_CMPS_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_WRPS_SHIFT, 4, 0), @@ -398,14 +398,14 @@ static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = { }; static const struct arm64_ftr_bits ftr_mvfr2[] = { - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0), /* FPMisc */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0), /* SIMDMisc */ + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, MVFR2_FPMISC_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, MVFR2_SIMDMISC_SHIFT, 4, 0), ARM64_FTR_END, }; static const struct arm64_ftr_bits ftr_dczid[] = { - ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 4, 1, 1), /* DZP */ - ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0), /* BS */ + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, DCZID_DZP_SHIFT, 1, 1), + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, DCZID_BS_SHIFT, 4, 0), ARM64_FTR_END, }; @@ -437,7 +437,8 @@ static const struct arm64_ftr_bits ftr_id_mmfr4[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR4_HPDS_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR4_CNP_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR4_XNX_SHIFT, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0), /* ac2 */ + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_MMFR4_AC2_SHIFT, 4, 0), + /* * SpecSEI = 1 indicates that the PE might generate an SError on an * external abort on speculative read. It is safe to assume that an @@ -479,10 +480,10 @@ static const struct arm64_ftr_bits ftr_id_isar6[] = { static const struct arm64_ftr_bits ftr_id_pfr0[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_DIT_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_PFR0_CSV2_SHIFT, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0), /* State3 */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0), /* State2 */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0), /* State1 */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0), /* State0 */ + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE3_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE2_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE1_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE0_SHIFT, 4, 0), ARM64_FTR_END, }; @@ -506,13 +507,13 @@ static const struct arm64_ftr_bits ftr_id_pfr2[] = { static const struct arm64_ftr_bits ftr_id_dfr0[] = { /* [31:28] TraceFilt */ - S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 24, 4, 0xf), /* PerfMon */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 20, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0), + S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_PERFMON_SHIFT, 4, 0xf), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_MPROFDBG_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_MMAPTRC_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_COPTRC_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_MMAPDBG_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_COPSDBG_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR0_COPDBG_SHIFT, 4, 0), ARM64_FTR_END, }; From c6c83d757a13a5df51428a6fe133c9193810507b Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Tue, 7 Jul 2020 19:53:13 +0530 Subject: [PATCH 07/40] arm64/cpufeature: Validate feature bits spacing in arm64_ftr_regs[] arm64_feature_bits for a register in arm64_ftr_regs[] are in a descending order as per their shift values. Validate that these features bits are defined correctly and do not overlap with each other. This check protects against any inadvertent erroneous changes to the register definitions. Signed-off-by: Anshuman Khandual Reviewed-by: Suzuki K Poulose Cc: Will Deacon Cc: Suzuki K Poulose Cc: Mark Brown Cc: Mark Rutland Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1594131793-9498-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas --- arch/arm64/kernel/cpufeature.c | 47 +++++++++++++++++++++++++++++++--- 1 file changed, 44 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 19146bd338b4..d9b51cb9cb8c 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -712,11 +712,52 @@ static s64 arm64_ftr_safe_value(const struct arm64_ftr_bits *ftrp, s64 new, static void __init sort_ftr_regs(void) { - int i; + unsigned int i; - /* Check that the array is sorted so that we can do the binary search */ - for (i = 1; i < ARRAY_SIZE(arm64_ftr_regs); i++) + for (i = 0; i < ARRAY_SIZE(arm64_ftr_regs); i++) { + const struct arm64_ftr_reg *ftr_reg = arm64_ftr_regs[i].reg; + const struct arm64_ftr_bits *ftr_bits = ftr_reg->ftr_bits; + unsigned int j = 0; + + /* + * Features here must be sorted in descending order with respect + * to their shift values and should not overlap with each other. + */ + for (; ftr_bits->width != 0; ftr_bits++, j++) { + unsigned int width = ftr_reg->ftr_bits[j].width; + unsigned int shift = ftr_reg->ftr_bits[j].shift; + unsigned int prev_shift; + + WARN((shift + width) > 64, + "%s has invalid feature at shift %d\n", + ftr_reg->name, shift); + + /* + * Skip the first feature. There is nothing to + * compare against for now. + */ + if (j == 0) + continue; + + prev_shift = ftr_reg->ftr_bits[j - 1].shift; + WARN((shift + width) > prev_shift, + "%s has feature overlap at shift %d\n", + ftr_reg->name, shift); + } + + /* + * Skip the first register. There is nothing to + * compare against for now. + */ + if (i == 0) + continue; + /* + * Registers here must be sorted in ascending order with respect + * to sys_id for subsequent binary search in get_arm64_ftr_reg() + * to work correctly. + */ BUG_ON(arm64_ftr_regs[i].sys_id < arm64_ftr_regs[i - 1].sys_id); + } } /* From f011856ce7b600fdc2d1102d56873b787ff6d1bb Mon Sep 17 00:00:00 2001 From: Jay Chen Date: Mon, 6 Jul 2020 19:22:45 +0800 Subject: [PATCH 08/40] perf/smmuv3: To simplify code for ioremap page in pmcg Use the devm_platform_get_and_ioremap_resource to simplify the code a bit. Signed-off-by: Jay Chen Link: https://lore.kernel.org/r/20200706112246.92220-2-jkchen@linux.alibaba.com Signed-off-by: Will Deacon --- drivers/perf/arm_smmuv3_pmu.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index 48e28ef93a70..2d09f3e47d12 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -755,8 +755,7 @@ static int smmu_pmu_probe(struct platform_device *pdev) .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; - res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0); - smmu_pmu->reg_base = devm_ioremap_resource(dev, res_0); + smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, &res_0); if (IS_ERR(smmu_pmu->reg_base)) return PTR_ERR(smmu_pmu->reg_base); From 1583052d111f8ea43f9954c5e749164fd2b954af Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 26 Jun 2020 17:58:31 +0200 Subject: [PATCH 09/40] arm64/acpi: disallow AML memory opregions to access kernel memory AML uses SystemMemory opregions to allow AML handlers to access MMIO registers of, e.g., GPIO controllers, or access reserved regions of memory that are owned by the firmware. Currently, we also allow AML access to memory that is owned by the kernel and mapped via the linear region, which does not seem to be supported by a valid use case, and exposes the kernel's internal state to AML methods that may be buggy and exploitable. On arm64, ACPI support requires booting in EFI mode, and so we can cross reference the requested region against the EFI memory map, rather than just do a minimal check on the first page. So let's only permit regions to be remapped by the ACPI core if - they don't appear in the EFI memory map at all (which is the case for most MMIO), or - they are covered by a single region in the EFI memory map, which is not of a type that describes memory that is given to the kernel at boot. Reported-by: Jason A. Donenfeld Signed-off-by: Ard Biesheuvel Acked-by: Lorenzo Pieralisi Link: https://lore.kernel.org/r/20200626155832.2323789-2-ardb@kernel.org Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/acpi.h | 15 +------- arch/arm64/kernel/acpi.c | 66 +++++++++++++++++++++++++++++++++++ 2 files changed, 67 insertions(+), 14 deletions(-) diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h index a45366c3909b..bd68e1b7f29f 100644 --- a/arch/arm64/include/asm/acpi.h +++ b/arch/arm64/include/asm/acpi.h @@ -47,20 +47,7 @@ pgprot_t __acpi_get_mem_attribute(phys_addr_t addr); /* ACPI table mapping after acpi_permanent_mmap is set */ -static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys, - acpi_size size) -{ - /* For normal memory we already have a cacheable mapping. */ - if (memblock_is_map_memory(phys)) - return (void __iomem *)__phys_to_virt(phys); - - /* - * We should still honor the memory's attribute here because - * crash dump kernel possibly excludes some ACPI (reclaim) - * regions from memblock list. - */ - return __ioremap(phys, size, __acpi_get_mem_attribute(phys)); -} +void __iomem *acpi_os_ioremap(acpi_physical_address phys, acpi_size size); #define acpi_os_ioremap acpi_os_ioremap typedef u64 phys_cpuid_t; diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c index a7586a4db142..01b861e225b0 100644 --- a/arch/arm64/kernel/acpi.c +++ b/arch/arm64/kernel/acpi.c @@ -261,6 +261,72 @@ pgprot_t __acpi_get_mem_attribute(phys_addr_t addr) return __pgprot(PROT_DEVICE_nGnRnE); } +void __iomem *acpi_os_ioremap(acpi_physical_address phys, acpi_size size) +{ + efi_memory_desc_t *md, *region = NULL; + pgprot_t prot; + + if (WARN_ON_ONCE(!efi_enabled(EFI_MEMMAP))) + return NULL; + + for_each_efi_memory_desc(md) { + u64 end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT); + + if (phys < md->phys_addr || phys >= end) + continue; + + if (phys + size > end) { + pr_warn(FW_BUG "requested region covers multiple EFI memory regions\n"); + return NULL; + } + region = md; + break; + } + + /* + * It is fine for AML to remap regions that are not represented in the + * EFI memory map at all, as it only describes normal memory, and MMIO + * regions that require a virtual mapping to make them accessible to + * the EFI runtime services. + */ + prot = __pgprot(PROT_DEVICE_nGnRnE); + if (region) { + switch (region->type) { + case EFI_LOADER_CODE: + case EFI_LOADER_DATA: + case EFI_BOOT_SERVICES_CODE: + case EFI_BOOT_SERVICES_DATA: + case EFI_CONVENTIONAL_MEMORY: + case EFI_PERSISTENT_MEMORY: + pr_warn(FW_BUG "requested region covers kernel memory @ %pa\n", &phys); + return NULL; + + case EFI_ACPI_RECLAIM_MEMORY: + /* + * ACPI reclaim memory is used to pass firmware tables + * and other data that is intended for consumption by + * the OS only, which may decide it wants to reclaim + * that memory and use it for something else. We never + * do that, but we usually add it to the linear map + * anyway, in which case we should use the existing + * mapping. + */ + if (memblock_is_map_memory(phys)) + return (void __iomem *)__phys_to_virt(phys); + /* fall through */ + + default: + if (region->attribute & EFI_MEMORY_WB) + prot = PAGE_KERNEL; + else if (region->attribute & EFI_MEMORY_WT) + prot = __pgprot(PROT_NORMAL_WT); + else if (region->attribute & EFI_MEMORY_WC) + prot = __pgprot(PROT_NORMAL_NC); + } + } + return __ioremap(phys, size, prot); +} + /* * Claim Synchronous External Aborts as a firmware first notification. * From 325f5585ec36953a3fe2e000451f690440fe1bf5 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 26 Jun 2020 17:58:32 +0200 Subject: [PATCH 10/40] arm64/acpi: disallow writeable AML opregion mapping for EFI code regions Given that the contents of EFI runtime code and data regions are provided by the firmware, as well as the DSDT, it is not unimaginable that AML code exists today that accesses EFI runtime code regions using a SystemMemory OpRegion. There is nothing fundamentally wrong with that, but since we take great care to ensure that executable code is never mapped writeable and executable at the same time, we should not permit AML to create writable mapping. Signed-off-by: Ard Biesheuvel Acked-by: Lorenzo Pieralisi Link: https://lore.kernel.org/r/20200626155832.2323789-3-ardb@kernel.org Signed-off-by: Catalin Marinas --- arch/arm64/kernel/acpi.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c index 01b861e225b0..455966401102 100644 --- a/arch/arm64/kernel/acpi.c +++ b/arch/arm64/kernel/acpi.c @@ -301,6 +301,15 @@ void __iomem *acpi_os_ioremap(acpi_physical_address phys, acpi_size size) pr_warn(FW_BUG "requested region covers kernel memory @ %pa\n", &phys); return NULL; + case EFI_RUNTIME_SERVICES_CODE: + /* + * This would be unusual, but not problematic per se, + * as long as we take care not to create a writable + * mapping for executable code. + */ + prot = PAGE_KERNEL_RO; + break; + case EFI_ACPI_RECLAIM_MEMORY: /* * ACPI reclaim memory is used to pass firmware tables From 539707caa1a89ee4efc57b4e4231c20c46575ccc Mon Sep 17 00:00:00 2001 From: Shaokun Zhang Date: Thu, 18 Jun 2020 21:35:44 +0800 Subject: [PATCH 11/40] arm64: perf: Correct the event index in sysfs When PMU event ID is equal or greater than 0x4000, it will be reduced by 0x4000 and it is not the raw number in the sysfs. Let's correct it and obtain the raw event ID. Before this patch: cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed event=0x001 After this patch: cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed event=0x4001 Signed-off-by: Shaokun Zhang Cc: Will Deacon Cc: Mark Rutland Cc: Link: https://lore.kernel.org/r/1592487344-30555-3-git-send-email-zhangshaokun@hisilicon.com [will: fixed formatting of 'if' condition] Signed-off-by: Will Deacon --- arch/arm64/kernel/perf_event.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 4d7879484cec..581602413a13 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -155,7 +155,7 @@ armv8pmu_events_sysfs_show(struct device *dev, pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); - return sprintf(page, "event=0x%03llx\n", pmu_attr->id); + return sprintf(page, "event=0x%04llx\n", pmu_attr->id); } #define ARMV8_EVENT_ATTR(name, config) \ @@ -244,10 +244,13 @@ armv8pmu_event_attr_is_visible(struct kobject *kobj, test_bit(pmu_attr->id, cpu_pmu->pmceid_bitmap)) return attr->mode; - pmu_attr->id -= ARMV8_PMUV3_EXT_COMMON_EVENT_BASE; - if (pmu_attr->id < ARMV8_PMUV3_MAX_COMMON_EVENTS && - test_bit(pmu_attr->id, cpu_pmu->pmceid_ext_bitmap)) - return attr->mode; + if (pmu_attr->id >= ARMV8_PMUV3_EXT_COMMON_EVENT_BASE) { + u64 id = pmu_attr->id - ARMV8_PMUV3_EXT_COMMON_EVENT_BASE; + + if (id < ARMV8_PMUV3_MAX_COMMON_EVENTS && + test_bit(id, cpu_pmu->pmceid_ext_bitmap)) + return attr->mode; + } return 0; } From 1b86abc1c645ad5c9c7bf70910cb3ce73939d2d7 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 16 Jul 2020 13:11:24 +0800 Subject: [PATCH 12/40] sched_clock: Expose struct clock_read_data In order to support perf_event_mmap_page::cap_time features, an architecture needs, aside from a userspace readable counter register, to expose the exact clock data so that userspace can convert the counter register into a correct timestamp. Provide struct clock_read_data and two (seqcount) helpers so that architectures (arm64 in specific) can expose the numbers to userspace. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Leo Yan Link: https://lore.kernel.org/r/20200716051130.4359-2-leo.yan@linaro.org Signed-off-by: Will Deacon --- include/linux/sched_clock.h | 28 +++++++++++++++++++++++++ kernel/time/sched_clock.c | 41 ++++++++++++------------------------- 2 files changed, 41 insertions(+), 28 deletions(-) diff --git a/include/linux/sched_clock.h b/include/linux/sched_clock.h index 0bb04a96a6d4..528718e4ed52 100644 --- a/include/linux/sched_clock.h +++ b/include/linux/sched_clock.h @@ -6,6 +6,34 @@ #define LINUX_SCHED_CLOCK #ifdef CONFIG_GENERIC_SCHED_CLOCK +/** + * struct clock_read_data - data required to read from sched_clock() + * + * @epoch_ns: sched_clock() value at last update + * @epoch_cyc: Clock cycle value at last update. + * @sched_clock_mask: Bitmask for two's complement subtraction of non 64bit + * clocks. + * @read_sched_clock: Current clock source (or dummy source when suspended). + * @mult: Multipler for scaled math conversion. + * @shift: Shift value for scaled math conversion. + * + * Care must be taken when updating this structure; it is read by + * some very hot code paths. It occupies <=40 bytes and, when combined + * with the seqcount used to synchronize access, comfortably fits into + * a 64 byte cache line. + */ +struct clock_read_data { + u64 epoch_ns; + u64 epoch_cyc; + u64 sched_clock_mask; + u64 (*read_sched_clock)(void); + u32 mult; + u32 shift; +}; + +extern struct clock_read_data *sched_clock_read_begin(unsigned int *seq); +extern int sched_clock_read_retry(unsigned int seq); + extern void generic_sched_clock_init(void); extern void sched_clock_register(u64 (*read)(void), int bits, diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c index fa3f800d7d76..0acaadc3156c 100644 --- a/kernel/time/sched_clock.c +++ b/kernel/time/sched_clock.c @@ -19,31 +19,6 @@ #include "timekeeping.h" -/** - * struct clock_read_data - data required to read from sched_clock() - * - * @epoch_ns: sched_clock() value at last update - * @epoch_cyc: Clock cycle value at last update. - * @sched_clock_mask: Bitmask for two's complement subtraction of non 64bit - * clocks. - * @read_sched_clock: Current clock source (or dummy source when suspended). - * @mult: Multipler for scaled math conversion. - * @shift: Shift value for scaled math conversion. - * - * Care must be taken when updating this structure; it is read by - * some very hot code paths. It occupies <=40 bytes and, when combined - * with the seqcount used to synchronize access, comfortably fits into - * a 64 byte cache line. - */ -struct clock_read_data { - u64 epoch_ns; - u64 epoch_cyc; - u64 sched_clock_mask; - u64 (*read_sched_clock)(void); - u32 mult; - u32 shift; -}; - /** * struct clock_data - all data needed for sched_clock() (including * registration of a new clock source) @@ -93,6 +68,17 @@ static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift) return (cyc * mult) >> shift; } +struct clock_read_data *sched_clock_read_begin(unsigned int *seq) +{ + *seq = raw_read_seqcount(&cd.seq); + return cd.read_data + (*seq & 1); +} + +int sched_clock_read_retry(unsigned int seq) +{ + return read_seqcount_retry(&cd.seq, seq); +} + unsigned long long notrace sched_clock(void) { u64 cyc, res; @@ -100,13 +86,12 @@ unsigned long long notrace sched_clock(void) struct clock_read_data *rd; do { - seq = raw_read_seqcount(&cd.seq); - rd = cd.read_data + (seq & 1); + rd = sched_clock_read_begin(&seq); cyc = (rd->read_sched_clock() - rd->epoch_cyc) & rd->sched_clock_mask; res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift); - } while (read_seqcount_retry(&cd.seq, seq)); + } while (sched_clock_read_retry(seq)); return res; } From aadd6e5caaacd6feca9691ba30536e7de5a7d152 Mon Sep 17 00:00:00 2001 From: "Ahmed S. Darwish" Date: Thu, 16 Jul 2020 13:11:25 +0800 Subject: [PATCH 13/40] time/sched_clock: Use raw_read_seqcount_latch() sched_clock uses seqcount_t latching to switch between two storage places protected by the sequence counter. This allows it to have interruptible, NMI-safe, seqcount_t write side critical sections. Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"), raw_read_seqcount_latch() became the standardized way for seqcount_t latch read paths. Due to the dependent load, it also has one read memory barrier less than the currently used raw_read_seqcount() API. Use raw_read_seqcount_latch() for the seqcount_t latch read path. Signed-off-by: Ahmed S. Darwish Signed-off-by: Leo Yan Link: https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de Link: https://lore.kernel.org/r/20200716051130.4359-3-leo.yan@linaro.org References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI") Signed-off-by: Will Deacon --- kernel/time/sched_clock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c index 0acaadc3156c..0deaf4b79fb4 100644 --- a/kernel/time/sched_clock.c +++ b/kernel/time/sched_clock.c @@ -70,7 +70,7 @@ static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift) struct clock_read_data *sched_clock_read_begin(unsigned int *seq) { - *seq = raw_read_seqcount(&cd.seq); + *seq = raw_read_seqcount_latch(&cd.seq); return cd.read_data + (*seq & 1); } From 950b74ddefc4a42add8b1ae0170aa309338ffe73 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 16 Jul 2020 13:11:26 +0800 Subject: [PATCH 14/40] arm64: perf: Implement correct cap_user_time As reported by Leo; the existing implementation is broken when the clock and counter don't intersect at 0. Use the sched_clock's struct clock_read_data information to correctly implement cap_user_time and cap_user_time_zero. Note that the ARM64 counter is architecturally only guaranteed to be 56bit wide (implementations are allowed to be wider) and the existing perf ABI cannot deal with wrap-around. This implementation should also be faster than the old; seeing how we don't need to recompute mult and shift all the time. [leoyan: Use mul_u64_u32_shr() to convert cyc to ns to avoid overflow] Reported-by: Leo Yan Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Leo Yan Link: https://lore.kernel.org/r/20200716051130.4359-4-leo.yan@linaro.org Signed-off-by: Will Deacon --- arch/arm64/kernel/perf_event.c | 38 ++++++++++++++++++++++++++-------- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 581602413a13..3bbbc22a5148 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -19,6 +19,7 @@ #include #include #include +#include #include /* ARMv8 Cortex-A53 specific event types. */ @@ -1168,28 +1169,47 @@ device_initcall(armv8_pmu_driver_init) void arch_perf_update_userpage(struct perf_event *event, struct perf_event_mmap_page *userpg, u64 now) { - u32 freq; - u32 shift; + struct clock_read_data *rd; + unsigned int seq; + u64 ns; /* * Internal timekeeping for enabled/running/stopped times * is always computed with the sched_clock. */ - freq = arch_timer_get_rate(); userpg->cap_user_time = 1; + userpg->cap_user_time_zero = 1; + + do { + rd = sched_clock_read_begin(&seq); + + userpg->time_mult = rd->mult; + userpg->time_shift = rd->shift; + userpg->time_zero = rd->epoch_ns; + + /* + * This isn't strictly correct, the ARM64 counter can be + * 'short' and then we get funnies when it wraps. The correct + * thing would be to extend the perf ABI with a cycle and mask + * value, but because wrapping on ARM64 is very rare in + * practise this 'works'. + */ + ns = mul_u64_u32_shr(rd->epoch_cyc, rd->mult, rd->shift); + userpg->time_zero -= ns; + + } while (sched_clock_read_retry(seq)); + + userpg->time_offset = userpg->time_zero - now; - clocks_calc_mult_shift(&userpg->time_mult, &shift, freq, - NSEC_PER_SEC, 0); /* * time_shift is not expected to be greater than 31 due to * the original published conversion algorithm shifting a * 32-bit value (now specifies a 64-bit value) - refer * perf_event_mmap_page documentation in perf_event.h. */ - if (shift == 32) { - shift = 31; + if (userpg->time_shift == 32) { + userpg->time_shift = 31; userpg->time_mult >>= 1; } - userpg->time_shift = (u16)shift; - userpg->time_offset = -now; + } From 279a811eb520594fac3cd3a541e6c7ea50072ac9 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 16 Jul 2020 13:11:27 +0800 Subject: [PATCH 15/40] arm64: perf: Only advertise cap_user_time for arch_timer When sched_clock is running on anything other than arch_timer, don't advertise cap_user_time*. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Leo Yan Link: https://lore.kernel.org/r/20200716051130.4359-5-leo.yan@linaro.org Requested-by: Will Deacon Signed-off-by: Will Deacon --- arch/arm64/kernel/perf_event.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 3bbbc22a5148..674edc7ba8ca 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -13,6 +13,8 @@ #include #include +#include + #include #include #include @@ -1173,16 +1175,15 @@ void arch_perf_update_userpage(struct perf_event *event, unsigned int seq; u64 ns; - /* - * Internal timekeeping for enabled/running/stopped times - * is always computed with the sched_clock. - */ - userpg->cap_user_time = 1; - userpg->cap_user_time_zero = 1; + userpg->cap_user_time = 0; + userpg->cap_user_time_zero = 0; do { rd = sched_clock_read_begin(&seq); + if (rd->read_sched_clock != arch_timer_read_counter) + return; + userpg->time_mult = rd->mult; userpg->time_shift = rd->shift; userpg->time_zero = rd->epoch_ns; @@ -1212,4 +1213,10 @@ void arch_perf_update_userpage(struct perf_event *event, userpg->time_mult >>= 1; } + /* + * Internal timekeeping for enabled/running/stopped times + * is always computed with the sched_clock. + */ + userpg->cap_user_time = 1; + userpg->cap_user_time_zero = 1; } From 6c0246a4588d418f72acd40a7b7601be403d80a9 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 16 Jul 2020 13:11:28 +0800 Subject: [PATCH 16/40] perf: Add perf_event_mmap_page::cap_user_time_short ABI In order to support short clock counters, provide an ABI extension. As a whole: u64 time, delta, cyc = read_cycle_counter(); + if (cap_user_time_short) + cyc = time_cycle + ((cyc - time_cycle) & time_mask); delta = mul_u64_u32_shr(cyc, time_mult, time_shift); if (cap_user_time_zero) time = time_zero + delta; delta += time_offset; Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Leo Yan Link: https://lore.kernel.org/r/20200716051130.4359-6-leo.yan@linaro.org Signed-off-by: Will Deacon --- include/uapi/linux/perf_event.h | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 7b2d6fc9e6ed..21a1edd08cbe 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -532,9 +532,10 @@ struct perf_event_mmap_page { cap_bit0_is_deprecated : 1, /* Always 1, signals that bit 0 is zero */ cap_user_rdpmc : 1, /* The RDPMC instruction can be used to read counts */ - cap_user_time : 1, /* The time_* fields are used */ + cap_user_time : 1, /* The time_{shift,mult,offset} fields are used */ cap_user_time_zero : 1, /* The time_zero field is used */ - cap_____res : 59; + cap_user_time_short : 1, /* the time_{cycle,mask} fields are used */ + cap_____res : 58; }; }; @@ -593,13 +594,29 @@ struct perf_event_mmap_page { * ((rem * time_mult) >> time_shift); */ __u64 time_zero; + __u32 size; /* Header size up to __reserved[] fields. */ + __u32 __reserved_1; + + /* + * If cap_usr_time_short, the hardware clock is less than 64bit wide + * and we must compute the 'cyc' value, as used by cap_usr_time, as: + * + * cyc = time_cycles + ((cyc - time_cycles) & time_mask) + * + * NOTE: this form is explicitly chosen such that cap_usr_time_short + * is a correction on top of cap_usr_time, and code that doesn't + * know about cap_usr_time_short still works under the assumption + * the counter doesn't wrap. + */ + __u64 time_cycles; + __u64 time_mask; /* * Hole for extension of the self monitor capabilities */ - __u8 __reserved[118*8+4]; /* align to 1k. */ + __u8 __reserved[116*8]; /* align to 1k. */ /* * Control data for the mmap() data buffer. From c8f9eb0d6ebaa768c9f6eb2ee21b01d74230934d Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 16 Jul 2020 13:11:29 +0800 Subject: [PATCH 17/40] arm64: perf: Add cap_user_time_short This completes the ARM64 cap_user_time support. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Leo Yan Link: https://lore.kernel.org/r/20200716051130.4359-7-leo.yan@linaro.org Signed-off-by: Will Deacon --- arch/arm64/kernel/perf_event.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 674edc7ba8ca..fdb6029c9021 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -1177,6 +1177,7 @@ void arch_perf_update_userpage(struct perf_event *event, userpg->cap_user_time = 0; userpg->cap_user_time_zero = 0; + userpg->cap_user_time_short = 0; do { rd = sched_clock_read_begin(&seq); @@ -1187,13 +1188,13 @@ void arch_perf_update_userpage(struct perf_event *event, userpg->time_mult = rd->mult; userpg->time_shift = rd->shift; userpg->time_zero = rd->epoch_ns; + userpg->time_cycles = rd->epoch_cyc; + userpg->time_mask = rd->sched_clock_mask; /* - * This isn't strictly correct, the ARM64 counter can be - * 'short' and then we get funnies when it wraps. The correct - * thing would be to extend the perf ABI with a cycle and mask - * value, but because wrapping on ARM64 is very rare in - * practise this 'works'. + * Subtract the cycle base, such that software that + * doesn't know about cap_user_time_short still 'works' + * assuming no wraps. */ ns = mul_u64_u32_shr(rd->epoch_cyc, rd->mult, rd->shift); userpg->time_zero -= ns; @@ -1219,4 +1220,5 @@ void arch_perf_update_userpage(struct perf_event *event, */ userpg->cap_user_time = 1; userpg->cap_user_time_zero = 1; + userpg->cap_user_time_short = 1; } From 5271d915a99c696a2f16ae59cf6a037be35afa22 Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Thu, 16 Jul 2020 13:11:30 +0800 Subject: [PATCH 18/40] tools headers UAPI: Update tools's copy of linux/perf_event.h To get the changes in the commit: "perf: Add perf_event_mmap_page::cap_user_time_short ABI" This update is a prerequisite to add support for short clock counters related ABI extension. Signed-off-by: Leo Yan Link: https://lore.kernel.org/r/20200716051130.4359-8-leo.yan@linaro.org Signed-off-by: Will Deacon --- tools/include/uapi/linux/perf_event.h | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index 7b2d6fc9e6ed..21a1edd08cbe 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -532,9 +532,10 @@ struct perf_event_mmap_page { cap_bit0_is_deprecated : 1, /* Always 1, signals that bit 0 is zero */ cap_user_rdpmc : 1, /* The RDPMC instruction can be used to read counts */ - cap_user_time : 1, /* The time_* fields are used */ + cap_user_time : 1, /* The time_{shift,mult,offset} fields are used */ cap_user_time_zero : 1, /* The time_zero field is used */ - cap_____res : 59; + cap_user_time_short : 1, /* the time_{cycle,mask} fields are used */ + cap_____res : 58; }; }; @@ -593,13 +594,29 @@ struct perf_event_mmap_page { * ((rem * time_mult) >> time_shift); */ __u64 time_zero; + __u32 size; /* Header size up to __reserved[] fields. */ + __u32 __reserved_1; + + /* + * If cap_usr_time_short, the hardware clock is less than 64bit wide + * and we must compute the 'cyc' value, as used by cap_usr_time, as: + * + * cyc = time_cycles + ((cyc - time_cycles) & time_mask) + * + * NOTE: this form is explicitly chosen such that cap_usr_time_short + * is a correction on top of cap_usr_time, and code that doesn't + * know about cap_usr_time_short still works under the assumption + * the counter doesn't wrap. + */ + __u64 time_cycles; + __u64 time_mask; /* * Hole for extension of the self monitor capabilities */ - __u8 __reserved[118*8+4]; /* align to 1k. */ + __u8 __reserved[116*8]; /* align to 1k. */ /* * Control data for the mmap() data buffer. From 55fdc1f44cd6bb1d61c9ca946d8f7cd67ea0bf36 Mon Sep 17 00:00:00 2001 From: Shaokun Zhang Date: Tue, 21 Jul 2020 18:49:33 +0800 Subject: [PATCH 19/40] arm64: perf: Expose some new events via sysfs Some new PMU events can been detected by PMCEID1_EL0, but it can't be listed, Let's expose these through sysfs. Signed-off-by: Shaokun Zhang Cc: Will Deacon Cc: Mark Rutland Link: https://lore.kernel.org/r/1595328573-12751-2-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Will Deacon --- arch/arm64/include/asm/perf_event.h | 27 +++++++++++++++++++++++++++ arch/arm64/kernel/perf_event.c | 19 +++++++++++++++++++ 2 files changed, 46 insertions(+) diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h index e7765b62c712..2c2d7dbe8a02 100644 --- a/arch/arm64/include/asm/perf_event.h +++ b/arch/arm64/include/asm/perf_event.h @@ -72,6 +72,13 @@ #define ARMV8_PMUV3_PERFCTR_LL_CACHE_RD 0x36 #define ARMV8_PMUV3_PERFCTR_LL_CACHE_MISS_RD 0x37 #define ARMV8_PMUV3_PERFCTR_REMOTE_ACCESS_RD 0x38 +#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_LMISS_RD 0x39 +#define ARMV8_PMUV3_PERFCTR_OP_RETIRED 0x3A +#define ARMV8_PMUV3_PERFCTR_OP_SPEC 0x3B +#define ARMV8_PMUV3_PERFCTR_STALL 0x3C +#define ARMV8_PMUV3_PERFCTR_STALL_SLOT_BACKEND 0x3D +#define ARMV8_PMUV3_PERFCTR_STALL_SLOT_FRONTEND 0x3E +#define ARMV8_PMUV3_PERFCTR_STALL_SLOT 0x3F /* Statistical profiling extension microarchitectural events */ #define ARMV8_SPE_PERFCTR_SAMPLE_POP 0x4000 @@ -79,6 +86,26 @@ #define ARMV8_SPE_PERFCTR_SAMPLE_FILTRATE 0x4002 #define ARMV8_SPE_PERFCTR_SAMPLE_COLLISION 0x4003 +/* AMUv1 architecture events */ +#define ARMV8_AMU_PERFCTR_CNT_CYCLES 0x4004 +#define ARMV8_AMU_PERFCTR_STALL_BACKEND_MEM 0x4005 + +/* long-latency read miss events */ +#define ARMV8_PMUV3_PERFCTR_L1I_CACHE_LMISS 0x4006 +#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_LMISS_RD 0x4009 +#define ARMV8_PMUV3_PERFCTR_L2I_CACHE_LMISS 0x400A +#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_LMISS_RD 0x400B + +/* additional latency from alignment events */ +#define ARMV8_PMUV3_PERFCTR_LDST_ALIGN_LAT 0x4020 +#define ARMV8_PMUV3_PERFCTR_LD_ALIGN_LAT 0x4021 +#define ARMV8_PMUV3_PERFCTR_ST_ALIGN_LAT 0x4022 + +/* Armv8.5 Memory Tagging Extension events */ +#define ARMV8_MTE_PERFCTR_MEM_ACCESS_CHECKED 0x4024 +#define ARMV8_MTE_PERFCTR_MEM_ACCESS_CHECKED_RD 0x4025 +#define ARMV8_MTE_PERFCTR_MEM_ACCESS_CHECKED_WR 0x4026 + /* ARMv8 recommended implementation defined event types */ #define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_RD 0x40 #define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WR 0x41 diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index fdb6029c9021..462f9a9cc44b 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -225,10 +225,29 @@ static struct attribute *armv8_pmuv3_event_attrs[] = { ARMV8_EVENT_ATTR(ll_cache_rd, ARMV8_PMUV3_PERFCTR_LL_CACHE_RD), ARMV8_EVENT_ATTR(ll_cache_miss_rd, ARMV8_PMUV3_PERFCTR_LL_CACHE_MISS_RD), ARMV8_EVENT_ATTR(remote_access_rd, ARMV8_PMUV3_PERFCTR_REMOTE_ACCESS_RD), + ARMV8_EVENT_ATTR(l1d_cache_lmiss_rd, ARMV8_PMUV3_PERFCTR_L1D_CACHE_LMISS_RD), + ARMV8_EVENT_ATTR(op_retired, ARMV8_PMUV3_PERFCTR_OP_RETIRED), + ARMV8_EVENT_ATTR(op_spec, ARMV8_PMUV3_PERFCTR_OP_SPEC), + ARMV8_EVENT_ATTR(stall, ARMV8_PMUV3_PERFCTR_STALL), + ARMV8_EVENT_ATTR(stall_slot_backend, ARMV8_PMUV3_PERFCTR_STALL_SLOT_BACKEND), + ARMV8_EVENT_ATTR(stall_slot_frontend, ARMV8_PMUV3_PERFCTR_STALL_SLOT_FRONTEND), + ARMV8_EVENT_ATTR(stall_slot, ARMV8_PMUV3_PERFCTR_STALL_SLOT), ARMV8_EVENT_ATTR(sample_pop, ARMV8_SPE_PERFCTR_SAMPLE_POP), ARMV8_EVENT_ATTR(sample_feed, ARMV8_SPE_PERFCTR_SAMPLE_FEED), ARMV8_EVENT_ATTR(sample_filtrate, ARMV8_SPE_PERFCTR_SAMPLE_FILTRATE), ARMV8_EVENT_ATTR(sample_collision, ARMV8_SPE_PERFCTR_SAMPLE_COLLISION), + ARMV8_EVENT_ATTR(cnt_cycles, ARMV8_AMU_PERFCTR_CNT_CYCLES), + ARMV8_EVENT_ATTR(stall_backend_mem, ARMV8_AMU_PERFCTR_STALL_BACKEND_MEM), + ARMV8_EVENT_ATTR(l1i_cache_lmiss, ARMV8_PMUV3_PERFCTR_L1I_CACHE_LMISS), + ARMV8_EVENT_ATTR(l2d_cache_lmiss_rd, ARMV8_PMUV3_PERFCTR_L2D_CACHE_LMISS_RD), + ARMV8_EVENT_ATTR(l2i_cache_lmiss, ARMV8_PMUV3_PERFCTR_L2I_CACHE_LMISS), + ARMV8_EVENT_ATTR(l3d_cache_lmiss_rd, ARMV8_PMUV3_PERFCTR_L3D_CACHE_LMISS_RD), + ARMV8_EVENT_ATTR(ldst_align_lat, ARMV8_PMUV3_PERFCTR_LDST_ALIGN_LAT), + ARMV8_EVENT_ATTR(ld_align_lat, ARMV8_PMUV3_PERFCTR_LD_ALIGN_LAT), + ARMV8_EVENT_ATTR(st_align_lat, ARMV8_PMUV3_PERFCTR_ST_ALIGN_LAT), + ARMV8_EVENT_ATTR(mem_access_checked, ARMV8_MTE_PERFCTR_MEM_ACCESS_CHECKED), + ARMV8_EVENT_ATTR(mem_access_checked_rd, ARMV8_MTE_PERFCTR_MEM_ACCESS_CHECKED_RD), + ARMV8_EVENT_ATTR(mem_access_checked_wr, ARMV8_MTE_PERFCTR_MEM_ACCESS_CHECKED_WR), NULL, }; From d53b5c013e1e7c3b43a7487171a84ee2acdd9597 Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Wed, 24 Jun 2020 01:33:16 -0700 Subject: [PATCH 20/40] arm64/vdso: use the fault callback to map vvar pages Currently the vdso has no awareness of time namespaces, which may apply distinct offsets to processes in different namespaces. To handle this within the vdso, we'll need to expose a per-namespace data page. As a preparatory step, this patch separates the vdso data page from the code pages, and has it faulted in via its own fault callback. Subsquent patches will extend this to support distinct pages per time namespace. The vvar vma has to be installed with the VM_PFNMAP flag to handle faults via its vma fault callback. Signed-off-by: Andrei Vagin Reviewed-by: Vincenzo Frascino Reviewed-by: Dmitry Safonov Link: https://lore.kernel.org/r/20200624083321.144975-2-avagin@gmail.com Signed-off-by: Catalin Marinas --- arch/arm64/kernel/vdso.c | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index e546df0efefb..eb7798e5eb00 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -107,29 +107,32 @@ static int __vdso_init(enum vdso_abi abi) vdso_info[abi].vdso_code_start) >> PAGE_SHIFT; - /* Allocate the vDSO pagelist, plus a page for the data. */ - vdso_pagelist = kcalloc(vdso_info[abi].vdso_pages + 1, + vdso_pagelist = kcalloc(vdso_info[abi].vdso_pages, sizeof(struct page *), GFP_KERNEL); if (vdso_pagelist == NULL) return -ENOMEM; - /* Grab the vDSO data page. */ - vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data)); - - /* Grab the vDSO code pages. */ pfn = sym_to_pfn(vdso_info[abi].vdso_code_start); for (i = 0; i < vdso_info[abi].vdso_pages; i++) - vdso_pagelist[i + 1] = pfn_to_page(pfn + i); + vdso_pagelist[i] = pfn_to_page(pfn + i); - vdso_info[abi].dm->pages = &vdso_pagelist[0]; - vdso_info[abi].cm->pages = &vdso_pagelist[1]; + vdso_info[abi].cm->pages = vdso_pagelist; return 0; } +static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, + struct vm_area_struct *vma, struct vm_fault *vmf) +{ + if (vmf->pgoff == 0) + return vmf_insert_pfn(vma, vmf->address, + sym_to_pfn(vdso_data)); + return VM_FAULT_SIGBUS; +} + static int __setup_additional_pages(enum vdso_abi abi, struct mm_struct *mm, struct linux_binprm *bprm, @@ -150,7 +153,7 @@ static int __setup_additional_pages(enum vdso_abi abi, } ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE, - VM_READ|VM_MAYREAD, + VM_READ|VM_MAYREAD|VM_PFNMAP, vdso_info[abi].dm); if (IS_ERR(ret)) goto up_fail; @@ -206,6 +209,7 @@ static struct vm_special_mapping aarch32_vdso_maps[] = { #ifdef CONFIG_COMPAT_VDSO [AA32_MAP_VVAR] = { .name = "[vvar]", + .fault = vvar_fault, }, [AA32_MAP_VDSO] = { .name = "[vdso]", @@ -371,6 +375,7 @@ enum aarch64_map { static struct vm_special_mapping aarch64_vdso_maps[] __ro_after_init = { [AA64_MAP_VVAR] = { .name = "[vvar]", + .fault = vvar_fault, }, [AA64_MAP_VDSO] = { .name = "[vdso]", From 1b6867d2916bb91e94ddcc9c709e4779419fe391 Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Wed, 24 Jun 2020 01:33:17 -0700 Subject: [PATCH 21/40] arm64/vdso: Zap vvar pages when switching to a time namespace The order of vvar pages depends on whether a task belongs to the root time namespace or not. In the root time namespace, a task doesn't have a per-namespace page. In a non-root namespace, the VVAR page which contains the system-wide VDSO data is replaced with a namespace specific page that contains clock offsets. Whenever a task changes its namespace, the VVAR page tables are cleared and then they will be re-faulted with a corresponding layout. A task can switch its time namespace only if its ->mm isn't shared with another task. Signed-off-by: Andrei Vagin Reviewed-by: Vincenzo Frascino Reviewed-by: Dmitry Safonov Reviewed-by: Christian Brauner Link: https://lore.kernel.org/r/20200624083321.144975-3-avagin@gmail.com Signed-off-by: Catalin Marinas --- arch/arm64/kernel/vdso.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index eb7798e5eb00..33ac18060bfc 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -124,6 +124,37 @@ static int __vdso_init(enum vdso_abi abi) return 0; } +#ifdef CONFIG_TIME_NS +/* + * The vvar mapping contains data for a specific time namespace, so when a task + * changes namespace we must unmap its vvar data for the old namespace. + * Subsequent faults will map in data for the new namespace. + * + * For more details see timens_setup_vdso_data(). + */ +int vdso_join_timens(struct task_struct *task, struct time_namespace *ns) +{ + struct mm_struct *mm = task->mm; + struct vm_area_struct *vma; + + mmap_read_lock(mm); + + for (vma = mm->mmap; vma; vma = vma->vm_next) { + unsigned long size = vma->vm_end - vma->vm_start; + + if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm)) + zap_page_range(vma, vma->vm_start, size); +#ifdef CONFIG_COMPAT_VDSO + if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm)) + zap_page_range(vma, vma->vm_start, size); +#endif + } + + mmap_read_unlock(mm); + return 0; +} +#endif + static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, struct vm_area_struct *vma, struct vm_fault *vmf) { From 3503d56cc7233ced602e38a4c13caa64f00ab2aa Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Wed, 24 Jun 2020 01:33:18 -0700 Subject: [PATCH 22/40] arm64/vdso: Add time namespace page Allocate the time namespace page among VVAR pages. Provide __arch_get_timens_vdso_data() helper for VDSO code to get the code-relative position of VVARs on that special page. If a task belongs to a time namespace then the VVAR page which contains the system wide VDSO data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the VDSO data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. The time-namespace page isn't allocated on !CONFIG_TIME_NAMESPACE, but vma is the same size, which simplifies criu/vdso migration between different kernel configs. Signed-off-by: Andrei Vagin Reviewed-by: Vincenzo Frascino Reviewed-by: Dmitry Safonov Cc: Mark Rutland Link: https://lore.kernel.org/r/20200624083321.144975-4-avagin@gmail.com Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/vdso.h | 2 ++ .../include/asm/vdso/compat_gettimeofday.h | 12 ++++++++++++ arch/arm64/include/asm/vdso/gettimeofday.h | 8 ++++++++ arch/arm64/kernel/vdso.c | 19 ++++++++++++++++--- arch/arm64/kernel/vdso/vdso.lds.S | 5 ++++- arch/arm64/kernel/vdso32/vdso.lds.S | 5 ++++- include/vdso/datapage.h | 1 + 7 files changed, 47 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h index 07468428fd29..f99dcb94b438 100644 --- a/arch/arm64/include/asm/vdso.h +++ b/arch/arm64/include/asm/vdso.h @@ -12,6 +12,8 @@ */ #define VDSO_LBASE 0x0 +#define __VVAR_PAGES 2 + #ifndef __ASSEMBLY__ #include diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h index b6907ae78e53..b7c549d46d18 100644 --- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h +++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h @@ -152,6 +152,18 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void) return ret; } +#ifdef CONFIG_TIME_NS +static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void) +{ + const struct vdso_data *ret; + + /* See __arch_get_vdso_data(). */ + asm volatile("mov %0, %1" : "=r"(ret) : "r"(_timens_data)); + + return ret; +} +#endif + #endif /* !__ASSEMBLY__ */ #endif /* __ASM_VDSO_GETTIMEOFDAY_H */ diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h index afba6ba332f8..cf39eae5eaaf 100644 --- a/arch/arm64/include/asm/vdso/gettimeofday.h +++ b/arch/arm64/include/asm/vdso/gettimeofday.h @@ -96,6 +96,14 @@ const struct vdso_data *__arch_get_vdso_data(void) return _vdso_data; } +#ifdef CONFIG_TIME_NS +static __always_inline +const struct vdso_data *__arch_get_timens_vdso_data(void) +{ + return _timens_data; +} +#endif + #endif /* !__ASSEMBLY__ */ #endif /* __ASM_VDSO_GETTIMEOFDAY_H */ diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 33ac18060bfc..fcb559726920 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -40,6 +40,12 @@ enum vdso_abi { #endif /* CONFIG_COMPAT_VDSO */ }; +enum vvar_pages { + VVAR_DATA_PAGE_OFFSET, + VVAR_TIMENS_PAGE_OFFSET, + VVAR_NR_PAGES, +}; + struct vdso_abi_info { const char *name; const char *vdso_code_start; @@ -125,6 +131,11 @@ static int __vdso_init(enum vdso_abi abi) } #ifdef CONFIG_TIME_NS +struct vdso_data *arch_get_vdso_data(void *vvar_page) +{ + return (struct vdso_data *)(vvar_page); +} + /* * The vvar mapping contains data for a specific time namespace, so when a task * changes namespace we must unmap its vvar data for the old namespace. @@ -173,9 +184,11 @@ static int __setup_additional_pages(enum vdso_abi abi, unsigned long gp_flags = 0; void *ret; + BUILD_BUG_ON(VVAR_NR_PAGES != __VVAR_PAGES); + vdso_text_len = vdso_info[abi].vdso_pages << PAGE_SHIFT; /* Be sure to map the data page */ - vdso_mapping_len = vdso_text_len + PAGE_SIZE; + vdso_mapping_len = vdso_text_len + VVAR_NR_PAGES * PAGE_SIZE; vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0); if (IS_ERR_VALUE(vdso_base)) { @@ -183,7 +196,7 @@ static int __setup_additional_pages(enum vdso_abi abi, goto up_fail; } - ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE, + ret = _install_special_mapping(mm, vdso_base, VVAR_NR_PAGES * PAGE_SIZE, VM_READ|VM_MAYREAD|VM_PFNMAP, vdso_info[abi].dm); if (IS_ERR(ret)) @@ -192,7 +205,7 @@ static int __setup_additional_pages(enum vdso_abi abi, if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) && system_supports_bti()) gp_flags = VM_ARM64_BTI; - vdso_base += PAGE_SIZE; + vdso_base += VVAR_NR_PAGES * PAGE_SIZE; mm->context.vdso = (void *)vdso_base; ret = _install_special_mapping(mm, vdso_base, vdso_text_len, VM_READ|VM_EXEC|gp_flags| diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S index 7ad2d3a0cd48..d808ad31e01f 100644 --- a/arch/arm64/kernel/vdso/vdso.lds.S +++ b/arch/arm64/kernel/vdso/vdso.lds.S @@ -17,7 +17,10 @@ OUTPUT_ARCH(aarch64) SECTIONS { - PROVIDE(_vdso_data = . - PAGE_SIZE); + PROVIDE(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE); +#ifdef CONFIG_TIME_NS + PROVIDE(_timens_data = _vdso_data + PAGE_SIZE); +#endif . = VDSO_LBASE + SIZEOF_HEADERS; .hash : { *(.hash) } :text diff --git a/arch/arm64/kernel/vdso32/vdso.lds.S b/arch/arm64/kernel/vdso32/vdso.lds.S index 337d03522048..3348ce5ea306 100644 --- a/arch/arm64/kernel/vdso32/vdso.lds.S +++ b/arch/arm64/kernel/vdso32/vdso.lds.S @@ -17,7 +17,10 @@ OUTPUT_ARCH(arm) SECTIONS { - PROVIDE_HIDDEN(_vdso_data = . - PAGE_SIZE); + PROVIDE_HIDDEN(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE); +#ifdef CONFIG_TIME_NS + PROVIDE_HIDDEN(_timens_data = _vdso_data + PAGE_SIZE); +#endif . = VDSO_LBASE + SIZEOF_HEADERS; .hash : { *(.hash) } :text diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h index 7955c56d6b3c..ee810cae4e1e 100644 --- a/include/vdso/datapage.h +++ b/include/vdso/datapage.h @@ -109,6 +109,7 @@ struct vdso_data { * relocation, and this is what we need. */ extern struct vdso_data _vdso_data[CS_BASES] __attribute__((visibility("hidden"))); +extern struct vdso_data _timens_data[CS_BASES] __attribute__((visibility("hidden"))); /* * The generic vDSO implementation requires that gettimeofday.h From ee3cda8e46060b021087b6ef451e1cd9fa648af6 Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Wed, 24 Jun 2020 01:33:19 -0700 Subject: [PATCH 23/40] arm64/vdso: Handle faults on timens page If a task belongs to a time namespace then the VVAR page which contains the system wide VDSO data is replaced with a namespace specific page which has the same layout as the VVAR page. Signed-off-by: Andrei Vagin Reviewed-by: Vincenzo Frascino Reviewed-by: Dmitry Safonov Link: https://lore.kernel.org/r/20200624083321.144975-5-avagin@gmail.com Signed-off-by: Catalin Marinas --- arch/arm64/kernel/vdso.c | 56 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 52 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index fcb559726920..c11ee18e3e79 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -164,15 +165,62 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns) mmap_read_unlock(mm); return 0; } + +static struct page *find_timens_vvar_page(struct vm_area_struct *vma) +{ + if (likely(vma->vm_mm == current->mm)) + return current->nsproxy->time_ns->vvar_page; + + /* + * VM_PFNMAP | VM_IO protect .fault() handler from being called + * through interfaces like /proc/$pid/mem or + * process_vm_{readv,writev}() as long as there's no .access() + * in special_mapping_vmops. + * For more details check_vma_flags() and __access_remote_vm() + */ + WARN(1, "vvar_page accessed remotely"); + + return NULL; +} +#else +static struct page *find_timens_vvar_page(struct vm_area_struct *vma) +{ + return NULL; +} #endif static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, struct vm_area_struct *vma, struct vm_fault *vmf) { - if (vmf->pgoff == 0) - return vmf_insert_pfn(vma, vmf->address, - sym_to_pfn(vdso_data)); - return VM_FAULT_SIGBUS; + struct page *timens_page = find_timens_vvar_page(vma); + unsigned long pfn; + + switch (vmf->pgoff) { + case VVAR_DATA_PAGE_OFFSET: + if (timens_page) + pfn = page_to_pfn(timens_page); + else + pfn = sym_to_pfn(vdso_data); + break; +#ifdef CONFIG_TIME_NS + case VVAR_TIMENS_PAGE_OFFSET: + /* + * If a task belongs to a time namespace then a namespace + * specific VVAR is mapped with the VVAR_DATA_PAGE_OFFSET and + * the real VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET + * offset. + * See also the comment near timens_setup_vdso_data(). + */ + if (!timens_page) + return VM_FAULT_SIGBUS; + pfn = sym_to_pfn(vdso_data); + break; +#endif /* CONFIG_TIME_NS */ + default: + return VM_FAULT_SIGBUS; + } + + return vmf_insert_pfn(vma, vmf->address, pfn); } static int __setup_additional_pages(enum vdso_abi abi, From bcf996434240c611f0fdab2c18cd75dd59cfa3c2 Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Wed, 24 Jun 2020 01:33:20 -0700 Subject: [PATCH 24/40] arm64/vdso: Restrict splitting VVAR VMA Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the amount of corner-cases to consider while working further on VDSO time namespace support. As the offset from timens to VVAR page is computed compile-time, the pages in VVAR should stay together and not being partically mremap()'ed. Signed-off-by: Andrei Vagin Reviewed-by: Vincenzo Frascino Reviewed-by: Dmitry Safonov Link: https://lore.kernel.org/r/20200624083321.144975-6-avagin@gmail.com Signed-off-by: Catalin Marinas --- arch/arm64/kernel/vdso.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index c11ee18e3e79..d4202a32abc9 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -223,6 +223,17 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, return vmf_insert_pfn(vma, vmf->address, pfn); } +static int vvar_mremap(const struct vm_special_mapping *sm, + struct vm_area_struct *new_vma) +{ + unsigned long new_size = new_vma->vm_end - new_vma->vm_start; + + if (new_size != VVAR_NR_PAGES * PAGE_SIZE) + return -EINVAL; + + return 0; +} + static int __setup_additional_pages(enum vdso_abi abi, struct mm_struct *mm, struct linux_binprm *bprm, @@ -302,6 +313,7 @@ static struct vm_special_mapping aarch32_vdso_maps[] = { [AA32_MAP_VVAR] = { .name = "[vvar]", .fault = vvar_fault, + .mremap = vvar_mremap, }, [AA32_MAP_VDSO] = { .name = "[vdso]", @@ -468,6 +480,7 @@ static struct vm_special_mapping aarch64_vdso_maps[] __ro_after_init = { [AA64_MAP_VVAR] = { .name = "[vvar]", .fault = vvar_fault, + .mremap = vvar_mremap, }, [AA64_MAP_VDSO] = { .name = "[vdso]", From 9614cc576d76a7449cd51b60ef81fd0ce19ee694 Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Wed, 24 Jun 2020 01:33:21 -0700 Subject: [PATCH 25/40] arm64: enable time namespace support CONFIG_TIME_NS is dependes on GENERIC_VDSO_TIME_NS. Signed-off-by: Andrei Vagin Reviewed-by: Vincenzo Frascino Reviewed-by: Dmitry Safonov Link: https://lore.kernel.org/r/20200624083321.144975-7-avagin@gmail.com Signed-off-by: Catalin Marinas --- arch/arm64/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 66dc41fd49f2..87255f02ec5b 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -118,6 +118,7 @@ config ARM64 select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL select GENERIC_GETTIMEOFDAY + select GENERIC_VDSO_TIME_NS select HANDLE_DOMAIN_IRQ select HARDIRQS_SW_RESEND select HAVE_PCI From 07d2e59f27cd728e6982b52441673886a6d04267 Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Fri, 19 Jun 2020 09:20:02 +0100 Subject: [PATCH 26/40] ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC When the iort_match_node_callback is invoked for a named component the match should be executed upon a device with an ACPI companion. For devices with no ACPI companion set-up the ACPI device tree must be walked in order to find the first parent node with a companion set and check the parent node against the named component entry to check whether there is a match and therefore an IORT node describing the in/out ID translation for the device has been found. Signed-off-by: Lorenzo Pieralisi Cc: Will Deacon Cc: Hanjun Guo Cc: Sudeep Holla Cc: Robin Murphy Cc: "Rafael J. Wysocki" Link: https://lore.kernel.org/r/20200619082013.13661-2-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/acpi/arm64/iort.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 28a6b387e80e..5eee81758184 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -264,15 +264,31 @@ static acpi_status iort_match_node_callback(struct acpi_iort_node *node, if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT) { struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL }; - struct acpi_device *adev = to_acpi_device_node(dev->fwnode); + struct acpi_device *adev; struct acpi_iort_named_component *ncomp; + struct device *nc_dev = dev; + + /* + * Walk the device tree to find a device with an + * ACPI companion; there is no point in scanning + * IORT for a device matching a named component if + * the device does not have an ACPI companion to + * start with. + */ + do { + adev = ACPI_COMPANION(nc_dev); + if (adev) + break; + + nc_dev = nc_dev->parent; + } while (nc_dev); if (!adev) goto out; status = acpi_get_name(adev->handle, ACPI_FULL_PATHNAME, &buf); if (ACPI_FAILURE(status)) { - dev_warn(dev, "Can't get device full path name\n"); + dev_warn(nc_dev, "Can't get device full path name\n"); goto out; } From d1718a1b7a86743b9c517bf9521695ba909c734f Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Fri, 19 Jun 2020 09:20:03 +0100 Subject: [PATCH 27/40] ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic iort_get_device_domain() is PCI specific but it need not be, since it can be used to retrieve IRQ domain nexus of any kind by adding an irq_domain_bus_token input to it. Make it PCI agnostic by also renaming the requestor ID input to a more generic ID name. Signed-off-by: Lorenzo Pieralisi Acked-by: Bjorn Helgaas # pci/msi.c Cc: Will Deacon Cc: Hanjun Guo Cc: Bjorn Helgaas Cc: Sudeep Holla Cc: Robin Murphy Cc: "Rafael J. Wysocki" Link: https://lore.kernel.org/r/20200619082013.13661-3-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/acpi/arm64/iort.c | 14 +++++++------- drivers/pci/msi.c | 3 ++- include/linux/acpi_iort.h | 7 ++++--- 3 files changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 5eee81758184..902e2aaca946 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -550,7 +550,6 @@ static struct acpi_iort_node *iort_find_dev_node(struct device *dev) node = iort_get_iort_node(dev->fwnode); if (node) return node; - /* * if not, then it should be a platform device defined in * DSDT/SSDT (with Named Component node in IORT) @@ -641,13 +640,13 @@ static int __maybe_unused iort_find_its_base(u32 its_id, phys_addr_t *base) /** * iort_dev_find_its_id() - Find the ITS identifier for a device * @dev: The device. - * @req_id: Device's requester ID + * @id: Device's ID * @idx: Index of the ITS identifier list. * @its_id: ITS identifier. * * Returns: 0 on success, appropriate error value otherwise */ -static int iort_dev_find_its_id(struct device *dev, u32 req_id, +static int iort_dev_find_its_id(struct device *dev, u32 id, unsigned int idx, int *its_id) { struct acpi_iort_its_group *its; @@ -657,7 +656,7 @@ static int iort_dev_find_its_id(struct device *dev, u32 req_id, if (!node) return -ENXIO; - node = iort_node_map_id(node, req_id, NULL, IORT_MSI_TYPE); + node = iort_node_map_id(node, id, NULL, IORT_MSI_TYPE); if (!node) return -ENXIO; @@ -680,19 +679,20 @@ static int iort_dev_find_its_id(struct device *dev, u32 req_id, * * Returns: the MSI domain for this device, NULL otherwise */ -struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id) +struct irq_domain *iort_get_device_domain(struct device *dev, u32 id, + enum irq_domain_bus_token bus_token) { struct fwnode_handle *handle; int its_id; - if (iort_dev_find_its_id(dev, req_id, 0, &its_id)) + if (iort_dev_find_its_id(dev, id, 0, &its_id)) return NULL; handle = iort_find_domain_token(its_id); if (!handle) return NULL; - return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI); + return irq_find_matching_fwnode(handle, bus_token); } static void iort_set_device_domain(struct device *dev, diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 6b43a5455c7a..74a91f52ecc0 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1558,7 +1558,8 @@ struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev) pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid); dom = of_msi_map_get_device_domain(&pdev->dev, rid); if (!dom) - dom = iort_get_device_domain(&pdev->dev, rid); + dom = iort_get_device_domain(&pdev->dev, rid, + DOMAIN_BUS_PCI_MSI); return dom; } #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */ diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index 8e7e2ec37f1b..08ec6bd2297f 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -29,7 +29,8 @@ struct fwnode_handle *iort_find_domain_token(int trans_id); #ifdef CONFIG_ACPI_IORT void acpi_iort_init(void); u32 iort_msi_map_rid(struct device *dev, u32 req_id); -struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id); +struct irq_domain *iort_get_device_domain(struct device *dev, u32 id, + enum irq_domain_bus_token bus_token); void acpi_configure_pmsi_domain(struct device *dev); int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); /* IOMMU interface */ @@ -40,8 +41,8 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); static inline void acpi_iort_init(void) { } static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id) { return req_id; } -static inline struct irq_domain *iort_get_device_domain(struct device *dev, - u32 req_id) +static inline struct irq_domain *iort_get_device_domain( + struct device *dev, u32 id, enum irq_domain_bus_token bus_token) { return NULL; } static inline void acpi_configure_pmsi_domain(struct device *dev) { } /* IOMMU interface */ From 39c3cf566ceafa7c1ae331a5f26fbb685d670001 Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Fri, 19 Jun 2020 09:20:04 +0100 Subject: [PATCH 28/40] ACPI/IORT: Make iort_msi_map_rid() PCI agnostic There is nothing PCI specific in iort_msi_map_rid(). Rename the function using a bus protocol agnostic name, iort_msi_map_id(), and convert current callers to it. Signed-off-by: Lorenzo Pieralisi Acked-by: Bjorn Helgaas Cc: Will Deacon Cc: Hanjun Guo Cc: Bjorn Helgaas Cc: Sudeep Holla Cc: Robin Murphy Cc: "Rafael J. Wysocki" Link: https://lore.kernel.org/r/20200619082013.13661-4-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/acpi/arm64/iort.c | 12 ++++++------ drivers/pci/msi.c | 2 +- include/linux/acpi_iort.h | 6 +++--- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 902e2aaca946..53f9ef515089 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -568,22 +568,22 @@ static struct acpi_iort_node *iort_find_dev_node(struct device *dev) } /** - * iort_msi_map_rid() - Map a MSI requester ID for a device + * iort_msi_map_id() - Map a MSI input ID for a device * @dev: The device for which the mapping is to be done. - * @req_id: The device requester ID. + * @input_id: The device input ID. * - * Returns: mapped MSI RID on success, input requester ID otherwise + * Returns: mapped MSI ID on success, input ID otherwise */ -u32 iort_msi_map_rid(struct device *dev, u32 req_id) +u32 iort_msi_map_id(struct device *dev, u32 input_id) { struct acpi_iort_node *node; u32 dev_id; node = iort_find_dev_node(dev); if (!node) - return req_id; + return input_id; - iort_node_map_id(node, req_id, &dev_id, IORT_MSI_TYPE); + iort_node_map_id(node, input_id, &dev_id, IORT_MSI_TYPE); return dev_id; } diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 74a91f52ecc0..77f48b95e277 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1536,7 +1536,7 @@ u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev) of_node = irq_domain_get_of_node(domain); rid = of_node ? of_msi_map_rid(&pdev->dev, of_node, rid) : - iort_msi_map_rid(&pdev->dev, rid); + iort_msi_map_id(&pdev->dev, rid); return rid; } diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index 08ec6bd2297f..e51425e083da 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -28,7 +28,7 @@ void iort_deregister_domain_token(int trans_id); struct fwnode_handle *iort_find_domain_token(int trans_id); #ifdef CONFIG_ACPI_IORT void acpi_iort_init(void); -u32 iort_msi_map_rid(struct device *dev, u32 req_id); +u32 iort_msi_map_id(struct device *dev, u32 id); struct irq_domain *iort_get_device_domain(struct device *dev, u32 id, enum irq_domain_bus_token bus_token); void acpi_configure_pmsi_domain(struct device *dev); @@ -39,8 +39,8 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); #else static inline void acpi_iort_init(void) { } -static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id) -{ return req_id; } +static inline u32 iort_msi_map_id(struct device *dev, u32 id) +{ return id; } static inline struct irq_domain *iort_get_device_domain( struct device *dev, u32 id, enum irq_domain_bus_token bus_token) { return NULL; } From 3a3d208beede7ae03f8c80bed01f47d6b98d4ceb Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Fri, 19 Jun 2020 09:20:05 +0100 Subject: [PATCH 29/40] ACPI/IORT: Remove useless PCI bus walk The PCI bus domain number (used in the iort_match_node_callback() - pci_domain_nr() call) is cascaded through the PCI bus hierarchy at PCI bus enumeration time, therefore there is no need in iort_find_dev_node() to walk the PCI bus upwards to grab the root bus to be passed to iort_scan_node(), the device->bus PCI bus pointer will do. Remove this useless code. Signed-off-by: Lorenzo Pieralisi Cc: Will Deacon Cc: Hanjun Guo Cc: Sudeep Holla Cc: Robin Murphy Cc: "Rafael J. Wysocki" Link: https://lore.kernel.org/r/20200619082013.13661-5-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/acpi/arm64/iort.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 53f9ef515089..421c6976ab81 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -558,10 +558,7 @@ static struct acpi_iort_node *iort_find_dev_node(struct device *dev) iort_match_node_callback, dev); } - /* Find a PCI root bus */ pbus = to_pci_dev(dev)->bus; - while (!pci_is_root_bus(pbus)) - pbus = pbus->parent; return iort_scan_node(ACPI_IORT_NODE_PCI_ROOT_COMPLEX, iort_match_node_callback, &pbus->dev); From b8e069a2a8da02137605ba585837a3a0c45df01a Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Fri, 19 Jun 2020 09:20:06 +0100 Subject: [PATCH 30/40] ACPI/IORT: Add an input ID to acpi_dma_configure() Some HW devices are created as child devices of proprietary busses, that have a bus specific policy defining how the child devices wires representing the devices ID are translated into IOMMU and IRQ controllers device IDs. Current IORT code provides translations for: - PCI devices, where the device ID is well identified at bus level as the requester ID (RID) - Platform devices that are endpoint devices where the device ID is retrieved from the ACPI object IORT mappings (Named components single mappings). A platform device is represented in IORT as a named component node For devices that are child devices of proprietary busses the IORT firmware represents the bus node as a named component node in IORT and it is up to that named component node to define in/out bus specific ID translations for the bus child devices that are allocated and created in a bus specific manner. In order to make IORT ID translations available for proprietary bus child devices, the current ACPI (and IORT) code must be augmented to provide an additional ID parameter to acpi_dma_configure() representing the child devices input ID. This ID is bus specific and it is retrieved in bus specific code. By adding an ID parameter to acpi_dma_configure(), the IORT code can map the child device ID to an IOMMU stream ID through the IORT named component representing the bus in/out ID mappings. Signed-off-by: Lorenzo Pieralisi Cc: Will Deacon Cc: Hanjun Guo Cc: Sudeep Holla Cc: Robin Murphy Cc: "Rafael J. Wysocki" Link: https://lore.kernel.org/r/20200619082013.13661-6-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/acpi/arm64/iort.c | 59 +++++++++++++++++++++++++++++---------- drivers/acpi/scan.c | 8 ++++-- include/acpi/acpi_bus.h | 9 ++++-- include/linux/acpi.h | 7 +++++ include/linux/acpi_iort.h | 7 +++-- 5 files changed, 67 insertions(+), 23 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 421c6976ab81..ec782e4a0fe4 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -978,19 +978,54 @@ static void iort_named_component_init(struct device *dev, nc->node_flags); } +static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node) +{ + struct acpi_iort_node *parent; + int err = -ENODEV, i = 0; + u32 streamid = 0; + + do { + + parent = iort_node_map_platform_id(node, &streamid, + IORT_IOMMU_TYPE, + i++); + + if (parent) + err = iort_iommu_xlate(dev, parent, streamid); + } while (parent && !err); + + return err; +} + +static int iort_nc_iommu_map_id(struct device *dev, + struct acpi_iort_node *node, + const u32 *in_id) +{ + struct acpi_iort_node *parent; + u32 streamid; + + parent = iort_node_map_id(node, *in_id, &streamid, IORT_IOMMU_TYPE); + if (parent) + return iort_iommu_xlate(dev, parent, streamid); + + return -ENODEV; +} + + /** - * iort_iommu_configure - Set-up IOMMU configuration for a device. + * iort_iommu_configure_id - Set-up IOMMU configuration for a device. * * @dev: device to configure + * @id_in: optional input id const value pointer * * Returns: iommu_ops pointer on configuration success * NULL on configuration failure */ -const struct iommu_ops *iort_iommu_configure(struct device *dev) +const struct iommu_ops *iort_iommu_configure_id(struct device *dev, + const u32 *id_in) { - struct acpi_iort_node *node, *parent; + struct acpi_iort_node *node; const struct iommu_ops *ops; - u32 streamid = 0; int err = -ENODEV; /* @@ -1019,21 +1054,13 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) if (fwspec && iort_pci_rc_supports_ats(node)) fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS; } else { - int i = 0; - node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT, iort_match_node_callback, dev); if (!node) return NULL; - do { - parent = iort_node_map_platform_id(node, &streamid, - IORT_IOMMU_TYPE, - i++); - - if (parent) - err = iort_iommu_xlate(dev, parent, streamid); - } while (parent && !err); + err = id_in ? iort_nc_iommu_map_id(dev, node, id_in) : + iort_nc_iommu_map(dev, node); if (!err) iort_named_component_init(dev, node); @@ -1058,6 +1085,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) return ops; } + #else static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev) { return NULL; } @@ -1066,7 +1094,8 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops, { return 0; } int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { return 0; } -const struct iommu_ops *iort_iommu_configure(struct device *dev) +const struct iommu_ops *iort_iommu_configure_id(struct device *dev, + const u32 *input_id) { return NULL; } #endif diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 8777faced51a..2142f1554761 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -1457,8 +1457,10 @@ int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset, * acpi_dma_configure - Set-up DMA configuration for the device. * @dev: The pointer to the device * @attr: device dma attributes + * @input_id: input device id const value pointer */ -int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr) +int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr, + const u32 *input_id) { const struct iommu_ops *iommu; u64 dma_addr = 0, size = 0; @@ -1470,7 +1472,7 @@ int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr) iort_dma_setup(dev, &dma_addr, &size); - iommu = iort_iommu_configure(dev); + iommu = iort_iommu_configure_id(dev, input_id); if (PTR_ERR(iommu) == -EPROBE_DEFER) return -EPROBE_DEFER; @@ -1479,7 +1481,7 @@ int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr) return 0; } -EXPORT_SYMBOL_GPL(acpi_dma_configure); +EXPORT_SYMBOL_GPL(acpi_dma_configure_id); static void acpi_init_coherency(struct acpi_device *adev) { diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 5afb6ceb284f..a3abcc4b7d9f 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -588,8 +588,13 @@ bool acpi_dma_supported(struct acpi_device *adev); enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev); int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset, u64 *size); -int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr); - +int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr, + const u32 *input_id); +static inline int acpi_dma_configure(struct device *dev, + enum dev_dma_attr attr) +{ + return acpi_dma_configure_id(dev, attr, NULL); +} struct acpi_device *acpi_find_child_device(struct acpi_device *parent, u64 address, bool check_children); int acpi_is_root_bridge(acpi_handle); diff --git a/include/linux/acpi.h b/include/linux/acpi.h index d661cd0ee64d..6d2c47489d90 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -905,6 +905,13 @@ static inline int acpi_dma_configure(struct device *dev, return 0; } +static inline int acpi_dma_configure_id(struct device *dev, + enum dev_dma_attr attr, + const u32 *input_id) +{ + return 0; +} + #define ACPI_PTR(_ptr) (NULL) static inline void acpi_device_set_enumerated(struct acpi_device *adev) diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index e51425e083da..20a32120bb88 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -35,7 +35,8 @@ void acpi_configure_pmsi_domain(struct device *dev); int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); /* IOMMU interface */ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size); -const struct iommu_ops *iort_iommu_configure(struct device *dev); +const struct iommu_ops *iort_iommu_configure_id(struct device *dev, + const u32 *id_in); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); #else static inline void acpi_iort_init(void) { } @@ -48,8 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device *dev) { } /* IOMMU interface */ static inline void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size) { } -static inline const struct iommu_ops *iort_iommu_configure( - struct device *dev) +static inline const struct iommu_ops *iort_iommu_configure_id( + struct device *dev, const u32 *id_in) { return NULL; } static inline int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) From 746a71d02b5d15817fcb13c956ba999a87773952 Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Fri, 19 Jun 2020 09:20:07 +0100 Subject: [PATCH 31/40] of/iommu: Make of_map_rid() PCI agnostic There is nothing PCI specific (other than the RID - requester ID) in the of_map_rid() implementation, so the same function can be reused for input/output IDs mapping for other busses just as well. Rename the RID instances/names to a generic "id" tag. No functionality change intended. Signed-off-by: Lorenzo Pieralisi Reviewed-by: Rob Herring Acked-by: Joerg Roedel Cc: Rob Herring Cc: Joerg Roedel Cc: Robin Murphy Cc: Marc Zyngier Link: https://lore.kernel.org/r/20200619082013.13661-7-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/iommu/of_iommu.c | 4 ++-- drivers/of/base.c | 42 ++++++++++++++++++++-------------------- drivers/of/irq.c | 2 +- include/linux/of.h | 4 ++-- 4 files changed, 26 insertions(+), 26 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index 20738aacac89..016316244737 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -129,7 +129,7 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data) struct of_phandle_args iommu_spec = { .args_count = 1 }; int err; - err = of_map_rid(info->np, alias, "iommu-map", "iommu-map-mask", + err = of_map_id(info->np, alias, "iommu-map", "iommu-map-mask", &iommu_spec.np, iommu_spec.args); if (err) return err == -ENODEV ? NO_IOMMU : err; @@ -145,7 +145,7 @@ static int of_fsl_mc_iommu_init(struct fsl_mc_device *mc_dev, struct of_phandle_args iommu_spec = { .args_count = 1 }; int err; - err = of_map_rid(master_np, mc_dev->icid, "iommu-map", + err = of_map_id(master_np, mc_dev->icid, "iommu-map", "iommu-map-mask", &iommu_spec.np, iommu_spec.args); if (err) diff --git a/drivers/of/base.c b/drivers/of/base.c index ae03b1218b06..ea44fea99813 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -2201,15 +2201,15 @@ int of_find_last_cache_level(unsigned int cpu) } /** - * of_map_rid - Translate a requester ID through a downstream mapping. + * of_map_id - Translate an ID through a downstream mapping. * @np: root complex device node. - * @rid: device requester ID to map. + * @id: device ID to map. * @map_name: property name of the map to use. * @map_mask_name: optional property name of the mask to use. * @target: optional pointer to a target device node. * @id_out: optional pointer to receive the translated ID. * - * Given a device requester ID, look up the appropriate implementation-defined + * Given a device ID, look up the appropriate implementation-defined * platform ID and/or the target device which receives transactions on that * ID, as per the "iommu-map" and "msi-map" bindings. Either of @target or * @id_out may be NULL if only the other is required. If @target points to @@ -2219,11 +2219,11 @@ int of_find_last_cache_level(unsigned int cpu) * * Return: 0 on success or a standard error code on failure. */ -int of_map_rid(struct device_node *np, u32 rid, +int of_map_id(struct device_node *np, u32 id, const char *map_name, const char *map_mask_name, struct device_node **target, u32 *id_out) { - u32 map_mask, masked_rid; + u32 map_mask, masked_id; int map_len; const __be32 *map = NULL; @@ -2235,7 +2235,7 @@ int of_map_rid(struct device_node *np, u32 rid, if (target) return -ENODEV; /* Otherwise, no map implies no translation */ - *id_out = rid; + *id_out = id; return 0; } @@ -2255,22 +2255,22 @@ int of_map_rid(struct device_node *np, u32 rid, if (map_mask_name) of_property_read_u32(np, map_mask_name, &map_mask); - masked_rid = map_mask & rid; + masked_id = map_mask & id; for ( ; map_len > 0; map_len -= 4 * sizeof(*map), map += 4) { struct device_node *phandle_node; - u32 rid_base = be32_to_cpup(map + 0); + u32 id_base = be32_to_cpup(map + 0); u32 phandle = be32_to_cpup(map + 1); u32 out_base = be32_to_cpup(map + 2); - u32 rid_len = be32_to_cpup(map + 3); + u32 id_len = be32_to_cpup(map + 3); - if (rid_base & ~map_mask) { - pr_err("%pOF: Invalid %s translation - %s-mask (0x%x) ignores rid-base (0x%x)\n", + if (id_base & ~map_mask) { + pr_err("%pOF: Invalid %s translation - %s-mask (0x%x) ignores id-base (0x%x)\n", np, map_name, map_name, - map_mask, rid_base); + map_mask, id_base); return -EFAULT; } - if (masked_rid < rid_base || masked_rid >= rid_base + rid_len) + if (masked_id < id_base || masked_id >= id_base + id_len) continue; phandle_node = of_find_node_by_phandle(phandle); @@ -2288,20 +2288,20 @@ int of_map_rid(struct device_node *np, u32 rid, } if (id_out) - *id_out = masked_rid - rid_base + out_base; + *id_out = masked_id - id_base + out_base; - pr_debug("%pOF: %s, using mask %08x, rid-base: %08x, out-base: %08x, length: %08x, rid: %08x -> %08x\n", - np, map_name, map_mask, rid_base, out_base, - rid_len, rid, masked_rid - rid_base + out_base); + pr_debug("%pOF: %s, using mask %08x, id-base: %08x, out-base: %08x, length: %08x, id: %08x -> %08x\n", + np, map_name, map_mask, id_base, out_base, + id_len, id, masked_id - id_base + out_base); return 0; } - pr_info("%pOF: no %s translation for rid 0x%x on %pOF\n", np, map_name, - rid, target && *target ? *target : NULL); + pr_info("%pOF: no %s translation for id 0x%x on %pOF\n", np, map_name, + id, target && *target ? *target : NULL); /* Bypasses translation */ if (id_out) - *id_out = rid; + *id_out = id; return 0; } -EXPORT_SYMBOL_GPL(of_map_rid); +EXPORT_SYMBOL_GPL(of_map_id); diff --git a/drivers/of/irq.c b/drivers/of/irq.c index a296eaf52a5b..d632bc5b3a2d 100644 --- a/drivers/of/irq.c +++ b/drivers/of/irq.c @@ -587,7 +587,7 @@ static u32 __of_msi_map_rid(struct device *dev, struct device_node **np, * "msi-map" property. */ for (parent_dev = dev; parent_dev; parent_dev = parent_dev->parent) - if (!of_map_rid(parent_dev->of_node, rid_in, "msi-map", + if (!of_map_id(parent_dev->of_node, rid_in, "msi-map", "msi-map-mask", np, &rid_out)) break; return rid_out; diff --git a/include/linux/of.h b/include/linux/of.h index c669c0a4732f..60abe3f636ad 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -554,7 +554,7 @@ bool of_console_check(struct device_node *dn, char *name, int index); extern int of_cpu_node_to_id(struct device_node *np); -int of_map_rid(struct device_node *np, u32 rid, +int of_map_id(struct device_node *np, u32 id, const char *map_name, const char *map_mask_name, struct device_node **target, u32 *id_out); @@ -978,7 +978,7 @@ static inline int of_cpu_node_to_id(struct device_node *np) return -ENODEV; } -static inline int of_map_rid(struct device_node *np, u32 rid, +static inline int of_map_id(struct device_node *np, u32 id, const char *map_name, const char *map_mask_name, struct device_node **target, u32 *id_out) { From a081bd4af4ce80d845a0bab355ab5d0822db8058 Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Fri, 19 Jun 2020 09:20:08 +0100 Subject: [PATCH 32/40] of/device: Add input id to of_dma_configure() Devices sitting on proprietary busses have a device ID space that is owned by the respective bus and related firmware bindings. In order to let the generic OF layer handle the input translations to an IOMMU id, for such busses the current of_dma_configure() interface should be extended in order to allow the bus layer to provide the device input id parameter - that is retrieved/assigned in bus specific code and firmware. Augment of_dma_configure() to add an optional input_id parameter, leaving current functionality unchanged. Signed-off-by: Lorenzo Pieralisi Reviewed-by: Rob Herring Cc: Rob Herring Cc: Robin Murphy Cc: Joerg Roedel Cc: Laurentiu Tudor Link: https://lore.kernel.org/r/20200619082013.13661-8-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/bus/fsl-mc/fsl-mc-bus.c | 4 +- drivers/iommu/of_iommu.c | 85 ++++++++++++++++++--------------- drivers/of/device.c | 8 ++-- include/linux/of_device.h | 16 ++++++- include/linux/of_iommu.h | 6 ++- 5 files changed, 72 insertions(+), 47 deletions(-) diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c index 40526da5c6a6..8ead3f0238f2 100644 --- a/drivers/bus/fsl-mc/fsl-mc-bus.c +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c @@ -118,11 +118,13 @@ static int fsl_mc_bus_uevent(struct device *dev, struct kobj_uevent_env *env) static int fsl_mc_dma_configure(struct device *dev) { struct device *dma_dev = dev; + struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev); + u32 input_id = mc_dev->icid; while (dev_is_fsl_mc(dma_dev)) dma_dev = dma_dev->parent; - return of_dma_configure(dev, dma_dev->of_node, 0); + return of_dma_configure_id(dev, dma_dev->of_node, 0, &input_id); } static ssize_t modalias_show(struct device *dev, struct device_attribute *attr, diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index 016316244737..e505b9130a1c 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -118,6 +118,43 @@ static int of_iommu_xlate(struct device *dev, return ret; } +static int of_iommu_configure_dev_id(struct device_node *master_np, + struct device *dev, + const u32 *id) +{ + struct of_phandle_args iommu_spec = { .args_count = 1 }; + int err; + + err = of_map_id(master_np, *id, "iommu-map", + "iommu-map-mask", &iommu_spec.np, + iommu_spec.args); + if (err) + return err == -ENODEV ? NO_IOMMU : err; + + err = of_iommu_xlate(dev, &iommu_spec); + of_node_put(iommu_spec.np); + return err; +} + +static int of_iommu_configure_dev(struct device_node *master_np, + struct device *dev) +{ + struct of_phandle_args iommu_spec; + int err = NO_IOMMU, idx = 0; + + while (!of_parse_phandle_with_args(master_np, "iommus", + "#iommu-cells", + idx, &iommu_spec)) { + err = of_iommu_xlate(dev, &iommu_spec); + of_node_put(iommu_spec.np); + idx++; + if (err) + break; + } + + return err; +} + struct of_pci_iommu_alias_info { struct device *dev; struct device_node *np; @@ -126,38 +163,21 @@ struct of_pci_iommu_alias_info { static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data) { struct of_pci_iommu_alias_info *info = data; - struct of_phandle_args iommu_spec = { .args_count = 1 }; - int err; + u32 input_id = alias; - err = of_map_id(info->np, alias, "iommu-map", "iommu-map-mask", - &iommu_spec.np, iommu_spec.args); - if (err) - return err == -ENODEV ? NO_IOMMU : err; - - err = of_iommu_xlate(info->dev, &iommu_spec); - of_node_put(iommu_spec.np); - return err; + return of_iommu_configure_dev_id(info->np, info->dev, &input_id); } -static int of_fsl_mc_iommu_init(struct fsl_mc_device *mc_dev, - struct device_node *master_np) +static int of_iommu_configure_device(struct device_node *master_np, + struct device *dev, const u32 *id) { - struct of_phandle_args iommu_spec = { .args_count = 1 }; - int err; - - err = of_map_id(master_np, mc_dev->icid, "iommu-map", - "iommu-map-mask", &iommu_spec.np, - iommu_spec.args); - if (err) - return err == -ENODEV ? NO_IOMMU : err; - - err = of_iommu_xlate(&mc_dev->dev, &iommu_spec); - of_node_put(iommu_spec.np); - return err; + return (id) ? of_iommu_configure_dev_id(master_np, dev, id) : + of_iommu_configure_dev(master_np, dev); } const struct iommu_ops *of_iommu_configure(struct device *dev, - struct device_node *master_np) + struct device_node *master_np, + const u32 *id) { const struct iommu_ops *ops = NULL; struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); @@ -188,21 +208,8 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, pci_request_acs(); err = pci_for_each_dma_alias(to_pci_dev(dev), of_pci_iommu_init, &info); - } else if (dev_is_fsl_mc(dev)) { - err = of_fsl_mc_iommu_init(to_fsl_mc_device(dev), master_np); } else { - struct of_phandle_args iommu_spec; - int idx = 0; - - while (!of_parse_phandle_with_args(master_np, "iommus", - "#iommu-cells", - idx, &iommu_spec)) { - err = of_iommu_xlate(dev, &iommu_spec); - of_node_put(iommu_spec.np); - idx++; - if (err) - break; - } + err = of_iommu_configure_device(master_np, dev, id); fwspec = dev_iommu_fwspec_get(dev); if (!err && fwspec) diff --git a/drivers/of/device.c b/drivers/of/device.c index 27203bfd0b22..b439c1e05434 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -78,6 +78,7 @@ int of_device_add(struct platform_device *ofdev) * @np: Pointer to OF node having DMA configuration * @force_dma: Whether device is to be set up by of_dma_configure() even if * DMA capability is not explicitly described by firmware. + * @id: Optional const pointer value input id * * Try to get devices's DMA configuration from DT and update it * accordingly. @@ -86,7 +87,8 @@ int of_device_add(struct platform_device *ofdev) * can use a platform bus notifier and handle BUS_NOTIFY_ADD_DEVICE events * to fix up DMA configuration. */ -int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma) +int of_dma_configure_id(struct device *dev, struct device_node *np, + bool force_dma, const u32 *id) { u64 dma_addr, paddr, size = 0; int ret; @@ -160,7 +162,7 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma) dev_dbg(dev, "device is%sdma coherent\n", coherent ? " " : " not "); - iommu = of_iommu_configure(dev, np); + iommu = of_iommu_configure(dev, np, id); if (PTR_ERR(iommu) == -EPROBE_DEFER) return -EPROBE_DEFER; @@ -171,7 +173,7 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma) return 0; } -EXPORT_SYMBOL_GPL(of_dma_configure); +EXPORT_SYMBOL_GPL(of_dma_configure_id); int of_device_register(struct platform_device *pdev) { diff --git a/include/linux/of_device.h b/include/linux/of_device.h index 8d31e39dd564..07ca187fc5e4 100644 --- a/include/linux/of_device.h +++ b/include/linux/of_device.h @@ -55,9 +55,15 @@ static inline struct device_node *of_cpu_device_node_get(int cpu) return of_node_get(cpu_dev->of_node); } -int of_dma_configure(struct device *dev, +int of_dma_configure_id(struct device *dev, struct device_node *np, - bool force_dma); + bool force_dma, const u32 *id); +static inline int of_dma_configure(struct device *dev, + struct device_node *np, + bool force_dma) +{ + return of_dma_configure_id(dev, np, force_dma, NULL); +} #else /* CONFIG_OF */ static inline int of_driver_match_device(struct device *dev, @@ -106,6 +112,12 @@ static inline struct device_node *of_cpu_device_node_get(int cpu) return NULL; } +static inline int of_dma_configure_id(struct device *dev, + struct device_node *np, + bool force_dma) +{ + return 0; +} static inline int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma) diff --git a/include/linux/of_iommu.h b/include/linux/of_iommu.h index f3d40dd7bb66..16f4b3e87f20 100644 --- a/include/linux/of_iommu.h +++ b/include/linux/of_iommu.h @@ -13,7 +13,8 @@ extern int of_get_dma_window(struct device_node *dn, const char *prefix, size_t *size); extern const struct iommu_ops *of_iommu_configure(struct device *dev, - struct device_node *master_np); + struct device_node *master_np, + const u32 *id); #else @@ -25,7 +26,8 @@ static inline int of_get_dma_window(struct device_node *dn, const char *prefix, } static inline const struct iommu_ops *of_iommu_configure(struct device *dev, - struct device_node *master_np) + struct device_node *master_np, + const u32 *id) { return NULL; } From 5bda70c6162de9536cc983eacd24261c9c5de596 Mon Sep 17 00:00:00 2001 From: Laurentiu Tudor Date: Fri, 19 Jun 2020 09:20:09 +0100 Subject: [PATCH 33/40] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus The existing bindings cannot be used to specify the relationship between fsl-mc devices and GIC ITSes. Add a generic binding for mapping fsl-mc devices to GIC ITSes, using msi-map property. In addition, deprecate msi-parent property which no longer makes sense now that we support translating the MSIs. Signed-off-by: Laurentiu Tudor Signed-off-by: Diana Craciun Reviewed-by: Rob Herring Cc: Rob Herring Link: https://lore.kernel.org/r/20200619082013.13661-9-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 50 ++++++++++++++++--- 1 file changed, 44 insertions(+), 6 deletions(-) diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt index 9134e9bcca56..ebd329181c14 100644 --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt @@ -28,6 +28,16 @@ Documentation/devicetree/bindings/iommu/iommu.txt. For arm-smmu binding, see: Documentation/devicetree/bindings/iommu/arm,smmu.yaml. +The MSI writes are accompanied by sideband data which is derived from the ICID. +The msi-map property is used to associate the devices with both the ITS +controller and the sideband data which accompanies the writes. + +For generic MSI bindings, see +Documentation/devicetree/bindings/interrupt-controller/msi.txt. + +For GICv3 and GIC ITS bindings, see: +Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml. + Required properties: - compatible @@ -49,11 +59,6 @@ Required properties: region may not be present in some scenarios, such as in the device tree presented to a virtual machine. - - msi-parent - Value type: - Definition: Must be present and point to the MSI controller node - handling message interrupts for the MC. - - ranges Value type: Definition: A standard property. Defines the mapping between the child @@ -119,6 +124,28 @@ Optional properties: associated with the listed IOMMU, with the iommu-specifier (i - icid-base + iommu-base). +- msi-map: Maps an ICID to a GIC ITS and associated msi-specifier + data. + + The property is an arbitrary number of tuples of + (icid-base,gic-its,msi-base,length). + + Any ICID in the interval [icid-base, icid-base + length) is + associated with the listed GIC ITS, with the msi-specifier + (i - icid-base + msi-base). + +Deprecated properties: + + - msi-parent + Value type: + Definition: Describes the MSI controller node handling message + interrupts for the MC. When there is no translation + between the ICID and deviceID this property can be used + to describe the MSI controller used by the devices on the + mc-bus. + The use of this property for mc-bus is deprecated. Please + use msi-map. + Example: smmu: iommu@5000000 { @@ -128,13 +155,24 @@ Example: ... }; + gic: interrupt-controller@6000000 { + compatible = "arm,gic-v3"; + ... + } + its: gic-its@6020000 { + compatible = "arm,gic-v3-its"; + msi-controller; + ... + }; + fsl_mc: fsl-mc@80c000000 { compatible = "fsl,qoriq-mc"; reg = <0x00000008 0x0c000000 0 0x40>, /* MC portal base */ <0x00000000 0x08340000 0 0x40000>; /* MC control reg */ - msi-parent = <&its>; /* define map for ICIDs 23-64 */ iommu-map = <23 &smmu 23 41>; + /* define msi map for ICIDs 23-64 */ + msi-map = <23 &its 23 41>; #address-cells = <3>; #size-cells = <1>; From 6f881aba01109a01a43e4f135673c19190f61133 Mon Sep 17 00:00:00 2001 From: Diana Craciun Date: Fri, 19 Jun 2020 09:20:10 +0100 Subject: [PATCH 34/40] of/irq: make of_msi_map_get_device_domain() bus agnostic of_msi_map_get_device_domain() is PCI specific but it need not be and can be easily changed to be bus agnostic in order to be used by other busses by adding an IRQ domain bus token as an input parameter. Signed-off-by: Diana Craciun Signed-off-by: Lorenzo Pieralisi Reviewed-by: Rob Herring Acked-by: Bjorn Helgaas # pci/msi.c Cc: Bjorn Helgaas Cc: Rob Herring Cc: Marc Zyngier Link: https://lore.kernel.org/r/20200619082013.13661-10-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/of/irq.c | 8 +++++--- drivers/pci/msi.c | 2 +- include/linux/of_irq.h | 5 +++-- 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/of/irq.c b/drivers/of/irq.c index d632bc5b3a2d..1005e4f349ef 100644 --- a/drivers/of/irq.c +++ b/drivers/of/irq.c @@ -613,18 +613,20 @@ u32 of_msi_map_rid(struct device *dev, struct device_node *msi_np, u32 rid_in) * of_msi_map_get_device_domain - Use msi-map to find the relevant MSI domain * @dev: device for which the mapping is to be done. * @rid: Requester ID for the device. + * @bus_token: Bus token * * Walk up the device hierarchy looking for devices with a "msi-map" * property. * * Returns: the MSI domain for this device (or NULL on failure) */ -struct irq_domain *of_msi_map_get_device_domain(struct device *dev, u32 rid) +struct irq_domain *of_msi_map_get_device_domain(struct device *dev, u32 id, + u32 bus_token) { struct device_node *np = NULL; - __of_msi_map_rid(dev, &np, rid); - return irq_find_matching_host(np, DOMAIN_BUS_PCI_MSI); + __of_msi_map_rid(dev, &np, id); + return irq_find_matching_host(np, bus_token); } /** diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 77f48b95e277..b4bfe0b03b2d 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1556,7 +1556,7 @@ struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev) u32 rid = pci_dev_id(pdev); pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid); - dom = of_msi_map_get_device_domain(&pdev->dev, rid); + dom = of_msi_map_get_device_domain(&pdev->dev, rid, DOMAIN_BUS_PCI_MSI); if (!dom) dom = iort_get_device_domain(&pdev->dev, rid, DOMAIN_BUS_PCI_MSI); diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h index 1214cabb2247..7142a3722758 100644 --- a/include/linux/of_irq.h +++ b/include/linux/of_irq.h @@ -52,7 +52,8 @@ extern struct irq_domain *of_msi_get_domain(struct device *dev, struct device_node *np, enum irq_domain_bus_token token); extern struct irq_domain *of_msi_map_get_device_domain(struct device *dev, - u32 rid); + u32 id, + u32 bus_token); extern void of_msi_configure(struct device *dev, struct device_node *np); u32 of_msi_map_rid(struct device *dev, struct device_node *msi_np, u32 rid_in); #else @@ -85,7 +86,7 @@ static inline struct irq_domain *of_msi_get_domain(struct device *dev, return NULL; } static inline struct irq_domain *of_msi_map_get_device_domain(struct device *dev, - u32 rid) + u32 id, u32 bus_token) { return NULL; } From 2bcdd8f2c07f1aa1bfd34fa0dab8e06949e34846 Mon Sep 17 00:00:00 2001 From: Lorenzo Pieralisi Date: Fri, 19 Jun 2020 09:20:11 +0100 Subject: [PATCH 35/40] of/irq: Make of_msi_map_rid() PCI bus agnostic There is nothing PCI bus specific in the of_msi_map_rid() implementation other than the requester ID tag for the input ID space. Rename requester ID to a more generic ID so that the translation code can be used by all busses that require input/output ID translations. No functional change intended. Signed-off-by: Lorenzo Pieralisi Reviewed-by: Rob Herring Cc: Bjorn Helgaas Cc: Rob Herring Cc: Marc Zyngier Link: https://lore.kernel.org/r/20200619082013.13661-11-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/of/irq.c | 28 ++++++++++++++-------------- drivers/pci/msi.c | 2 +- include/linux/of_irq.h | 8 ++++---- 3 files changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/of/irq.c b/drivers/of/irq.c index 1005e4f349ef..25d17b8a1a1a 100644 --- a/drivers/of/irq.c +++ b/drivers/of/irq.c @@ -576,43 +576,43 @@ err: } } -static u32 __of_msi_map_rid(struct device *dev, struct device_node **np, - u32 rid_in) +static u32 __of_msi_map_id(struct device *dev, struct device_node **np, + u32 id_in) { struct device *parent_dev; - u32 rid_out = rid_in; + u32 id_out = id_in; /* * Walk up the device parent links looking for one with a * "msi-map" property. */ for (parent_dev = dev; parent_dev; parent_dev = parent_dev->parent) - if (!of_map_id(parent_dev->of_node, rid_in, "msi-map", - "msi-map-mask", np, &rid_out)) + if (!of_map_id(parent_dev->of_node, id_in, "msi-map", + "msi-map-mask", np, &id_out)) break; - return rid_out; + return id_out; } /** - * of_msi_map_rid - Map a MSI requester ID for a device. + * of_msi_map_id - Map a MSI ID for a device. * @dev: device for which the mapping is to be done. * @msi_np: device node of the expected msi controller. - * @rid_in: unmapped MSI requester ID for the device. + * @id_in: unmapped MSI ID for the device. * * Walk up the device hierarchy looking for devices with a "msi-map" - * property. If found, apply the mapping to @rid_in. + * property. If found, apply the mapping to @id_in. * - * Returns the mapped MSI requester ID. + * Returns the mapped MSI ID. */ -u32 of_msi_map_rid(struct device *dev, struct device_node *msi_np, u32 rid_in) +u32 of_msi_map_id(struct device *dev, struct device_node *msi_np, u32 id_in) { - return __of_msi_map_rid(dev, &msi_np, rid_in); + return __of_msi_map_id(dev, &msi_np, id_in); } /** * of_msi_map_get_device_domain - Use msi-map to find the relevant MSI domain * @dev: device for which the mapping is to be done. - * @rid: Requester ID for the device. + * @id: Device ID. * @bus_token: Bus token * * Walk up the device hierarchy looking for devices with a "msi-map" @@ -625,7 +625,7 @@ struct irq_domain *of_msi_map_get_device_domain(struct device *dev, u32 id, { struct device_node *np = NULL; - __of_msi_map_rid(dev, &np, id); + __of_msi_map_id(dev, &np, id); return irq_find_matching_host(np, bus_token); } diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index b4bfe0b03b2d..19aeadb22f11 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1535,7 +1535,7 @@ u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev) pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid); of_node = irq_domain_get_of_node(domain); - rid = of_node ? of_msi_map_rid(&pdev->dev, of_node, rid) : + rid = of_node ? of_msi_map_id(&pdev->dev, of_node, rid) : iort_msi_map_id(&pdev->dev, rid); return rid; diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h index 7142a3722758..e8b78139f78c 100644 --- a/include/linux/of_irq.h +++ b/include/linux/of_irq.h @@ -55,7 +55,7 @@ extern struct irq_domain *of_msi_map_get_device_domain(struct device *dev, u32 id, u32 bus_token); extern void of_msi_configure(struct device *dev, struct device_node *np); -u32 of_msi_map_rid(struct device *dev, struct device_node *msi_np, u32 rid_in); +u32 of_msi_map_id(struct device *dev, struct device_node *msi_np, u32 id_in); #else static inline int of_irq_count(struct device_node *dev) { @@ -93,10 +93,10 @@ static inline struct irq_domain *of_msi_map_get_device_domain(struct device *dev static inline void of_msi_configure(struct device *dev, struct device_node *np) { } -static inline u32 of_msi_map_rid(struct device *dev, - struct device_node *msi_np, u32 rid_in) +static inline u32 of_msi_map_id(struct device *dev, + struct device_node *msi_np, u32 id_in) { - return rid_in; + return id_in; } #endif From 998fb7badf0362a2057694878098642ef363d899 Mon Sep 17 00:00:00 2001 From: Diana Craciun Date: Fri, 19 Jun 2020 09:20:12 +0100 Subject: [PATCH 36/40] bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver The DPRC driver is not taking into account the msi-map property and assumes that the icid is the same as the stream ID. Although this assumption is correct, generalize the code to include a translation between icid and streamID. Furthermore do not just copy the MSI domain from parent (for child containers), but use the information provided by the msi-map property. If the msi-map property is missing from the device tree retain the old behaviour for backward compatibility ie the child DPRC objects inherit the MSI domain from the parent. Signed-off-by: Diana Craciun Acked-by: Marc Zyngier Link: https://lore.kernel.org/r/20200619082013.13661-12-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/bus/fsl-mc/dprc-driver.c | 31 ++++++--------------- drivers/bus/fsl-mc/fsl-mc-bus.c | 4 +-- drivers/bus/fsl-mc/fsl-mc-msi.c | 29 +++++++++++-------- drivers/bus/fsl-mc/fsl-mc-private.h | 6 ++-- drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c | 15 +++++++++- 5 files changed, 46 insertions(+), 39 deletions(-) diff --git a/drivers/bus/fsl-mc/dprc-driver.c b/drivers/bus/fsl-mc/dprc-driver.c index c8b1c3842c1a..189bff2115a8 100644 --- a/drivers/bus/fsl-mc/dprc-driver.c +++ b/drivers/bus/fsl-mc/dprc-driver.c @@ -592,6 +592,7 @@ static int dprc_probe(struct fsl_mc_device *mc_dev) bool mc_io_created = false; bool msi_domain_set = false; u16 major_ver, minor_ver; + struct irq_domain *mc_msi_domain; if (!is_fsl_mc_bus_dprc(mc_dev)) return -EINVAL; @@ -621,31 +622,15 @@ static int dprc_probe(struct fsl_mc_device *mc_dev) return error; mc_io_created = true; + } - /* - * Inherit parent MSI domain: - */ - dev_set_msi_domain(&mc_dev->dev, - dev_get_msi_domain(parent_dev)); - msi_domain_set = true; + mc_msi_domain = fsl_mc_find_msi_domain(&mc_dev->dev); + if (!mc_msi_domain) { + dev_warn(&mc_dev->dev, + "WARNING: MC bus without interrupt support\n"); } else { - /* - * This is a root DPRC - */ - struct irq_domain *mc_msi_domain; - - if (dev_is_fsl_mc(parent_dev)) - return -EINVAL; - - error = fsl_mc_find_msi_domain(parent_dev, - &mc_msi_domain); - if (error < 0) { - dev_warn(&mc_dev->dev, - "WARNING: MC bus without interrupt support\n"); - } else { - dev_set_msi_domain(&mc_dev->dev, mc_msi_domain); - msi_domain_set = true; - } + dev_set_msi_domain(&mc_dev->dev, mc_msi_domain); + msi_domain_set = true; } error = dprc_open(mc_dev->mc_io, 0, mc_dev->obj_desc.id, diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c index 8ead3f0238f2..824ff77bbe86 100644 --- a/drivers/bus/fsl-mc/fsl-mc-bus.c +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c @@ -370,8 +370,8 @@ EXPORT_SYMBOL_GPL(fsl_mc_get_version); /** * fsl_mc_get_root_dprc - function to traverse to the root dprc */ -static void fsl_mc_get_root_dprc(struct device *dev, - struct device **root_dprc_dev) +void fsl_mc_get_root_dprc(struct device *dev, + struct device **root_dprc_dev) { if (!dev) { *root_dprc_dev = NULL; diff --git a/drivers/bus/fsl-mc/fsl-mc-msi.c b/drivers/bus/fsl-mc/fsl-mc-msi.c index 8b9c66d7c4ff..e7bbff445a83 100644 --- a/drivers/bus/fsl-mc/fsl-mc-msi.c +++ b/drivers/bus/fsl-mc/fsl-mc-msi.c @@ -177,23 +177,30 @@ struct irq_domain *fsl_mc_msi_create_irq_domain(struct fwnode_handle *fwnode, return domain; } -int fsl_mc_find_msi_domain(struct device *mc_platform_dev, - struct irq_domain **mc_msi_domain) +struct irq_domain *fsl_mc_find_msi_domain(struct device *dev) { - struct irq_domain *msi_domain; - struct device_node *mc_of_node = mc_platform_dev->of_node; + struct irq_domain *msi_domain = NULL; + struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev); - msi_domain = of_msi_get_domain(mc_platform_dev, mc_of_node, - DOMAIN_BUS_FSL_MC_MSI); + msi_domain = of_msi_map_get_device_domain(dev, mc_dev->icid, + DOMAIN_BUS_FSL_MC_MSI); + + /* + * if the msi-map property is missing assume that all the + * child containers inherit the domain from the parent + */ if (!msi_domain) { - pr_err("Unable to find fsl-mc MSI domain for %pOF\n", - mc_of_node); + struct device *root_dprc_dev; + struct device *bus_dev; - return -ENOENT; + fsl_mc_get_root_dprc(dev, &root_dprc_dev); + bus_dev = root_dprc_dev->parent; + msi_domain = of_msi_get_domain(bus_dev, + bus_dev->of_node, + DOMAIN_BUS_FSL_MC_MSI); } - *mc_msi_domain = msi_domain; - return 0; + return msi_domain; } static void fsl_mc_msi_free_descs(struct device *dev) diff --git a/drivers/bus/fsl-mc/fsl-mc-private.h b/drivers/bus/fsl-mc/fsl-mc-private.h index 21ca8c756ee7..7a46a12eb747 100644 --- a/drivers/bus/fsl-mc/fsl-mc-private.h +++ b/drivers/bus/fsl-mc/fsl-mc-private.h @@ -595,8 +595,7 @@ int fsl_mc_msi_domain_alloc_irqs(struct device *dev, void fsl_mc_msi_domain_free_irqs(struct device *dev); -int fsl_mc_find_msi_domain(struct device *mc_platform_dev, - struct irq_domain **mc_msi_domain); +struct irq_domain *fsl_mc_find_msi_domain(struct device *dev); int fsl_mc_populate_irq_pool(struct fsl_mc_bus *mc_bus, unsigned int irq_count); @@ -613,6 +612,9 @@ void fsl_destroy_mc_io(struct fsl_mc_io *mc_io); bool fsl_mc_is_root_dprc(struct device *dev); +void fsl_mc_get_root_dprc(struct device *dev, + struct device **root_dprc_dev); + struct fsl_mc_device *fsl_mc_device_lookup(struct fsl_mc_obj_desc *obj_desc, struct fsl_mc_device *mc_bus_dev); diff --git a/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c b/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c index 606efa64adff..a5c8d577e424 100644 --- a/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c +++ b/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c @@ -23,6 +23,18 @@ static struct irq_chip its_msi_irq_chip = { .irq_set_affinity = msi_domain_set_affinity }; +static u32 fsl_mc_msi_domain_get_msi_id(struct irq_domain *domain, + struct fsl_mc_device *mc_dev) +{ + struct device_node *of_node; + u32 out_id; + + of_node = irq_domain_get_of_node(domain); + out_id = of_msi_map_id(&mc_dev->dev, of_node, mc_dev->icid); + + return out_id; +} + static int its_fsl_mc_msi_prepare(struct irq_domain *msi_domain, struct device *dev, int nvec, msi_alloc_info_t *info) @@ -43,7 +55,8 @@ static int its_fsl_mc_msi_prepare(struct irq_domain *msi_domain, * NOTE: This device id corresponds to the IOMMU stream ID * associated with the DPRC object (ICID). */ - info->scratchpad[0].ul = mc_bus_dev->icid; + info->scratchpad[0].ul = fsl_mc_msi_domain_get_msi_id(msi_domain, + mc_bus_dev); msi_info = msi_get_domain_info(msi_domain->parent); /* Allocate at least 32 MSIs, and always as a power of 2 */ From 6305166c8771c33a8d5992fb53f93cfecedc14fd Mon Sep 17 00:00:00 2001 From: Makarand Pawagi Date: Fri, 19 Jun 2020 09:20:13 +0100 Subject: [PATCH 37/40] bus: fsl-mc: Add ACPI support for fsl-mc Add ACPI support in the fsl-mc driver. Driver parses MC DSDT table to extract memory and other resources. Interrupt (GIC ITS) information is extracted from the MADT table by drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c. IORT table is parsed to configure DMA. Signed-off-by: Makarand Pawagi Signed-off-by: Diana Craciun Signed-off-by: Laurentiu Tudor Link: https://lore.kernel.org/r/20200619082013.13661-13-lorenzo.pieralisi@arm.com Signed-off-by: Catalin Marinas --- drivers/bus/fsl-mc/fsl-mc-bus.c | 73 ++++++++++++---- drivers/bus/fsl-mc/fsl-mc-msi.c | 35 ++++---- drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c | 92 ++++++++++++++++----- 3 files changed, 149 insertions(+), 51 deletions(-) diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c index 824ff77bbe86..324d49d6df89 100644 --- a/drivers/bus/fsl-mc/fsl-mc-bus.c +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c @@ -18,6 +18,8 @@ #include #include #include +#include +#include #include "fsl-mc-private.h" @@ -38,6 +40,7 @@ struct fsl_mc { struct fsl_mc_device *root_mc_bus_dev; u8 num_translation_ranges; struct fsl_mc_addr_translation_range *translation_ranges; + void *fsl_mc_regs; }; /** @@ -56,6 +59,10 @@ struct fsl_mc_addr_translation_range { phys_addr_t start_phys_addr; }; +#define FSL_MC_FAPR 0x28 +#define MC_FAPR_PL BIT(18) +#define MC_FAPR_BMT BIT(17) + /** * fsl_mc_bus_match - device to driver matching callback * @dev: the fsl-mc device to match against @@ -124,7 +131,10 @@ static int fsl_mc_dma_configure(struct device *dev) while (dev_is_fsl_mc(dma_dev)) dma_dev = dma_dev->parent; - return of_dma_configure_id(dev, dma_dev->of_node, 0, &input_id); + if (dev_of_node(dma_dev)) + return of_dma_configure_id(dev, dma_dev->of_node, 0, &input_id); + + return acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id); } static ssize_t modalias_show(struct device *dev, struct device_attribute *attr, @@ -865,8 +875,11 @@ static int fsl_mc_bus_probe(struct platform_device *pdev) struct fsl_mc_io *mc_io = NULL; int container_id; phys_addr_t mc_portal_phys_addr; - u32 mc_portal_size; - struct resource res; + u32 mc_portal_size, mc_stream_id; + struct resource *plat_res; + + if (!iommu_present(&fsl_mc_bus_type)) + return -EPROBE_DEFER; mc = devm_kzalloc(&pdev->dev, sizeof(*mc), GFP_KERNEL); if (!mc) @@ -874,19 +887,33 @@ static int fsl_mc_bus_probe(struct platform_device *pdev) platform_set_drvdata(pdev, mc); + plat_res = platform_get_resource(pdev, IORESOURCE_MEM, 1); + mc->fsl_mc_regs = devm_ioremap_resource(&pdev->dev, plat_res); + if (IS_ERR(mc->fsl_mc_regs)) + return PTR_ERR(mc->fsl_mc_regs); + + if (IS_ENABLED(CONFIG_ACPI) && !dev_of_node(&pdev->dev)) { + mc_stream_id = readl(mc->fsl_mc_regs + FSL_MC_FAPR); + /* + * HW ORs the PL and BMT bit, places the result in bit 15 of + * the StreamID and ORs in the ICID. Calculate it accordingly. + */ + mc_stream_id = (mc_stream_id & 0xffff) | + ((mc_stream_id & (MC_FAPR_PL | MC_FAPR_BMT)) ? + 0x4000 : 0); + error = acpi_dma_configure_id(&pdev->dev, DEV_DMA_COHERENT, + &mc_stream_id); + if (error) + dev_warn(&pdev->dev, "failed to configure dma: %d.\n", + error); + } + /* * Get physical address of MC portal for the root DPRC: */ - error = of_address_to_resource(pdev->dev.of_node, 0, &res); - if (error < 0) { - dev_err(&pdev->dev, - "of_address_to_resource() failed for %pOF\n", - pdev->dev.of_node); - return error; - } - - mc_portal_phys_addr = res.start; - mc_portal_size = resource_size(&res); + plat_res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + mc_portal_phys_addr = plat_res->start; + mc_portal_size = resource_size(plat_res); error = fsl_create_mc_io(&pdev->dev, mc_portal_phys_addr, mc_portal_size, NULL, FSL_MC_IO_ATOMIC_CONTEXT_PORTAL, &mc_io); @@ -903,11 +930,13 @@ static int fsl_mc_bus_probe(struct platform_device *pdev) dev_info(&pdev->dev, "MC firmware version: %u.%u.%u\n", mc_version.major, mc_version.minor, mc_version.revision); - error = get_mc_addr_translation_ranges(&pdev->dev, - &mc->translation_ranges, - &mc->num_translation_ranges); - if (error < 0) - goto error_cleanup_mc_io; + if (dev_of_node(&pdev->dev)) { + error = get_mc_addr_translation_ranges(&pdev->dev, + &mc->translation_ranges, + &mc->num_translation_ranges); + if (error < 0) + goto error_cleanup_mc_io; + } error = dprc_get_container_id(mc_io, 0, &container_id); if (error < 0) { @@ -934,6 +963,7 @@ static int fsl_mc_bus_probe(struct platform_device *pdev) goto error_cleanup_mc_io; mc->root_mc_bus_dev = mc_bus_dev; + mc_bus_dev->dev.fwnode = pdev->dev.fwnode; return 0; error_cleanup_mc_io: @@ -967,11 +997,18 @@ static const struct of_device_id fsl_mc_bus_match_table[] = { MODULE_DEVICE_TABLE(of, fsl_mc_bus_match_table); +static const struct acpi_device_id fsl_mc_bus_acpi_match_table[] = { + {"NXP0008", 0 }, + { } +}; +MODULE_DEVICE_TABLE(acpi, fsl_mc_bus_acpi_match_table); + static struct platform_driver fsl_mc_bus_driver = { .driver = { .name = "fsl_mc_bus", .pm = NULL, .of_match_table = fsl_mc_bus_match_table, + .acpi_match_table = fsl_mc_bus_acpi_match_table, }, .probe = fsl_mc_bus_probe, .remove = fsl_mc_bus_remove, diff --git a/drivers/bus/fsl-mc/fsl-mc-msi.c b/drivers/bus/fsl-mc/fsl-mc-msi.c index e7bbff445a83..8edadf05cbb7 100644 --- a/drivers/bus/fsl-mc/fsl-mc-msi.c +++ b/drivers/bus/fsl-mc/fsl-mc-msi.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "fsl-mc-private.h" @@ -179,25 +180,31 @@ struct irq_domain *fsl_mc_msi_create_irq_domain(struct fwnode_handle *fwnode, struct irq_domain *fsl_mc_find_msi_domain(struct device *dev) { - struct irq_domain *msi_domain = NULL; + struct device *root_dprc_dev; + struct device *bus_dev; + struct irq_domain *msi_domain; struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev); - msi_domain = of_msi_map_get_device_domain(dev, mc_dev->icid, + fsl_mc_get_root_dprc(dev, &root_dprc_dev); + bus_dev = root_dprc_dev->parent; + + if (bus_dev->of_node) { + msi_domain = of_msi_map_get_device_domain(dev, + mc_dev->icid, DOMAIN_BUS_FSL_MC_MSI); - /* - * if the msi-map property is missing assume that all the - * child containers inherit the domain from the parent - */ - if (!msi_domain) { - struct device *root_dprc_dev; - struct device *bus_dev; + /* + * if the msi-map property is missing assume that all the + * child containers inherit the domain from the parent + */ + if (!msi_domain) - fsl_mc_get_root_dprc(dev, &root_dprc_dev); - bus_dev = root_dprc_dev->parent; - msi_domain = of_msi_get_domain(bus_dev, - bus_dev->of_node, - DOMAIN_BUS_FSL_MC_MSI); + msi_domain = of_msi_get_domain(bus_dev, + bus_dev->of_node, + DOMAIN_BUS_FSL_MC_MSI); + } else { + msi_domain = iort_get_device_domain(dev, mc_dev->icid, + DOMAIN_BUS_FSL_MC_MSI); } return msi_domain; diff --git a/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c b/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c index a5c8d577e424..634263dfd7b5 100644 --- a/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c +++ b/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c @@ -7,6 +7,8 @@ * */ +#include +#include #include #include #include @@ -30,7 +32,8 @@ static u32 fsl_mc_msi_domain_get_msi_id(struct irq_domain *domain, u32 out_id; of_node = irq_domain_get_of_node(domain); - out_id = of_msi_map_id(&mc_dev->dev, of_node, mc_dev->icid); + out_id = of_node ? of_msi_map_id(&mc_dev->dev, of_node, mc_dev->icid) : + iort_msi_map_id(&mc_dev->dev, mc_dev->icid); return out_id; } @@ -79,12 +82,71 @@ static const struct of_device_id its_device_id[] = { {}, }; -static int __init its_fsl_mc_msi_init(void) +static void __init its_fsl_mc_msi_init_one(struct fwnode_handle *handle, + const char *name) { - struct device_node *np; struct irq_domain *parent; struct irq_domain *mc_msi_domain; + parent = irq_find_matching_fwnode(handle, DOMAIN_BUS_NEXUS); + if (!parent || !msi_get_domain_info(parent)) { + pr_err("%s: unable to locate ITS domain\n", name); + return; + } + + mc_msi_domain = fsl_mc_msi_create_irq_domain(handle, + &its_fsl_mc_msi_domain_info, + parent); + if (!mc_msi_domain) { + pr_err("%s: unable to create fsl-mc domain\n", name); + return; + } + + pr_info("fsl-mc MSI: %s domain created\n", name); +} + +#ifdef CONFIG_ACPI +static int __init +its_fsl_mc_msi_parse_madt(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_madt_generic_translator *its_entry; + struct fwnode_handle *dom_handle; + const char *node_name; + int err = 0; + + its_entry = (struct acpi_madt_generic_translator *)header; + node_name = kasprintf(GFP_KERNEL, "ITS@0x%lx", + (long)its_entry->base_address); + + dom_handle = iort_find_domain_token(its_entry->translation_id); + if (!dom_handle) { + pr_err("%s: Unable to locate ITS domain handle\n", node_name); + err = -ENXIO; + goto out; + } + + its_fsl_mc_msi_init_one(dom_handle, node_name); + +out: + kfree(node_name); + return err; +} + + +static void __init its_fsl_mc_acpi_msi_init(void) +{ + acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_TRANSLATOR, + its_fsl_mc_msi_parse_madt, 0); +} +#else +static inline void its_fsl_mc_acpi_msi_init(void) { } +#endif + +static void __init its_fsl_mc_of_msi_init(void) +{ + struct device_node *np; + for (np = of_find_matching_node(NULL, its_device_id); np; np = of_find_matching_node(np, its_device_id)) { if (!of_device_is_available(np)) @@ -92,23 +154,15 @@ static int __init its_fsl_mc_msi_init(void) if (!of_property_read_bool(np, "msi-controller")) continue; - parent = irq_find_matching_host(np, DOMAIN_BUS_NEXUS); - if (!parent || !msi_get_domain_info(parent)) { - pr_err("%pOF: unable to locate ITS domain\n", np); - continue; - } - - mc_msi_domain = fsl_mc_msi_create_irq_domain( - of_node_to_fwnode(np), - &its_fsl_mc_msi_domain_info, - parent); - if (!mc_msi_domain) { - pr_err("%pOF: unable to create fsl-mc domain\n", np); - continue; - } - - pr_info("fsl-mc MSI: %pOF domain created\n", np); + its_fsl_mc_msi_init_one(of_node_to_fwnode(np), + np->full_name); } +} + +static int __init its_fsl_mc_msi_init(void) +{ + its_fsl_mc_of_msi_init(); + its_fsl_mc_acpi_msi_init(); return 0; } From c4334d576cf420a7d0f4349ce0b0a8ed0de3938f Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sat, 25 Jul 2020 17:32:05 -0700 Subject: [PATCH 38/40] arm64: pgtable-hwdef.h: delete duplicated words Drop the repeated words "at" and "the". Signed-off-by: Randy Dunlap Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20200726003207.20253-2-rdunlap@infradead.org Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/pgtable-hwdef.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 9c91a8f93a0e..b18ba4452873 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -29,7 +29,7 @@ * Size mapped by an entry at level n ( 0 <= n <= 3) * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits * in the final page. The maximum number of translation levels supported by - * the architecture is 4. Hence, starting at at level n, we have further + * the architecture is 4. Hence, starting at level n, we have further * ((4 - n) - 1) levels of translation excluding the offset within the page. * So, the total number of bits mapped by an entry at level n is : * @@ -98,7 +98,7 @@ #define CONT_PMDS (1 << CONT_PMD_SHIFT) #define CONT_PMD_SIZE (CONT_PMDS * PMD_SIZE) #define CONT_PMD_MASK (~(CONT_PMD_SIZE - 1)) -/* the the numerical offset of the PTE within a range of CONT_PTES */ +/* the numerical offset of the PTE within a range of CONT_PTES */ #define CONT_RANGE_OFFSET(addr) (((addr)>>PAGE_SHIFT)&(CONT_PTES-1)) /* From c4b5abba008399dc4450ab6f62b2deb5acd3697e Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sat, 25 Jul 2020 17:32:06 -0700 Subject: [PATCH 39/40] arm64: ptrace.h: delete duplicated word Drop the repeated word "the". Signed-off-by: Randy Dunlap Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20200726003207.20253-3-rdunlap@infradead.org Signed-off-by: Catalin Marinas --- arch/arm64/include/asm/ptrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h index 953b6a1ce549..966ed30ed5f7 100644 --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -27,7 +27,7 @@ * * Some code sections either automatically switch back to PSR.I or explicitly * require to not use priority masking. If bit GIC_PRIO_PSR_I_SET is included - * in the the priority mask, it indicates that PSR.I should be set and + * in the priority mask, it indicates that PSR.I should be set and * interrupt disabling temporarily does not rely on IRQ priorities. */ #define GIC_PRIO_IRQON 0xe0 From 1a9ea25d1874ca457a596738b40fa4f3bec6fc8f Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sat, 25 Jul 2020 17:32:07 -0700 Subject: [PATCH 40/40] arm64: sigcontext.h: delete duplicated word Drop the repeated word "the". Signed-off-by: Randy Dunlap Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20200726003207.20253-4-rdunlap@infradead.org Signed-off-by: Catalin Marinas --- arch/arm64/include/uapi/asm/sigcontext.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h index 8b0ebce92427..0c796c795dbe 100644 --- a/arch/arm64/include/uapi/asm/sigcontext.h +++ b/arch/arm64/include/uapi/asm/sigcontext.h @@ -179,7 +179,7 @@ struct sve_context { * The same convention applies when returning from a signal: a caller * will need to remove or resize the sve_context block if it wants to * make the SVE registers live when they were previously non-live or - * vice-versa. This may require the the caller to allocate fresh + * vice-versa. This may require the caller to allocate fresh * memory and/or move other context blocks in the signal frame. * * Changing the vector length during signal return is not permitted: