Commit Graph

1154474 Commits

Author SHA1 Message Date
Zheng Yejian
608c6ed333 tracing/hist: Fix issue of losting command info in error_log
When input some constructed invalid 'trigger' command, command info
in 'error_log' are lost [1].

The root cause is that there is a path that event_hist_trigger_parse()
is recursely called once and 'last_cmd' which save origin command is
cleared, then later calling of hist_err() will no longer record origin
command info:

  event_hist_trigger_parse() {
    last_cmd_set()  // <1> 'last_cmd' save origin command here at first
    create_actions() {
      onmatch_create() {
        action_create() {
          trace_action_create() {
            trace_action_create_field_var() {
              create_field_var_hist() {
                event_hist_trigger_parse() {  // <2> recursely called once
                  hist_err_clear()  // <3> 'last_cmd' is cleared here
                }
                hist_err()  // <4> No longer find origin command!!!

Since 'glob' is empty string while running into the recurse call, we
can trickly check it and bypass the call of hist_err_clear() to solve it.

[1]
 # cd /sys/kernel/tracing
 # echo "my_synth_event int v1; int v2; int v3;" >> synthetic_events
 # echo 'hist:keys=pid' >> events/sched/sched_waking/trigger
 # echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\
pid,pid1)" >> events/sched/sched_switch/trigger
 # cat error_log
[  8.405018] hist:sched:sched_switch: error: Couldn't find synthetic event
  Command:
hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
                                                          ^
[  8.816902] hist:sched:sched_switch: error: Couldn't find field
  Command:
hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
                          ^
[  8.816902] hist:sched:sched_switch: error: Couldn't parse field variable
  Command:
hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
                          ^
[  8.999880] : error: Couldn't find field
  Command:
           ^
[  8.999880] : error: Couldn't parse field variable
  Command:
           ^
[  8.999880] : error: Couldn't find field
  Command:
           ^
[  8.999880] : error: Couldn't create histogram for field
  Command:
           ^

Link: https://lore.kernel.org/linux-trace-kernel/20221207135326.3483216-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Fixes: f404da6e1d ("tracing: Add 'last error' error facility for hist triggers")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10 13:36:04 -05:00
Zheng Yejian
ff4837f7fe tracing: Fix issue of missing one synthetic field
The maximum number of synthetic fields supported is defined as
SYNTH_FIELDS_MAX which value currently is 64, but it actually fails
when try to generate a synthetic event with 64 fields by executing like:

  # echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\
   int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\
   int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\
   int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\
   int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\
   int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\
   int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\
   int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\
   int v63; int v64" >> /sys/kernel/tracing/synthetic_events

Correct the field counting to fix it.

Link: https://lore.kernel.org/linux-trace-kernel/20221207091557.3137904-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: c9e759b1e8 ("tracing: Rework synthetic event command parsing")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10 13:36:04 -05:00
Zheng Yejian
82470f7d90 tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
When generate a synthetic event with many params and then create a trace
action for it [1], kernel panic happened [2].

It is because that in trace_action_create() 'data->n_params' is up to
SYNTH_FIELDS_MAX (current value is 64), and array 'data->var_ref_idx'
keeps indices into array 'hist_data->var_refs' for each synthetic event
param, but the length of 'data->var_ref_idx' is TRACING_MAP_VARS_MAX
(current value is 16), so out-of-bound write happened when 'data->n_params'
more than 16. In this case, 'data->match_data.event' is overwritten and
eventually cause the panic.

To solve the issue, adjust the length of 'data->var_ref_idx' to be
SYNTH_FIELDS_MAX and add sanity checks to avoid out-of-bound write.

[1]
 # cd /sys/kernel/tracing/
 # echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\
int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\
int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\
int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\
int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\
int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\
int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\
int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\
int v63" >> synthetic_events
 # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="bash"' >> \
events/sched/sched_waking/trigger
 # echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
pid,pid,pid,pid,pid,pid,pid,pid,pid)" >> events/sched/sched_switch/trigger

[2]
BUG: unable to handle page fault for address: ffff91c900000000
PGD 61001067 P4D 61001067 PUD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 322 Comm: bash Tainted: G        W          6.1.0-rc8+ #229
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:strcmp+0xc/0x30
Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee
c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14
07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3
RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000
RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000
R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580
R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538
FS:  00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0
Call Trace:
 <TASK>
 __find_event_file+0x55/0x90
 action_create+0x76c/0x1060
 event_hist_trigger_parse+0x146d/0x2060
 ? event_trigger_write+0x31/0xd0
 trigger_process_regex+0xbb/0x110
 event_trigger_write+0x6b/0xd0
 vfs_write+0xc8/0x3e0
 ? alloc_fd+0xc0/0x160
 ? preempt_count_add+0x4d/0xa0
 ? preempt_count_add+0x70/0xa0
 ksys_write+0x5f/0xe0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f1d1d0cf077
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e
fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74
RSP: 002b:00007ffcebb0e568 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000143 RCX: 00007f1d1d0cf077
RDX: 0000000000000143 RSI: 00005639265aa7e0 RDI: 0000000000000001
RBP: 00005639265aa7e0 R08: 000000000000000a R09: 0000000000000142
R10: 000056392639c017 R11: 0000000000000246 R12: 0000000000000143
R13: 00007f1d1d1ae6a0 R14: 00007f1d1d1aa4a0 R15: 00007f1d1d1a98a0
 </TASK>
Modules linked in:
CR2: ffff91c900000000
---[ end trace 0000000000000000 ]---
RIP: 0010:strcmp+0xc/0x30
Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee
c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14
07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3
RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000
RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000
R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580
R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538
FS:  00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0

Link: https://lore.kernel.org/linux-trace-kernel/20221207035143.2278781-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: d380dcde9a ("tracing: Fix now invalid var_ref_vals assumption in trace action")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10 13:36:04 -05:00
Zheng Yejian
2cc6a52888 tracing/hist: Fix wrong return value in parse_action_params()
When number of synth fields is more than SYNTH_FIELDS_MAX,
parse_action_params() should return -EINVAL.

Link: https://lore.kernel.org/linux-trace-kernel/20221207034635.2253990-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: c282a386a3 ("tracing: Add 'onmatch' hist trigger action support")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10 13:36:04 -05:00
Steven Rostedt
20fb6c9976 x86/mm/kmmio: Use rcu_read_lock_sched_notrace()
The mmiotrace tracer is "special". The purpose is to help reverse engineer
binary drivers by removing the memory allocated by the driver and when the
driver goes to access it, a fault occurs, the mmiotracer will record what
the driver was doing and then do the work on its behalf by single stepping
through the process.

But to achieve this ability, it must do some special things. One is to
take the rcu_read_lock() when the fault occurs, and then release it in the
breakpoint that is single stepping. This makes lockdep unhappy, as it
changes the state of RCU from within an exception that is not contained in
that exception, and we get a nasty splat from lockdep.

Instead, switch to rcu_read_lock_sched_notrace() as the RCU sched variant
has the same grace period as normal RCU. This is basically the same as
rcu_read_lock() but does not make lockdep complain about it.

Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched().

Link: https://lore.kernel.org/linux-trace-kernel/20221209134144.04f33626@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Karol Herbst <karolherbst@gmail.com>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10 13:35:46 -05:00
Anna Schumaker
7fd461c47c NFSv4.2: Change the default KConfig value for READ_PLUS
Now that we've worked out performance issues and have a server patch
addressing the failed xfstests, we can safely enable this feature by
default.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-10 13:24:59 -05:00
Linus Torvalds
296a7b7eb7 Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
 "One further ARM fix for 6.1 from Wang Kefeng, fixing up the handling
  for kfence faults"

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9278/1: kfence: only handle translation faults
2022-12-10 10:14:52 -08:00
Bjorn Helgaas
f826afe5ea Merge branch 'pci/kbuild'
- Remove unnecessary <linux/of_irq.h> includes (Bjorn Helgaas)

* pci/kbuild:
  PCI: Drop of_match_ptr() to avoid unused variables
  PCI: Remove unnecessary <linux/of_irq.h> includes
  PCI: xgene-msi: Include <linux/irqdomain.h> explicitly
  PCI: mvebu: Include <linux/irqdomain.h> explicitly
  PCI: microchip: Include <linux/irqdomain.h> explicitly
  PCI: altera-msi: Include <linux/irqdomain.h> explicitly

# Conflicts:
#	drivers/pci/controller/pci-mvebu.c
2022-12-10 10:36:52 -06:00
Bjorn Helgaas
e4d741e9e4 Merge branch 'pci/ctrl/xilinx'
- Fix whitespace issues (Michal Simek)

* pci/ctrl/xilinx:
  PCI: xilinx-nwl: Fix coding style violations
2022-12-10 10:36:42 -06:00
Bjorn Helgaas
4e5194733a Merge branch 'pci/ctrl/mvebu'
- Switch to the gpiod API so we can make of_get_named_gpio_flags() private
  (Dmitry Torokhov)

* pci/ctrl/mvebu:
  PCI: mvebu: Switch to using gpiod API
2022-12-10 10:36:41 -06:00
Bjorn Helgaas
0454c6c0ed Merge branch 'pci/ctrl/aardvark'
- Switch to using devm_gpiod_get_optional() so we can stop exporting
  devm_gpiod_get_from_of_node() (Dmitry Torokhov)

* pci/ctrl/aardvark:
  PCI: aardvark: Switch to using devm_gpiod_get_optional()
2022-12-10 10:36:40 -06:00
Bjorn Helgaas
bcccaa0a48 Merge branch 'remotes/lorenzo/pci/misc'
- Register notifier if core_init_notifier is enabled in pci-epf-test
  (Kunihiko Hayashi)

- Fixup Kconfig indentation (Shunsuke Mie)

* remotes/lorenzo/pci/misc:
  PCI: endpoint: Fix Kconfig indent style
  PCI: pci-epf-test: Register notifier if only core_init_notifier is enabled
2022-12-10 10:36:40 -06:00
Bjorn Helgaas
ba7deaa2a8 Merge branch 'remotes/lorenzo/pci/vmd'
- Restore MSI remapping configuration during resume because the
  configuration is cleared out by firmware when suspending (Nirmal Patel)

- Reset the hierarchy below VMD when probing the VMD; we attempted this
  before, but with the wrong device, so it didn't work (Francisco Munoz)

* remotes/lorenzo/pci/vmd:
  PCI: vmd: Fix secondary bus reset for Intel bridges
  PCI: vmd: Disable MSI remapping after suspend
2022-12-10 10:36:39 -06:00
Bjorn Helgaas
4e5db7983d Merge branch 'remotes/lorenzo/pci/tegra'
- Switch from devm_gpiod_get_from_of_node() to devm_fwnode_gpiod_get()
  (Dmitry Torokhov)

* remotes/lorenzo/pci/tegra:
  PCI: tegra: Switch to using devm_fwnode_gpiod_get
2022-12-10 10:36:39 -06:00
Bjorn Helgaas
008ee711f9 Merge branch 'remotes/lorenzo/pci/qcom'
- Add DT and driver support for SC8280XP/SA8540P basic interconnects where
  interconnect bandwidth must be requested before enabling interconnect
  clocks (Johan Hovold)

- Add 'dma-coherent' property (Johan Hovold)

* remotes/lorenzo/pci/qcom:
  dt-bindings: PCI: qcom: Allow 'dma-coherent' property
  PCI: qcom: Add basic interconnect support
  dt-bindings: PCI: qcom: Add SC8280XP/SA8540P interconnects
2022-12-10 10:36:38 -06:00
Bjorn Helgaas
8ecdba32a5 Merge branch 'remotes/lorenzo/pci/mt7621'
- Add sentinel to mt7621_pcie_quirks_match[] to prevent oops when parsing
  the table (John Thomson)

* remotes/lorenzo/pci/mt7621:
  PCI: mt7621: Add sentinel to quirks table
2022-12-10 10:36:38 -06:00
Bjorn Helgaas
c00a109054 Merge branch 'remotes/lorenzo/pci/endpoint'
- Add a .release() callback for the Endpoint Controller library so an
  Endpoint driver is removable (Yoshihiro Shimoda)

- Fix pci-epf-vntb kernel-doc and whitespace (Frank Li)

- Fix pci-epf-vntb error path usage of pci_epc_mem_free_addr() (Frank Li)

- Remove pci-epf-vntb unused epf_db_phy (Frank Li)

- Fix pci-epf-vntb sparse warnings (Frank Li)

* remotes/lorenzo/pci/endpoint:
  PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning
  PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db
  PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32)
  PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member
  PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path
  PCI: endpoint: pci-epf-vntb: Fix struct epf_ntb_ctrl indentation
  PCI: endpoint: pci-epf-vntb: Clean up kernel_doc warning
  PCI: endpoint: Fix WARN() when an endpoint driver is removed
2022-12-10 10:36:37 -06:00
Bjorn Helgaas
29a3e5aedc Merge branch 'remotes/lorenzo/pci/dwc'
- Fix n_fts[] array overrun (Vidya Sagar)

- Don't advertise PTM Responder role for Endpoints (Vidya Sagar)

- Fix qcom "reset assert" error message (Manivannan Sadhasivam)

- Downgrade "link didn't come up" message to dev_info (Vidya Sagar)

- Initialize PHY before deasserting core reset so the link comes up on
  boards where the PHY provides the reference clock (this was a regression
  in v6.0) (Sascha Hauer)

- Switch histb to the gpiod API (Dmitry Torokhov)

- Fix imx6sx and imx8mq clock names in DT binding (Serge Semin)

- Fix visconti MSI interrupt in DT binding (Serge Semin)

- Consolidate reset-gpio, cdm, windows info in common DT shared by both
  Root Port and Endpoint bindings (Serge Semin)

- Remove bus node from DT examples (Serge Semin)

- Add common phys, phy-names to DT (Serge Semin)

- Add default max-link-speed of Gen5 to DT (Serge Semin)

- Apply generic schema for generic device  (Serge Semin)

- Add default max-functions of 32 to DT (Serge Semin)

- Add common interrupts, interrupt-names to DT (Serge Semin)

- Add common regs, reg-names to DT (Serge Semin)

- Add common clocks, resets to DT (Serge Semin)

- Add dma-coherent to DT (Serge Semin)

- Apply common schema to Rockchip DT (Serge Semin)

- Add Baikal-T1 DT bindings (Serge Semin)

- Add dma-ranges support in DesignWare core (Serge Semin)

- Add dw_pcie_cap_is() for testing controller capabilities (Serge Semin)

- Add generic resources getter to DesignWare core (Serge Semin)

- Combine iATU detection procedures (Serge Semin)

- Add generic clock and reset names to DesignWare core (Serge Semin)

- Add Baikal-T1 PCIe controller driver (Serge Semin)

* remotes/lorenzo/pci/dwc:
  PCI: dwc: Add Baikal-T1 PCIe controller support
  PCI: dwc: Introduce generic platform clocks and resets
  PCI: dwc: Combine iATU detection procedures
  PCI: dwc: Introduce generic resources getter
  PCI: dwc: Introduce generic controller capabilities interface
  PCI: dwc: Introduce dma-ranges property support for RC-host
  dt-bindings: PCI: dwc: Add Baikal-T1 PCIe Root Port bindings
  dt-bindings: PCI: dwc: Apply common schema to Rockchip DW PCIe nodes
  dt-bindings: PCI: dwc: Add dma-coherent property
  dt-bindings: PCI: dwc: Add clocks/resets common properties
  dt-bindings: PCI: dwc: Add reg/reg-names common properties
  dt-bindings: PCI: dwc: Add interrupts/interrupt-names common properties
  dt-bindings: PCI: dwc: Add max-functions EP property
  dt-bindings: PCI: dwc: Apply generic schema for generic device only
  dt-bindings: PCI: dwc: Add max-link-speed common property
  dt-bindings: PCI: dwc: Add phys/phy-names common properties
  dt-bindings: PCI: dwc: Remove bus node from the examples
  dt-bindings: PCI: dwc: Detach common RP/EP DT bindings
  dt-bindings: visconti-pcie: Fix interrupts array max constraints
  dt-bindings: imx6q-pcie: Fix clock names for imx6sx and imx8mq
  PCI: histb: Switch to using gpiod API
  PCI: imx6: Initialize PHY before deasserting core reset
  PCI: dwc: Use dev_info for PCIe link down event logging
  PCI: qcom: Fix error message for reset_control_assert()
  PCI: designware-ep: Disable PTM capabilities for EP mode
  PCI: Add PCI_PTM_CAP_RES macro
  PCI: dwc: Fix n_fts[] array overrun
2022-12-10 10:36:37 -06:00
Bjorn Helgaas
0ef283080e Merge branch 'remotes/lorenzo/pci/brcmstb'
- Enable Multi-MSI (Jim Quinlan)

- Wait for 100ms after PERST# deassert for power and clocks to stabilize
  (Jim Quinlan)

- Use readl_poll_timeout_atomic() instead of hand-rolled timeout loop (Jim
  Quinlan)

- Drop needless "inline" annotations (Jim Quinlan)

- Set RCB_MPS mode bit so data for reads up to MPS are returned in a single
  completion (Jim Quinlan)

* remotes/lorenzo/pci/brcmstb:
  PCI: brcmstb: Set RCB_{MPS,64B}_MODE bits
  PCI: brcmstb: Drop needless 'inline' annotations
  PCI: brcmstb: Replace status loops with read_poll_timeout_atomic()
  PCI: brcmstb: Wait for 100ms following PERST# deassert
  PCI: brcmstb: Enable Multi-MSI
2022-12-10 10:36:36 -06:00
Bjorn Helgaas
e6936c8d7c Merge branch 'remotes/lorenzo/pci/dt'
- Add ti,j721e-pci-host interrupt controller definition (Matt Ranostay)

- Add ti,j721e-pci-host interrupt properties (Matt Ranostay)

- Add ti,j721s2 host mode device-id (Matt Ranostay)

- Add mediatek-gen3 iommu, power properties (Jianjun Wang)

- Add mediatek-gen3 SoC-based clock names (Frank Wunderlich)

- Add mediatek-gen3 mt7986 support (Frank Wunderlich)

* remotes/lorenzo/pci/dt:
  dt-bindings: PCI: mediatek-gen3: add support for mt7986
  dt-bindings: PCI: mediatek-gen3: add SoC based clock config
  dt-bindings: PCI: Add host mode device-id for j721s2 platform
  dt-bindings: PCI: mediatek-gen3: Support mt8195
  dt-bindings: PCI: ti,j721e-pci-*: Add missing interrupt properties
  dt-bindings: PCI: ti,j721e-pci-host: add interrupt controller definition
2022-12-10 10:36:36 -06:00
Bjorn Helgaas
0084cd6072 Merge branch 'pci/sysfs'
- Fix a double free in the error path of creating sysfs "resource%d"
  attributes (Sascha Hauer)

* pci/sysfs:
  PCI/sysfs: Fix double free in error path
2022-12-10 10:36:35 -06:00
Bjorn Helgaas
8961fc4f8c Merge branch 'pci/resource'
- Remove EfiMemoryMappedIO regions from the E820 map to allow PCI core to
  allocate BARs from them.  The only purpose of EfiMemoryMappedIO is to
  tell the OS to map things needed by EFI runtime services, so it's often
  used for PCI host bridge apertures.  If we can't allocate from those
  apertures, we can't hot-add devices (Bjorn Helgaas)

* pci/resource:
  x86/PCI: Use pr_info() when possible
  x86/PCI: Fix log message typo
  x86/PCI: Tidy E820 removal messages
  PCI: Skip allocate_resource() if too little space available
  efi/x86: Remove EfiMemoryMappedIO from E820 map
2022-12-10 10:36:34 -06:00
Bjorn Helgaas
9303050181 Merge branch 'pci/portdrv'
- Squash portdrv_core.c and portdrv_pci.c into portdrv.c to make it easier
  to find things (Bjorn Helgaas)

- Allow AER service only for Root Ports & RCECs so portdrv can successfully
  bind to other devices that have AER but lack MSI (which they don't need
  for AER), which allows power management for those devices (Bjorn Helgaas)

* pci/portdrv:
  PCI/portdrv: Allow AER service only for Root Ports & RCECs
  PCI/portdrv: Unexport pcie_port_service_register(), pcie_port_service_unregister()
  PCI/portdrv: Move private things to portdrv.c
  PCI/portdrv: Squash into portdrv.c
2022-12-10 10:36:34 -06:00
Bjorn Helgaas
ec7c9a681d Merge branch 'pci/pm-agp'
- Convert AGP efficeon, intel, amd-k7, ati, nvidia to generic power
  management (Bjorn Helgaas)

* pci/pm-agp:
  agp/via: Update to DEFINE_SIMPLE_DEV_PM_OPS()
  agp/sis: Update to DEFINE_SIMPLE_DEV_PM_OPS()
  agp/amd64: Update to DEFINE_SIMPLE_DEV_PM_OPS()
  agp/nvidia: Convert to generic power management
  agp/ati: Convert to generic power management
  agp/amd-k7: Convert to generic power management
  agp/intel: Convert to generic power management
  agp/efficeon: Convert to generic power management
2022-12-10 10:36:33 -06:00
Bjorn Helgaas
e1f2d15397 Merge branch 'pci/pm'
- Remove unused 'state' parameter to pci_legacy_suspend_late() (Bjorn
  Helgaas)

* pci/pm:
  PCI/PM: Remove unused 'state' parameter to pci_legacy_suspend_late()
2022-12-10 10:36:33 -06:00
Bjorn Helgaas
eae10935ef Merge branch 'pci/misc'
- Use METHOD_NAME__UID instead of plain string to make it easier to find
  all uses (Yipeng Zou)

* pci/misc:
  PCI/ACPI: Use METHOD_NAME__UID instead of plain string
2022-12-10 10:36:32 -06:00
Bjorn Helgaas
84c3482963 Merge branch 'pci/hotplug'
- Enable pciehp by default if USB4 is enabled because USB4/Thunderbolt
  tunneling depends on native PCIe hotplug (Albert Zhou)

- Make sure pciehp binds only to Downstream Ports, not Upstream Ports
  (Rafael J. Wysocki)

- Remove unused get_mode1_ECC_cap callback in shpchp (Ian Cowan)

- Enable pciehp Command Completed Interrupt only if supported to reduce
  confusion when looking at lspci output (Pali Rohár)

* pci/hotplug:
  PCI: pciehp: Enable Command Completed Interrupt only if supported
  PCI: shpchp: Remove unused get_mode1_ECC_cap callback
  PCI: acpiphp: Avoid setting is_hotplug_bridge for PCIe Upstream Ports
  PCI/portdrv: Set PCIE_PORT_SERVICE_HP for Root and Downstream Ports only
  PCI: pciehp: Enable by default if USB4 enabled
2022-12-10 10:36:32 -06:00
Bjorn Helgaas
51ef4873c6 Merge branch 'pci/enumeration'
- Only read/write PCIe Link 2 registers for devices with Links and PCIe
  Capability version >= 2 (Maciej W. Rozycki)

- Revert a patch that cleared PCI_STATUS during enumeration because it
  broke Linux guests on Apple's virtualization framework (Bjorn Helgaas)

- Assign PCI domain IDs using IDAs so IDs can be easily reused after
  loading/unloading host bridge drivers (Pali Rohár)

- Fix pci_device_is_present(), which previously always returned "false" for
  VFs because their vendor ID is always 0xfff (Michael S. Tsirkin)

- Check for alloc failure in pci_request_irq() (Zeng Heng)

* pci/enumeration:
  PCI: Check for alloc failure in pci_request_irq()
  PCI: Fix pci_device_is_present() for VFs by checking PF
  PCI: Assign PCI domain IDs by ida_alloc()
  Revert "PCI: Clear PCI_STATUS when setting up device"
  PCI: Access Link 2 registers only for devices with Links
2022-12-10 10:36:32 -06:00
Bjorn Helgaas
cad4f43f36 Merge branch 'pci/doe'
- Fix calculation of DOE length to account for the "0 means 2^18 DWORDs"
  special case (Li Ming)

* pci/doe:
  PCI/DOE: Fix maximum data object length miscalculation
2022-12-10 10:36:31 -06:00
Bjorn Helgaas
d91482bb21 x86/PCI: Use pr_info() when possible
Use pr_info() and similar when possible.  No functional change intended.

Link: https://lore.kernel.org/r/20221209205131.GA1726524@bhelgaas
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2022-12-10 10:33:18 -06:00
Bjorn Helgaas
2bfa89fab5 x86/PCI: Fix log message typo
Add missing word in the log message:

  - ... so future kernels can this automatically
  + ... so future kernels can do this automatically

Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://lore.kernel.org/r/20221208190341.1560157-5-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
2022-12-10 10:33:18 -06:00
Bjorn Helgaas
00904bf64c x86/PCI: Tidy E820 removal messages
These messages:

  clipped [mem size 0x00000000 64bit] to [mem size 0xfffffffffffa0000 64bit] for e820 entry [mem 0x0009f000-0x000fffff]

aren't as useful as they could be because (a) the resource is often
IORESOURCE_UNSET, so we print the size instead of the start/end and (b) we
print the available resource even if it is empty after removing the E820
entry.

Print the available space by hand to avoid the IORESOURCE_UNSET problem and
only if it's non-empty.  No functional change intended.

Link: https://lore.kernel.org/r/20221208190341.1560157-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
2022-12-10 10:33:11 -06:00
Bjorn Helgaas
5c5fb3c3a7 PCI: Skip allocate_resource() if too little space available
pci_bus_alloc_from_region() allocates MMIO space by iterating through all
the resources available on the bus.  The available resource might be
reduced if the caller requires 32-bit space or we're avoiding BIOS or E820
areas.

Don't bother calling allocate_resource() if we need more space than is
available in this resource.  This prevents some pointless and annoying
messages about avoided areas.

Link: https://lore.kernel.org/r/20221208190341.1560157-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
2022-12-10 10:31:47 -06:00
Bjorn Helgaas
07eab0901e efi/x86: Remove EfiMemoryMappedIO from E820 map
Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped
by the OS so they can be accessed by EFI runtime services, but should have
no other significance to the OS (UEFI r2.10, sec 7.2).  However, most
bootloaders and EFI stubs convert EfiMemoryMappedIO regions to
E820_TYPE_RESERVED entries, which prevent Linux from allocating space from
them (see remove_e820_regions()).

Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI
host bridge windows, which means Linux can't allocate BAR space for
hot-added devices.

Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
problem.

Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some
platforms, these describe non-window space that's included in host bridge
_CRS.  If we assign that space to PCI devices, they don't work.  On the
Lenovo X1 Carbon, this leads to suspend/resume failures.

The previous solution to the problem of allocating BARs in these regions
was to add pci_crs_quirks[] entries to disable E820 checking for these
machines (see d341838d77 ("x86/PCI: Disable E820 reserved region clipping
via quirks")):

  Acer   DMI_PRODUCT_NAME    Spin SP513-54N
  Clevo  DMI_BOARD_NAME      X170KM-G
  Lenovo DMI_PRODUCT_VERSION *IIL*

Florent reported the BAR allocation issue on the Clevo NL4XLU.  We could
add another quirk for the NL4XLU, but I hope this generic change can solve
it for many machines without having to add quirks.

This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad
Slim 3 (Matt) and solves the problem even when overriding the existing
quirks by booting with "pci=use_e820".

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565     Clevo NL4XLU
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899    Ideapad Slim 3
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207    X1 Carbon
Link: https://lore.kernel.org/r/20221208190341.1560157-2-helgaas@kernel.org
Reported-by: Florent DELAHAYE <kernelorg@undead.fr>
Tested-by: Konrad J Hambrick <kjhambrick@gmail.com>
Tested-by: Matt Hansen <2lprbe78@duck.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
2022-12-10 10:31:42 -06:00
Bjorn Helgaas
d8d2b65a94 PCI/portdrv: Allow AER service only for Root Ports & RCECs
Previously portdrv allowed the AER service for any device with an AER
capability (assuming Linux had control of AER) even though the AER service
driver only attaches to Root Port and RCECs.

Because get_port_device_capability() included AER for non-RP, non-RCEC
devices, we tried to initialize the AER IRQ even though these devices
don't generate AER interrupts.

Intel DG1 and DG2 discrete graphics cards contain a switch leading to a
GPU.  The switch supports AER but not MSI, so initializing an AER IRQ
failed, and portdrv failed to claim the switch port at all.  The GPU itself
could be suspended, but the switch could not be put in a low-power state
because it had no driver.

Don't allow the AER service on non-Root Port, non-Root Complex Event
Collector devices.  This means we won't enable Bus Mastering if the device
doesn't require MSI, the AER service will not appear in sysfs, and the AER
service driver will not bind to the device.

Link: https://lore.kernel.org/r/20221207084105.84947-1-mika.westerberg@linux.intel.com
Link: https://lore.kernel.org/r/20221210002922.1749403-1-helgaas@kernel.org
Based-on-patch-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2022-12-10 10:26:40 -06:00
Kees Cook
e78e274eb2 NFSD: Avoid clashing function prototypes
When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].

There were 97 warnings produced by NFS. For example:

fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict]
        [OP_ACCESS]             = (nfsd4_dec)nfsd4_decode_access,
                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The enc/dec callbacks were defined as passing "void *" as the second
argument, but were being implicitly cast to a new type. Replace the
argument with union nfsd4_op_u, and perform explicit member selection
in the function body. There are no resulting binary differences.

Changes were made mechanically using the following Coccinelle script,
with minor by-hand fixes for members that didn't already match their
existing argument name:

@find@
identifier func;
type T, opsT;
identifier ops, N;
@@

 opsT ops[] = {
	[N] = (T) func,
 };

@already_void@
identifier find.func;
identifier name;
@@

 func(...,
-void
+union nfsd4_op_u
 *name)
 {
	...
 }

@proto depends on !already_void@
identifier find.func;
type T;
identifier name;
position p;
@@

 func@p(...,
 	T name
 ) {
	...
   }

@script:python get_member@
type_name << proto.T;
member;
@@

coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0])

@convert@
identifier find.func;
type proto.T;
identifier proto.name;
position proto.p;
identifier get_member.member;
@@

 func@p(...,
-	T name
+	union nfsd4_op_u *u
 ) {
+	T name = &u->member;
	...
   }

@cast@
identifier find.func;
type T, opsT;
identifier ops, N;
@@

 opsT ops[] = {
	[N] =
-	(T)
	func,
 };

Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:13 -05:00
Chuck Lever
4dd9daa912 SUNRPC: Fix crasher in unwrap_integ_data()
If a zero length is passed to kmalloc() it returns 0x10, which is
not a valid address. gss_verify_mic() subsequently crashes when it
attempts to dereference that pointer.

Instead of allocating this memory on every call based on an
untrusted size value, use a piece of dynamically-allocated scratch
memory that is always available.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10 11:01:13 -05:00
Chuck Lever
c65d9df0c4 SUNRPC: Make the svc_authenticate tracepoint conditional
Clean up: Simplify the tracepoint's only call site.

Also, I noticed that when svc_authenticate() returns SVC_COMPLETE,
it leaves rq_auth_stat set to an error value. That doesn't need to
be recorded in the trace log.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10 11:01:13 -05:00
Chuck Lever
9315564747 NFSD: Use only RQ_DROPME to signal the need to drop a reply
Clean up: NFSv2 has the only two usages of rpc_drop_reply in the
NFSD code base. Since NFSv2 is going away at some point, replace
these in order to simplify the "drop this reply?" check in
nfsd_dispatch().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10 11:01:13 -05:00
Chuck Lever
ad3d24c59d SUNRPC: Clean up xdr_write_pages()
Make it more evident how xdr_write_pages() updates the tail buffer
by using the convention of naming the iov pointer variable "tail".
I spent more than a couple of hours chasing through code to
understand this, so someone is likely to find this useful later.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10 11:01:13 -05:00
Chuck Lever
da522b5fe1 SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails
Fixes: 030d794bf4 ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10 11:01:13 -05:00
Dai Ngo
638593be55 NFSD: add CB_RECALL_ANY tracepoints
Add tracepoints to trace start and end of CB_RECALL_ANY operation.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
[ cel: added show_rca_mask() macro ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:12 -05:00
Dai Ngo
44df6f439a NFSD: add delegation reaper to react to low memory condition
The delegation reaper is called by nfsd memory shrinker's on
the 'count' callback. It scans the client list and sends the
courtesy CB_RECALL_ANY to the clients that hold delegations.

To avoid flooding the clients with CB_RECALL_ANY requests, the
delegation reaper sends only one CB_RECALL_ANY request to each
client per 5 seconds.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
[ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:12 -05:00
Dai Ngo
3959066b69 NFSD: add support for sending CB_RECALL_ANY
Add XDR encode and decode function for CB_RECALL_ANY.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:12 -05:00
Dai Ngo
a1049eb47f NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
Refactoring courtesy_client_reaper to generic low memory
shrinker so it can be used for other purposes.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:12 -05:00
Chuck Lever
247c01ff5f trace: Relocate event helper files
Steven Rostedt says:
> The include/trace/events/ directory should only hold files that
> are to create events, not headers that hold helper functions.
>
> Can you please move them out of include/trace/events/ as that
> directory is "special" in the creation of events.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-12-10 11:01:12 -05:00
Brian Foster
79a1d88a36 NFSD: pass range end to vfs_fsync_range() instead of count
_nfsd_copy_file_range() calls vfs_fsync_range() with an offset and
count (bytes written), but the former wants the start and end bytes
of the range to sync. Fix it up.

Fixes: eac0b17a77 ("NFSD add vfs_fsync after async copy is done")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:12 -05:00
Jeff Layton
9f27783b4d lockd: fix file selection in nlmsvc_cancel_blocked
We currently do a lock_to_openmode call based on the arguments from the
NLM_UNLOCK call, but that will always set the fl_type of the lock to
F_UNLCK, and the O_RDONLY descriptor is always chosen.

Fix it to use the file_lock from the block instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:11 -05:00
Jeff Layton
69efce009f lockd: ensure we use the correct file descriptor when unlocking
Shared locks are set on O_RDONLY descriptors and exclusive locks are set
on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once
for each descriptor, but it doesn't reset fl_file. Ensure that it does.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:11 -05:00
Jeff Layton
75c7940d2a lockd: set missing fl_flags field when retrieving args
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:11 -05:00