commit 12a6d2940b upstream.
On s390 the modules loaded in memory have the text segment located after
the GOT and Relocation table. This can be seen with this output:
[root@m35lp76 perf]# fgrep qeth /proc/modules
qeth 151552 1 qeth_l2, Live 0x000003ff800b2000
...
[root@m35lp76 perf]# cat /sys/module/qeth/sections/.text
0x000003ff800b3990
[root@m35lp76 perf]#
There is an offset of 0x1990 bytes. The size of the qeth module is
151552 bytes (0x25000 in hex).
The location of the GOT/relocation table at the beginning of a module is
unique to s390.
commit 203d8a4aa6 ("perf s390: Fix 'start' address of module's map")
adjusts the start address of a module in the map structures, but does
not adjust the size of the modules. This leads to overlapping of module
maps as this example shows:
[root@m35lp76 perf] # ./perf report -D
0 0 0xfb0 [0xa0]: PERF_RECORD_MMAP -1/0: [0x3ff800b3990(0x25000)
@ 0]: x /lib/modules/.../qeth.ko.xz
0 0 0x1050 [0xb0]: PERF_RECORD_MMAP -1/0: [0x3ff800d85a0(0x8000)
@ 0]: x /lib/modules/.../ip6_tables.ko.xz
The module qeth.ko has an adjusted start address modified to b3990, but
its size is unchanged and the module ends at 0x3ff800d8990. This end
address overlaps with the next modules start address of 0x3ff800d85a0.
When the size of the leading GOT/Relocation table stored in the
beginning of the text segment (0x1990 bytes) is subtracted from module
qeth end address, there are no overlaps anymore:
0x3ff800d8990 - 0x1990 = 0x0x3ff800d7000
which is the same as
0x3ff800b2000 + 0x25000 = 0x0x3ff800d7000.
To fix this issue, also adjust the modules size in function
arch__fix_module_text_start(). Add another function parameter named size
and reduce the size of the module when the text segment start address is
changed.
Output after:
0 0 0xfb0 [0xa0]: PERF_RECORD_MMAP -1/0: [0x3ff800b3990(0x23670)
@ 0]: x /lib/modules/.../qeth.ko.xz
0 0 0x1050 [0xb0]: PERF_RECORD_MMAP -1/0: [0x3ff800d85a0(0x7a60)
@ 0]: x /lib/modules/.../ip6_tables.ko.xz
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 203d8a4aa6 ("perf s390: Fix 'start' address of module's map")
Link: http://lkml.kernel.org/r/20190724122703.3996-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e45c48a9a4 ]
This patch adds the necessary intelligence to properly compute the value
of 'old' and 'head' when operating in snapshot mode. That way we can
get the latest information in the AUX buffer and be compatible with the
generic AUX ring buffer mechanic.
Tester notes:
> Leo, have you had the chance to test/review this one? Suzuki?
Sure. I applied this patch on the perf/core branch (with latest
commit 3e4fbf36c1e3 'perf augmented_raw_syscalls: Move reading
filename to the loop') and passed testing with below steps:
# perf record -e cs_etm/@tmc_etr0/ -S -m,64 --per-thread ./sort &
[1] 19097
Bubble sorting array of 30000 elements
# kill -USR2 19097
# kill -USR2 19097
# kill -USR2 19097
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.753 MB perf.data ]
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190605161633.12245-1-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6738028dd5 ]
Command 'perf record' and 'perf report' on a system without kernel
debuginfo packages uses /proc/kallsyms and /proc/modules to find
addresses for kernel and module symbols. On x86 this works for root and
non-root users.
On s390, when invoked as non-root user, many of the following warnings
are shown and module symbols are missing:
proc/{kallsyms,modules} inconsistency while looking for
"[sha1_s390]" module!
Command 'perf record' creates a list of module start addresses by
parsing the output of /proc/modules and creates a PERF_RECORD_MMAP
record for the kernel and each module. The following function call
sequence is executed:
machine__create_kernel_maps
machine__create_module
modules__parse
machine__create_module --> for each line in /proc/modules
arch__fix_module_text_start
Function arch__fix_module_text_start() is s390 specific. It opens
file /sys/module/<name>/sections/.text to extract the module's .text
section start address. On s390 the module loader prepends a header
before the first section, whereas on x86 the module's text section
address is identical the the module's load address.
However module section files are root readable only. For non-root the
read operation fails and machine__create_module() returns an error.
Command perf record does not generate any PERF_RECORD_MMAP record
for loaded modules. Later command perf report complains about missing
module maps.
To fix this function arch__fix_module_text_start() always returns
success. For root users there is no change, for non-root users
the module's load address is used as module's text start address
(the prepended header then counts as part of the text section).
This enable non-root users to use module symbols and avoid the
warning when perf report is executed.
Output before:
[tmricht@m83lp54 perf]$ ./perf report -D | fgrep MMAP
0 0x168 [0x50]: PERF_RECORD_MMAP ... x [kernel.kallsyms]_text
Output after:
[tmricht@m83lp54 perf]$ ./perf report -D | fgrep MMAP
0 0x168 [0x50]: PERF_RECORD_MMAP ... x [kernel.kallsyms]_text
0 0x1b8 [0x98]: PERF_RECORD_MMAP ... x /lib/modules/.../autofs4.ko.xz
0 0x250 [0xa8]: PERF_RECORD_MMAP ... x /lib/modules/.../sha_common.ko.xz
0 0x2f8 [0x98]: PERF_RECORD_MMAP ... x /lib/modules/.../des_generic.ko.xz
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: http://lkml.kernel.org/r/20190522144601.50763-4-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c6f709b9f ]
Users should never use 'pt=0', but if they do it may give a meaningless
error:
$ perf record -e intel_pt/pt=0/u uname
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for
event (intel_pt/pt=0/u).
Fix that by forcing 'pt=1'.
Committer testing:
# perf record -e intel_pt/pt=0/u uname
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (intel_pt/pt=0/u).
/bin/dmesg | grep -i perf may provide additional information.
# perf record -e intel_pt/pt=0/u uname
pt=0 doesn't make sense, forcing pt=1
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data ]
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/b7c5b4e5-9497-10e5-fd43-5f3e4a0fe51d@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9068533e4f ]
For powerpc64, perf will filter out the second entry in the callchain,
i.e. the LR value, if the return address of the function corresponding
to the probed location has already been saved on its caller's stack.
The state of the return address is determined using debug information.
At any point within a function, if the return address is already saved
somewhere, a DWARF expression can tell us about its location. If the
return address in still in LR only, no DWARF expression would exist.
Typically, the instructions in a function's prologue first copy the LR
value to R0 and then pushes R0 on to the stack. If LR has already been
copied to R0 but R0 is yet to be pushed to the stack, we can still get a
DWARF expression that says that the return address is in R0. This is
indicating that getting a DWARF expression for the return address does
not guarantee the fact that it has already been saved on the stack.
This can be observed on a powerpc64le system running Fedora 27 as shown
below.
# objdump -d /usr/lib64/libc-2.26.so | less
...
000000000015af20 <inet_pton>:
15af20: 0b 00 4c 3c addis r2,r12,11
15af24: e0 c1 42 38 addi r2,r2,-15904
15af28: a6 02 08 7c mflr r0
15af2c: f0 ff c1 fb std r30,-16(r1)
15af30: f8 ff e1 fb std r31,-8(r1)
15af34: 78 1b 7f 7c mr r31,r3
15af38: 78 23 83 7c mr r3,r4
15af3c: 78 2b be 7c mr r30,r5
15af40: 10 00 01 f8 std r0,16(r1)
15af44: c1 ff 21 f8 stdu r1,-64(r1)
15af48: 28 00 81 f8 std r4,40(r1)
...
# readelf --debug-dump=frames-interp /usr/lib64/libc-2.26.so | less
...
00027024 0000000000000024 00027028 FDE cie=00000000 pc=000000000015af20..000000000015af88
LOC CFA r30 r31 ra
000000000015af20 r1+0 u u u
000000000015af34 r1+0 c-16 c-8 r0
000000000015af48 r1+64 c-16 c-8 c+16
000000000015af5c r1+0 c-16 c-8 c+16
000000000015af78 r1+0 u u
...
# perf probe -x /usr/lib64/libc-2.26.so -a inet_pton+0x18
# perf record -e probe_libc:inet_pton -g ping -6 -c 1 ::1
# perf script
Before:
ping 2829 [005] 512917.460174: probe_libc:inet_pton: (7fff7e2baf38)
7fff7e2baf38 __GI___inet_pton+0x18 (/usr/lib64/libc-2.26.so)
7fff7e2705b4 getaddrinfo+0x164 (/usr/lib64/libc-2.26.so)
12f152d70 _init+0xbfc (/usr/bin/ping)
7fff7e1836a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so)
7fff7e183898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so)
0 [unknown] ([unknown])
After:
ping 2829 [005] 512917.460174: probe_libc:inet_pton: (7fff7e2baf38)
7fff7e2baf38 __GI___inet_pton+0x18 (/usr/lib64/libc-2.26.so)
7fff7e26fa54 gaih_inet.constprop.7+0xf44 (/usr/lib64/libc-2.26.so)
7fff7e2705b4 getaddrinfo+0x164 (/usr/lib64/libc-2.26.so)
12f152d70 _init+0xbfc (/usr/bin/ping)
7fff7e1836a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so)
7fff7e183898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so)
0 [unknown] ([unknown])
Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Maynard Johnson <maynard@us.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/66e848a7bdf2d43b39210a705ff6d828a0865661.1530724939.git.sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 354b064b8e ]
In some cases, a symbol may have multiple aliases. Attempting to add an
entry probe for such symbols results in a probe being added at an
incorrect location while it fails altogether for return probes. This is
only applicable for binaries with debug information.
During the arch-dependent post-processing, the offset from the start of
the symbol at which the probe is to be attached is determined and added
to the start address of the symbol to get the probe's location. In case
there are multiple aliases, this offset gets added multiple times for
each alias of the symbol and we end up with an incorrect probe location.
This can be verified on a powerpc64le system as shown below.
$ nm /lib/modules/$(uname -r)/build/vmlinux | grep "sys_open$"
...
c000000000414290 T __se_sys_open
c000000000414290 T sys_open
$ objdump -d /lib/modules/$(uname -r)/build/vmlinux | grep -A 10 "<__se_sys_open>:"
c000000000414290 <__se_sys_open>:
c000000000414290: 19 01 4c 3c addis r2,r12,281
c000000000414294: 70 c4 42 38 addi r2,r2,-15248
c000000000414298: a6 02 08 7c mflr r0
c00000000041429c: e8 ff a1 fb std r29,-24(r1)
c0000000004142a0: f0 ff c1 fb std r30,-16(r1)
c0000000004142a4: f8 ff e1 fb std r31,-8(r1)
c0000000004142a8: 10 00 01 f8 std r0,16(r1)
c0000000004142ac: c1 ff 21 f8 stdu r1,-64(r1)
c0000000004142b0: 78 23 9f 7c mr r31,r4
c0000000004142b4: 78 1b 7e 7c mr r30,r3
For both the entry probe and the return probe, the probe location
should be _text+4276888 (0xc000000000414298). Since another alias
exists for 'sys_open', the post-processing code will end up adding
the offset (8 for powerpc64le) twice and perf will attempt to add
the probe at _text+4276896 (0xc0000000004142a0) instead.
Before:
# perf probe -v -a sys_open
probe-definition(0): sys_open
symbol:sys_open file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_open address found : c000000000414290
Matched function: __se_sys_open [2ad03a0]
Probe point found: __se_sys_open+0
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Writing event: p:probe/sys_open _text+4276896
Added new event:
probe:sys_open (on sys_open)
...
# perf probe -v -a sys_open%return $retval
probe-definition(0): sys_open%return
symbol:sys_open file:(null) line:0 offset:0 return:1 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_open address found : c000000000414290
Matched function: __se_sys_open [2ad03a0]
Probe point found: __se_sys_open+0
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/README write=0
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Parsing probe_events: p:probe/sys_open _text+4276896
Group:probe Event:sys_open probe:p
Writing event: r:probe/sys_open__return _text+4276896
Failed to write event: Invalid argument
Error: Failed to add events. Reason: Invalid argument (Code: -22)
After:
# perf probe -v -a sys_open
probe-definition(0): sys_open
symbol:sys_open file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_open address found : c000000000414290
Matched function: __se_sys_open [2ad03a0]
Probe point found: __se_sys_open+0
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Writing event: p:probe/sys_open _text+4276888
Added new event:
probe:sys_open (on sys_open)
...
# perf probe -v -a sys_open%return $retval
probe-definition(0): sys_open%return
symbol:sys_open file:(null) line:0 offset:0 return:1 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_open address found : c000000000414290
Matched function: __se_sys_open [2ad03a0]
Probe point found: __se_sys_open+0
Found 1 probe_trace_events.
Opening /sys/kernel/debug/tracing/README write=0
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Parsing probe_events: p:probe/sys_open _text+4276888
Group:probe Event:sys_open probe:p
Writing event: r:probe/sys_open__return _text+4276888
Added new event:
probe:sys_open__return (on sys_open%return)
...
Reported-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Fixes: 99e608b595 ("perf probe ppc64le: Fix probe location when using DWARF")
Link: http://lkml.kernel.org/r/20180809161929.35058-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 143c99f6ac ]
For some cases, the callchain provided by the kernel may be empty. So,
the callchain ip filtering code will cause a crash if we do not check
whether the struct ip_callchain pointer is NULL before accessing any
members.
This can be observed on a powerpc64le system running Fedora 27 as shown
below.
# perf record -b -e cycles:u ls
Before:
# perf report --branch-history
perf: Segmentation fault
-------- backtrace --------
perf[0x1027615c]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff856304d8]
perf(arch_skip_callchain_idx+0x44)[0x10257c58]
perf[0x1017f2e4]
perf(thread__resolve_callchain+0x124)[0x1017ff5c]
perf(sample__resolve_callchain+0xf0)[0x10172788]
...
After:
# perf report --branch-history
Samples: 25 of event 'cycles:u', Event count (approx.): 2306870
Overhead Source:Line Symbol Shared Object
+ 11.60% _init+35736 [.] _init ls
+ 9.84% strcoll_l.c:137 [.] __strcoll_l libc-2.26.so
+ 9.16% memcpy.S:175 [.] __memcpy_power7 libc-2.26.so
+ 9.01% gconv_charset.h:54 [.] _nl_find_locale libc-2.26.so
+ 8.87% dl-addr.c:52 [.] _dl_addr libc-2.26.so
+ 8.83% _init+236 [.] _init ls
...
Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20180611104049.11048-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When building tools/perf/ it rightly complains about a number of .h
files being out of sync. Fix this up by syncing them properly with the
relevant in-kernel versions.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Perf uretprobe probes on GEP(Global Entry Point) which fails to record
all function calls via LEP(Local Entry Point). Fix that by probing on LEP.
Objdump:
00000000100005f0 <doit>:
100005f0: 02 10 40 3c lis r2,4098
100005f4: 00 7f 42 38 addi r2,r2,32512
100005f8: a6 02 08 7c mflr r0
100005fc: 10 00 01 f8 std r0,16(r1)
10000600: f8 ff e1 fb std r31,-8(r1)
Before applying patch:
$ cat /sys/kernel/debug/tracing/uprobe_events
r:probe_uprobe_test/doit /home/ravi/uprobe_test:0x00000000000005f0
After applying patch:
$ cat /sys/kernel/debug/tracing/uprobe_events
r:probe_uprobe_test/doit /home/ravi/uprobe_test:0x00000000000005f8
This is not the case with kretprobes because the kernel itself finds LEP
and probes on it.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1475576865-6562-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In 293d5b4394 ("perf probe: Support probing on offline cross-arch binary")
DWARF register tables were introduced for many architectures, with the one for
the "dx" register being broken for x86_64, which got noticed by the 'perf test
bpf' testcase, that has this difference from a successful run to one that
fails, with the aforementioned patch:
-Writing event: p:perf_bpf_probe/func _text+5197232 f_mode=+68(%di):x32 offset=%si:s64 orig=dx:s32
-Failed to write event: Invalid argument
-bpf_probe: failed to apply perf probe eventsFailed to add events selected by BPF
+Writing event: p:perf_bpf_probe/func _text+5197232 f_mode=+68(%di):x32 offset=%si:s64 orig=%dx:s32
Add the missing '%' to '%dx' to fix this.
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 293d5b4394 ("perf probe: Support probing on offline cross-arch binary")
Link: https://lkml.kernel.org/r/20160909145955.GC32585@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Support probing on offline cross-architecture binary by adding getting
the target machine arch from ELF and choose correct register string for
the machine.
Here is an example:
-----
$ perf probe --vmlinux=./vmlinux-arm --definition 'do_sys_open $params'
p:probe/do_sys_open do_sys_open+0 dfd=%r5:s32 filename=%r1:u32 flags=%r6:s32 mode=%r3:u16
-----
Here, we can get probe/do_sys_open from above and append it to to the target
machine's tracing/kprobe_events file in the tracefs mountput, usually
/sys/kernel/debug/tracing/kprobe_events (or /sys/kernel/tracing/kprobe_events).
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/147214229717.23638.6440579792548044658.stgit@devbox
[ Add definition for EM_AARCH64 to fix the build on at least centos 6, debian 7 & ubuntu 12.04.5 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to successfully decode Intel PT traces, context switch events
are needed from the moment the trace starts. Currently that is ensured
by using the 'immediate' flag which enables the switch event when it is
opened.
However, since commit 86c2786994 ("perf intel-pt: Add support for
PERF_RECORD_SWITCH") that might not always happen. When tracing
system-wide the context switch event is added to the tracking event
which was not set as 'immediate'. Change that so it is.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 86c2786994 ("perf intel-pt: Add support for PERF_RECORD_SWITCH")
Link: http://lkml.kernel.org/r/1471245784-22580-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Powerpc has Global Entry Point and Local Entry Point for functions. LEP
catches call from both the GEP and the LEP. Symbol table of ELF contains
GEP and Offset from which we can calculate LEP, but debuginfo does not
have LEP info.
Currently, perf prioritize symbol table over dwarf to probe on LEP for
ppc64le. But when user tries to probe with function parameter, we fall
back to using dwarf(i.e. GEP) and when function called via LEP, probe
will never hit.
For example:
$ objdump -d vmlinux
...
do_sys_open():
c0000000002eb4a0: e8 00 4c 3c addis r2,r12,232
c0000000002eb4a4: 60 00 42 38 addi r2,r2,96
c0000000002eb4a8: a6 02 08 7c mflr r0
c0000000002eb4ac: d0 ff 41 fb std r26,-48(r1)
$ sudo ./perf probe do_sys_open
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/do_sys_open _text+3060904
$ sudo ./perf probe 'do_sys_open filename:string'
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string
For second case, perf probed on GEP. So when function will be called via
LEP, probe won't hit.
$ sudo ./perf record -a -e probe:do_sys_open ls
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.195 MB perf.data ]
To resolve this issue, let's not prioritize symbol table, let perf
decide what it wants to use. Perf is already converting GEP to LEP when
it uses symbol table. When perf uses debuginfo, let it find LEP offset
form symbol table. This way we fall back to probe on LEP for all cases.
After patch:
$ sudo ./perf probe 'do_sys_open filename:string'
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string
$ sudo ./perf record -a -e probe:do_sys_open ls
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ]
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470723805-5081-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf fixes from Thomas Gleixner:
"This update contains:
- a fix for the bpf tools to use the new EM_BPF code
- a fix for the module parser of perf to retrieve the
proper text start address
- add str_error_c to libapi to avoid linking against
tools/lib/str_error_r.o"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools lib api: Add str_error_c to libapi
perf s390: Fix 'start' address of module's map
tools lib bpf: Use official ELF e_machine value
Pull libnvdimm updates from Dan Williams:
- Replace pcommit with ADR / directed-flushing.
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement
either ADR, or provide one or more flush addresses per nvdimm.
ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
to the memory controller on a power-fail event.
Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
A flush hint is an mmio address that when written and fenced assures
that all previous posted writes targeting a given dimm have been
flushed to media.
- On-demand ARS (address range scrub).
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the
media to refresh the bad block list, userspace can also request a
re-scrub at any time.
- Support for the Microsoft DSM (device specific method) command
format.
- Support for EDK2/OVMF virtual disk device memory ranges.
- Various fixes and cleanups across the subsystem.
* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
nfit: do an ARS scrub on hitting a latent media error
nfit: move to nfit/ sub-directory
nfit, libnvdimm: allow an ARS scrub to be triggered on demand
libnvdimm: register nvdimm_bus devices with an nd_bus driver
pmem: clarify a debug print in pmem_clear_poison
x86/insn: remove pcommit
Revert "KVM: x86: add pcommit support"
nfit, tools/testing/nvdimm/: unify shutdown paths
libnvdimm: move ->module to struct nvdimm_bus_descriptor
nfit: cleanup acpi_nfit_init calling convention
nfit: fix _FIT evaluation memory leak + use after free
tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
tools/testing/nvdimm: add virtual ramdisk range
acpi, nfit: treat virtual ramdisk SPA as pmem region
pmem: kill __pmem address space
pmem: kill wmb_pmem()
libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
fs/dax: remove wmb_pmem()
libnvdimm, pmem: flush posted-write queues on shutdown
...
At present, when creating module's map, perf gets 'start' address by
parsing '/proc/modules', but it's the module base address, it isn't the
start address of the '.text' section.
In most arches, it's OK. But for s390, it places 'GOT' and 'PLT'
relocations before '.text' section. So there exists an offset between
module base address and '.text' section, which will incur wrong symbol
resolution for modules.
Fix this bug by getting 'start' address of module's map from parsing
'/sys/module/[module name]/sections/.text', not from '/proc/modules'.
Signed-off-by: Song Shan Gong <gongss@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1469070651-6447-2-git-send-email-gongss@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The tools so far have been using the strerror_r() GNU variant, that
returns a string, be it the buffer passed or something else.
But that, besides being tricky in cases where we expect that the
function using strerror_r() returns the error formatted in a provided
buffer (we have to check if it returned something else and copy that
instead), breaks the build on systems not using glibc, like Alpine
Linux, where musl libc is used.
So, introduce yet another wrapper, str_error_r(), that has the GNU
interface, but uses the portable XSI variant of strerror_r(), so that
users rest asured that the provided buffer is used and it is what is
returned.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Noticed by the build system, that emitted this warning:
Warning: x86_64's syscall_64.tbl differs from kernel
This was due to the wiring up of the recently added preadv2 & pwritev2
syscalls to the compat code, which hadn't been done by the patch
introducing those syscalls: 4babf2c5ef ("x86: wire up preadv2 and
pwritev2").
The patch doing the compat wiring was:
482dd2ef12 ("x86/syscalls: Wire up compat readv2/writev2 syscalls")
This just silences the perf build warning, as compat syscalls still
can't be supported in 'perf trace´ due to limitations in the
raw_syscalls:sys_{enter,exit} tracepoints it relies on.
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-4dm8eoy0wslgtwqdhz64ods0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add few more triplets based on Fedora and Ubuntu binutils (cross tools).
Before applying patch on x86:
( Install binutils-powerpc64-linux-gnu.x86_64 )
$ perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc \
--objdump powerpc64-linux-gnu-objdump
After applying patch on x86:
$ perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc
I.e. it will find the right objdump from the environment data recorded
in the perf.data file + these triplets.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/1466769240-12376-7-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Support cross unwinding, i.e. collecting '--call-graph dwarf' perf.data files
in one machine and then doing analysis in another machine of a different
hardware architecture. This enables, for instance, to do:
perf record -a --call-graph dwarf
on a x86-32 or aarch64 system and then do 'perf report' on it on a
x86_64 workstation. (He Kuang)
- Fix crash in build_id_cache__kallsyms_path(), recent regression (Wang Nan)
Infrastructure changes:
- Make tools/lib/bpf use the IS_ERR return facility consistently and also stop
using the _get_ term for non-reference count methods (Arnaldo Carvalho de Melo)
- 'perf config' refactorings (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Tooling support for TopDown counters, recently added to the kernel (Andi Kleen)
- Show call graphs in 'perf script' when 1st event doesn't have it but some other has (He Kuang)
- Fix terminal cleanup when handling invalid .perfconfig files in 'perf top' (Taeung Song)
Build fixes:
- Respect CROSS_COMPILE for the linker in libapi (Lucas Stach)
Infrastructure changes:
- Fix perf_evlist__alloc_mmap() failure path (Wang Nan)
- Provide way to extract integer value from format_field (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add basic plumbing for TopDown in perf stat
TopDown is intended to replace the frontend cycles idle/ backend cycles
idle metrics in standard perf stat output. These metrics are not
reliable in many workloads, due to out of order effects.
This implements a new --topdown mode in perf stat (similar to
--transaction) that measures the pipe line bottlenecks using
standardized formulas. The measurement can be all done with 5 counters
(one fixed counter)
The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring
that describe the CPU pipeline behavior on a high level.
The full top down methology has many hierarchical metrics. This
implementation only supports level 1 which can be collected without
multiplexing. A full implementation of top down on top of perf is
available in pmu-tools toplev. (http://github.com/andikleen/pmu-tools)
The current version works on Intel Core CPUs starting with Sandy Bridge,
and Atom CPUs starting with Silvermont. In principle the generic
metrics should be also implementable on other out of order CPUs.
TopDown level 1 uses a set of abstracted metrics which are generic to
out of order CPU cores (although some CPUs may not implement all of
them):
topdown-total-slots Available slots in the pipeline
topdown-slots-issued Slots issued into the pipeline
topdown-slots-retired Slots successfully retired
topdown-fetch-bubbles Pipeline gaps in the frontend
topdown-recovery-bubbles Pipeline gaps during recovery
from misspeculation
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
Add a new --topdown options to enable events. When --topdown is
specified set up events for all topdown events supported by the kernel.
Add topdown-* as a special case to the event parser, as is needed for
all events containing -.
The actual code to compute the metrics is in follow-on patches.
v2: Use standard sysctl read function.
v3: Move x86 specific code to arch/
v4: Enable --metric-only implicitly for topdown.
v5: Add --single-thread option to not force per core mode
v6: Fix output order of topdown metrics
v7: Allow combining with -d
v8: Remove --single-thread again
v9: Rename functions, adding arch_ and topdown_.
v10: Expand man page and describe TopDown better
Paste intro into commit description.
Print error when malloc fails.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>