Commit Graph

950968 Commits

Author SHA1 Message Date
Johannes Thumshirn
c965d6402f btrfs: handle errors from async submission
Btrfs' async submit mechanism is able to handle errors in the submission
path and the meta-data async submit function correctly passes the error
code to the caller.

In btrfs_submit_bio_start() and btrfs_submit_bio_start_direct_io() we're
not handling the errors returned by btrfs_csum_one_bio() correctly though
and simply call BUG_ON(). This is unnecessary as the caller of these two
functions - run_one_async_start - correctly checks for the return values
and sets the status of the async_submit_bio. The actual bio submission
will be handled later on by run_one_async_done only if
async_submit_bio::status is 0, so the data won't be written if we
encountered an error in the checksum process.

Simply return the error from btrfs_csum_one_bio() to the async submitters,
like it's done in btree_submit_bio_start().

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-19 18:26:14 +02:00
brookxu
27bc446e2d ext4: limit the length of per-inode prealloc list
In the scenario of writing sparse files, the per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.

After patching, we observed that the sys ratio of cpu has dropped, and
the system throughput has increased significantly. We created a process
to write the sparse file, and the running time of the process on the
fixed kernel was significantly reduced, as follows:

Running time on unfixed kernel:
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m2.051s
user    0m0.008s
sys     0m2.026s

Running time on fixed kernel:
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m0.471s
user    0m0.004s
sys     0m0.395s

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:36 -04:00
brookxu
66d5e0277e ext4: reorganize if statement of ext4_mb_release_context()
Reorganize the if statement of ext4_mb_release_context(), make it
easier to read.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/5439ac6f-db79-ad68-76c1-a4dda9aa0cc3@gmail.com
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:36 -04:00
brookxu
c55ee7d202 ext4: add mb_debug logging when there are lost chunks
Lost chunks are when some other process raced with the current thread
to grab a particular block allocation.  Add mb_debug log for
developers who wants to see how often this is happening for a
particular workload.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/0a165ac0-1912-aebd-8a0d-b42e7cd1aea1@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:36 -04:00
kyoungho koo
7ca4fcba92 ext4: Fix comment typo "the the".
I have found double typed comments "the the". So i modified it to
one "the"

Signed-off-by: kyoungho koo <rnrudgh@gmail.com>
Link: https://lore.kernel.org/r/20200424171620.GA11943@koo-Z370-HD3
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:35 -04:00
Shijie Luo
00a3fff071 jbd2: clean up checksum verification in do_one_pass()
Remove the unnecessary chksum_err and checksum_seen variables as well as
some redundant code to make the function easier to understand.

[ With changes suggested by jack@ and tytso@ ]

Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:35 -04:00
Sameer Pujar
b90b925fd5 ALSA: hda: avoid reset of sdo_limit
By default 'sdo_limit' is initialized with a default value of '8'
as per spec. This is overridden in cases where a different value is
required. However this is getting reset when snd_hdac_bus_init_chip()
is called again, which happens during runtime PM cycle.

Avoid this reset by moving 'sdo_limit' setup to 'snd_hdac_bus_init()'
function which would be called only once.

Fixes: 67ae482a59 ("ALSA: hda: add member to store ratio for stripe control")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Link: https://lore.kernel.org/r/1597851130-6765-1-git-send-email-spujar@nvidia.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-08-19 17:37:06 +02:00
Imre Deak
4a4064ad79 drm/i915/tgl: Make sure TC-cold is blocked before enabling TC AUX power wells
The dependency between power wells is determined by the ordering of the
power well list: when enabling the power wells for a domain, this
happens walking the power well list forward, while disabling them
happens in the reverse direction. Accordingly a power well on the list
must follow any other power well it depends on.

Since the TC AUX power wells depend on TC-cold being blocked, move the
TC-cold off power well before all AUX power wells.

Fixes: 3c02934b24 ("drm/i915/tc/tgl: Implement TC cold sequences")
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200720232952.16228-1-imre.deak@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit b302a2e688)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19 15:23:43 +03:00
George Spelvin
c43a87f537 drm/i915/selftests: Avoid passing a random 0 into ilog2
igt_mm_config() calls ilog2() on the (pseudo)random 21-bit number
s>>12.  Once in 2 million seeds, this is zero and ilog2 summons
the nasal demons.

There was an attempt to handle this case with a max(), but that's
too late; ms could already be something bizarre.

Given that the low 12 bits of s and ms are always zero, it's a lot
simpler just to divide them by 4096, then everything fits into 32
bits, and we can easily generate a random number 1 <= s <= 0x1fffff.

Fixes: 14d1b9a624 ("drm/i915: buddy allocator")
Signed-off-by: George Spelvin <lkml@sdf.org>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325192429.GA8865@SDF.ORG
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 21118e8e56)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19 15:23:36 +03:00
Tianjia Zhang
c67f0c2831 drm/i915: Fix wrong return value in intel_atomic_check()
In the case of calling check_digital_port_conflicts() failed, a
negative error code -EINVAL should be returned.

Fixes: bf5da83e4b ("drm/i915: Move check_digital_port_conflicts() earier")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200802111535.5200-1-tianjia.zhang@linux.alibaba.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 66b51b801d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19 15:23:27 +03:00
Matt Roper
5fd73c5370 drm/i915: Update bw_buddy pagemask table
A recent bspec update removed the LPDDR4 single channel entry from the
buddy register table, but added a new four-channel entry.

Workaround 1409767108 hasn't been updated with any guidance for four
channel configurations, so we leave that alternate table unchanged for
now.

Bspec 49218
Fixes: 3fa01d642f ("drm/i915/tgl: Program BW_BUDDY registers during display init")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612204734.3674650-1-matthew.d.roper@intel.com
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit ecb40d0826)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19 15:23:20 +03:00
Chris Wilson
b7c6646117 drm/i915/display: Check for an LPSP encoder before dereferencing
Avoid a GPF at

<1>[   20.177320] BUG: kernel NULL pointer dereference, address: 000000000000007c
<1>[   20.177322] #PF: supervisor read access in kernel mode
<1>[   20.177323] #PF: error_code(0x0000) - not-present page
<6>[   20.177324] PGD 0 P4D 0
<4>[   20.177327] Oops: 0000 [#1] PREEMPT SMP PTI
<4>[   20.177328] CPU: 1 PID: 944 Comm: debugfs_test Not tainted 5.8.0-rc7-CI-CI_DRM_8814+ #1
<4>[   20.177330] Hardware name: Dell Inc. XPS 13 9360/0823VW, BIOS 2.9.0 07/09/2018
<4>[   20.177372] RIP: 0010:i915_lpsp_capability_show+0x44/0xc0 [i915]
<4>[   20.177374] Code: 0f b6 81 ca 0d 00 00 3c 0b 74 77 76 19 3c 0c 75 44 83 7e 7c 01 7e 2f 48 c7 c6 d7 b9 47 a0 e8 43 df 06 e1 31 c0 c3 3c 09 72 2b <8b> 46 7c 85 c0 75 e6 8b 82 e4 00 00 00 89 c2 83 e2 fb 83 fa 0a 74
<4>[   20.177376] RSP: 0018:ffffc90000cebe38 EFLAGS: 00010246
<4>[   20.177377] RAX: 0000000000000009 RBX: ffff888267fe6a58 RCX: ffff888252d10000
<4>[   20.177378] RDX: ffff88824a9a4000 RSI: 0000000000000000 RDI: ffff888267fe6a30
<4>[   20.177379] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
<4>[   20.177380] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000cebf08
<4>[   20.177381] R13: 00000000ffffffff R14: 0000000000000001 R15: ffff888267fe6a30
<4>[   20.177383] FS:  00007f6f9c6b5e40(0000) GS:ffff888276480000(0000) knlGS:0000000000000000
<4>[   20.177384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   20.177385] CR2: 000000000000007c CR3: 0000000255f04006 CR4: 00000000003606e0
<4>[   20.177386] Call Trace:
<4>[   20.177390]  seq_read+0xcb/0x420

which is presumably from having no encoder attached at that time.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2175
Fixes: 8806211fe7 ("drm/i915: Add i915_lpsp_capability debugfs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729130912.30093-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit a22b1a9bb0)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19 15:23:13 +03:00
Chris Wilson
c499f6cb5e drm/i915: Copy default modparams to mock i915_device
Since we use the module parameters stored inside the drm_i915_device
itself, we need to ensure the mock i915_device also sets up the right
defaults.

Fixes: 8a25c4be58 ("drm/i915/params: switch to device specific parameters")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728150600.4509-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 98ef067453)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19 15:23:07 +03:00
Chris Wilson
df3ab3cb7e drm/i915: Provide the perf pmu.module
Rather than manually implement our own module reference counting for perf
pmu events, finally realise that there is a module parameter to struct
pmu for this very purpose.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716094643.31410-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 27e897beec)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19 15:23:00 +03:00
Thinh Nguyen
9a469bc9f3 usb: uas: Add quirk for PNY Pro Elite
PNY Pro Elite USB 3.1 Gen 2 device (SSD) doesn't respond to ATA_12
pass-through command (i.e. it just hangs). If it doesn't support this
command, it should respond properly to the host. Let's just add a quirk
to be able to move forward with other operations.

Cc: stable@vger.kernel.org
Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
Link: https://lore.kernel.org/r/2b0585228b003eedcc82db84697b31477df152e0.1597803605.git.thinhn@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19 14:12:37 +02:00
Heikki Krogerus
9ca325ffca tools: usb: move to tools buildsystem
Converting the Makefile to use the new tools buildsystem.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
[fixes builds with O=...]
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20200819071733.60028-1-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19 14:11:44 +02:00
Greg Kroah-Hartman
ab565f7eb1 Merge tag 'fixes-for-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:

USB: fixes for v5.9-rc

Three ZLP fixes on dwc3 and a resource leak fix on the TCM gadget

Signed-off-by: Felipe Balbi <balbi@kernel.org>

* tag 'fixes-for-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
  usb: dwc3: gadget: Handle ZLP for sg requests
  usb: dwc3: gadget: Fix handling ZLP
  usb: dwc3: gadget: Don't setup more than requested
  usb: gadget: f_tcm: Fix some resource leaks in some error paths
2020-08-19 14:09:37 +02:00
Jani Nikula
e9e3086b3d Merge tag 'gvt-next-fixes-2020-08-05' of https://github.com/intel/gvt-linux into drm-intel-fixes
gvt-next-fixes-2020-08-05

- Fix guest suspend/resume low performance handling of shadow ppgtt (Colin)
- Fix PV notifier handling for guest suspend/resume (Colin)

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200805080207.GY27035@zhen-hp.sh.intel.com
2020-08-19 13:03:54 +03:00
Mike Pozulp
e17f02d055 ALSA: hda/realtek: Add quirk for Samsung Galaxy Book Ion
The Galaxy Book Ion uses the same ALC298 codec as other Samsung laptops
which have the no headphone sound bug, like my Samsung Notebook. The
Galaxy Book owner confirmed that this patch fixes the bug.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207423
Signed-off-by: Mike Pozulp <pozulp.kernel@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200818165446.499821-1-pozulp.kernel@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-08-19 08:37:53 +02:00
Takashi Iwai
9e96716026 Merge tag 'asoc-fix-v5.9-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.9

A bunch of fixes that came in during the merge window, mostly for issues
that were uncovered by the changes to report errors on invalid register
access plus one important fix in that code itself.
2020-08-19 08:03:04 +02:00
Yu Kuai
e097eb7473 dmaengine: at_hdmac: add missing kfree() call in at_dma_xlate()
If memory allocation for 'atslave' succeed, at_dma_xlate() doesn't have a
corresponding kfree() in exception handling. Thus add kfree() for this
function implementation.

Fixes: bbe89c8e3d ("at_hdmac: move to generic DMA binding")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20200817115728.1706719-4-yukuai3@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-08-19 09:58:38 +05:30
Yu Kuai
3832b78b3e dmaengine: at_hdmac: add missing put_device() call in at_dma_xlate()
If of_find_device_by_node() succeed, at_dma_xlate() doesn't have a
corresponding put_device(). Thus add put_device() to fix the exception
handling for this function implementation.

Fixes: bbe89c8e3d ("at_hdmac: move to generic DMA binding")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20200817115728.1706719-3-yukuai3@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-08-19 09:58:38 +05:30
Yu Kuai
0cef8e2c5a dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate()
The reurn value of of_find_device_by_node() is not checked, thus null
pointer dereference will be triggered if of_find_device_by_node()
failed.

Fixes: bbe89c8e3d ("at_hdmac: move to generic DMA binding")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20200817115728.1706719-2-yukuai3@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-08-19 09:58:37 +05:30
Dave Airlie
485d41b092 Merge tag 'amd-drm-fixes-5.9-2020-08-12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.9-2020-08-12:

amdgpu:
- Fix allocation size
- SR-IOV fixes
- Vega20 SMU feature state caching fix
- Fix custom pptable handling
- Arcturus golden settings update
- Several display fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200813033610.4008-1-alexander.deucher@amd.com
2020-08-19 13:56:28 +10:00
Dave Airlie
f2ea2578df Merge tag 'drm-misc-fixes-2020-08-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.9-rc1:
- Add missing dma_fence_put() in virtio_gpu_execbuffer_ioctl().
- Fix memory leak in virtio_gpu_cleanup_object().

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d50eb2e0-75c8-b724-006f-5e7b391961ff@linux.intel.com
2020-08-19 13:54:42 +10:00
Alexei Starovoitov
3708115614 Merge branch 'libbpf-minimize-feature-detection'
Andrii Nakryiko says:

====================
Get rid of two feature detectors: reallocarray and libelf-mmap. Optional
feature detections complicate libbpf Makefile and cause more troubles for
various applications that want to integrate libbpf as part of their build.

Patch #1 replaces all reallocarray() uses into libbpf-internal reallocarray()
implementation. Patches #2 and #3 makes sure we won't re-introduce
reallocarray() accidentally. Patch #2 also removes last use of
libbpf_internal.h header inside bpftool. There is still nlattr.h that's used
by both libbpf and bpftool, but that's left for a follow up patch to split.
Patch #4 removed libelf-mmap feature detector and all its uses, as it's
trivial to handle missing mmap support in libbpf, the way objtool has been
doing it for a while.

v1->v2 and v2->v3:
  - rebase to latest bpf-next (Alexei).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-08-18 18:39:12 -07:00
Andrii Nakryiko
22dd1ac91a tools: Remove feature-libelf-mmap feature detection
It's trivial to handle missing ELF_C_MMAP_READ support in libelf the way that
objtool has solved it in
("774bec3fddcc objtool: Add fallback from ELF_C_READ_MMAP to ELF_C_READ").

So instead of having an entire feature detector for that, just do what objtool
does for perf and libbpf. And keep their Makefiles a bit simpler.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200819013607.3607269-5-andriin@fb.com
2020-08-18 18:38:25 -07:00
Andrii Nakryiko
85367030a6 libbpf: Centralize poisoning and poison reallocarray()
Most of libbpf source files already include libbpf_internal.h, so it's a good
place to centralize identifier poisoning. So move kernel integer type
poisoning there. And also add reallocarray to a poison list to prevent
accidental use of it. libbpf_reallocarray() should be used universally
instead.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200819013607.3607269-4-andriin@fb.com
2020-08-18 18:38:25 -07:00
Andrii Nakryiko
7084566a23 tools/bpftool: Remove libbpf_internal.h usage in bpftool
Most netlink-related functions were unique to bpftool usage, so I moved them
into net.c. Few functions are still used by both bpftool and libbpf itself
internally, so I've copy-pasted them (libbpf_nl_get_link,
libbpf_netlink_open). It's a bit of duplication of code, but better separation
of libbpf as a library with public API and bpftool, relying on unexposed
functions in libbpf.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200819013607.3607269-3-andriin@fb.com
2020-08-18 18:38:25 -07:00
Andrii Nakryiko
029258d7b2 libbpf: Remove any use of reallocarray() in libbpf
Re-implement glibc's reallocarray() for libbpf internal-only use.
reallocarray(), unfortunately, is not available in all versions of glibc, so
requires extra feature detection and using reallocarray() stub from
<tools/libc_compat.h> and COMPAT_NEED_REALLOCARRAY. All this complicates build
of libbpf unnecessarily and is just a maintenance burden. Instead, it's
trivial to implement libbpf-specific internal version and use it throughout
libbpf.

Which is what this patch does, along with converting some realloc() uses that
should really have been reallocarray() in the first place.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200819013607.3607269-2-andriin@fb.com
2020-08-18 18:38:25 -07:00
Andrii Nakryiko
00b2e95325 selftests/bpf: Add test validating failure on ambiguous relocation value
Add test simulating ambiguous field size relocation, while fields themselves
are at the exact same offset.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818223921.2911963-5-andriin@fb.com
2020-08-18 18:12:46 -07:00
Andrii Nakryiko
353c788c20 libbpf: Improve relocation ambiguity detection
Split the instruction patching logic into relocation value calculation and
application of relocation to instruction. Using this, evaluate relocation
against each matching candidate and validate that all candidates agree on
relocated value. If not, report ambiguity and fail load.

This logic is necessary to avoid dangerous (however unlikely) accidental match
against two incompatible candidate types. Without this change, libbpf will
pick a random type as *the* candidate and apply potentially invalid
relocation.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818223921.2911963-4-andriin@fb.com
2020-08-18 18:12:45 -07:00
Andrii Nakryiko
28b93c6449 libbpf: Clean up and improve CO-RE reloc logging
Add logging of local/target type kind (struct/union/typedef/etc). Preserve
unresolved root type ID (for cases of typedef). Improve the format of CO-RE
reloc spec output format to contain only relevant and succinct info.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818223921.2911963-3-andriin@fb.com
2020-08-18 18:12:45 -07:00
Andrii Nakryiko
81ba088902 libbpf: Improve error logging for mismatched BTF kind cases
Instead of printing out integer value of BTF kind, print out a string
representation of a kind.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818223921.2911963-2-andriin@fb.com
2020-08-18 18:12:45 -07:00
Yonghong Song
00fa1d83a8 bpftool: Handle EAGAIN error code properly in pids collection
When the error code is EAGAIN, the kernel signals the user
space should retry the read() operation for bpf iterators.
Let us do it.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200818222312.2181675-1-yhs@fb.com
2020-08-18 17:36:23 -07:00
Yonghong Song
e60572b8d4 bpf: Avoid visit same object multiple times
Currently when traversing all tasks, the next tid
is always increased by one. This may result in
visiting the same task multiple times in a
pid namespace.

This patch fixed the issue by seting the next
tid as pid_nr_ns(pid, ns) + 1, similar to
funciton next_tgid().

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/bpf/20200818222310.2181500-1-yhs@fb.com
2020-08-18 17:36:23 -07:00
Yonghong Song
e679654a70 bpf: Fix a rcu_sched stall issue with bpf task/task_file iterator
In our production system, we observed rcu stalls when
'bpftool prog` is running.
  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: \x097-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
  \x09(t=21031 jiffies g=2534773 q=179750)
  NMI backtrace for cpu 7
  CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G        W         5.8.0-00004-g68bfc7f8c1b4 #6
  Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
  Call Trace:
  <IRQ>
  dump_stack+0x57/0x70
  nmi_cpu_backtrace.cold+0x14/0x53
  ? lapic_can_unplug_cpu.cold+0x39/0x39
  nmi_trigger_cpumask_backtrace+0xb7/0xc7
  rcu_dump_cpu_stacks+0xa2/0xd0
  rcu_sched_clock_irq.cold+0x1ff/0x3d9
  ? tick_nohz_handler+0x100/0x100
  update_process_times+0x5b/0x90
  tick_sched_timer+0x5e/0xf0
  __hrtimer_run_queues+0x12a/0x2a0
  hrtimer_interrupt+0x10e/0x280
  __sysvec_apic_timer_interrupt+0x51/0xe0
  asm_call_on_stack+0xf/0x20
  </IRQ>
  sysvec_apic_timer_interrupt+0x6f/0x80
  asm_sysvec_apic_timer_interrupt+0x12/0x20
  RIP: 0010:task_file_seq_get_next+0x71/0x220
  Code: 00 00 8b 53 1c 49 8b 7d 00 89 d6 48 8b 47 20 44 8b 18 41 39 d3 76 75 48 8b 4f 20 8b 01 39 d0 76 61 41 89 d1 49 39 c1 48 19 c0 <48> 8b 49 08 21 d0 48 8d 04 c1 4c 8b 08 4d 85 c9 74 46 49 8b 41 38
  RSP: 0018:ffffc90006223e10 EFLAGS: 00000297
  RAX: ffffffffffffffff RBX: ffff888f0d172388 RCX: ffff888c8c07c1c0
  RDX: 00000000000f017b RSI: 00000000000f017b RDI: ffff888c254702c0
  RBP: ffffc90006223e68 R08: ffff888be2a1c140 R09: 00000000000f017b
  R10: 0000000000000002 R11: 0000000000100000 R12: ffff888f23c24118
  R13: ffffc90006223e60 R14: ffffffff828509a0 R15: 00000000ffffffff
  task_file_seq_next+0x52/0xa0
  bpf_seq_read+0xb9/0x320
  vfs_read+0x9d/0x180
  ksys_read+0x5f/0xe0
  do_syscall_64+0x38/0x60
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f8815f4f76e
  Code: c0 e9 f6 fe ff ff 55 48 8d 3d 76 70 0a 00 48 89 e5 e8 36 06 02 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 52 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
  RSP: 002b:00007fff8f9df578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
  RAX: ffffffffffffffda RBX: 000000000170b9c0 RCX: 00007f8815f4f76e
  RDX: 0000000000001000 RSI: 00007fff8f9df5b0 RDI: 0000000000000007
  RBP: 00007fff8f9e05f0 R08: 0000000000000049 R09: 0000000000000010
  R10: 00007f881601fa40 R11: 0000000000000246 R12: 00007fff8f9e05a8
  R13: 00007fff8f9e05a8 R14: 0000000001917f90 R15: 000000000000e22e

Note that `bpftool prog` actually calls a task_file bpf iterator
program to establish an association between prog/map/link/btf anon
files and processes.

In the case where the above rcu stall occured, we had a process
having 1587 tasks and each task having roughly 81305 files.
This implied 129 million bpf prog invocations. Unfortunwtely none of
these files are prog/map/link/btf files so bpf iterator/prog needs
to traverse all these files and not able to return to user space
since there are no seq_file buffer overflow.

This patch fixed the issue in bpf_seq_read() to limit the number
of visited objects. If the maximum number of visited objects is
reached, no more objects will be visited in the current syscall.
If there is nothing written in the seq_file buffer, -EAGAIN will
return to the user so user can try again.

The maximum number of visited objects is set at 1 million.
In our Intel Xeon D-2191 2.3GHZ 18-core server, bpf_seq_read()
visiting 1 million files takes around 0.18 seconds.

We did not use cond_resched() since for some iterators, e.g.,
netlink iterator, where rcu read_lock critical section spans between
consecutive seq_ops->next(), which makes impossible to do cond_resched()
in the key while loop of function bpf_seq_read().

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/bpf/20200818222309.2181348-1-yhs@fb.com
2020-08-18 17:36:23 -07:00
Alexei Starovoitov
a12a625ce7 Merge branch 'libbpf-probing-improvements'
Andrii Nakryiko says:

====================
This patch set refactors libbpf feature probing to be done lazily on as-needed
basis, instead of proactively testing all possible features libbpf knows
about. This allows to scale such detections and mitigations better, without
issuing unnecessary syscalls on each bpf_object__load() call. It's also now
memoized globally, instead of per-bpf_object.

Building on that, libbpf will now detect availability of
bpf_probe_read_kernel() helper (which means also -user and -str variants), and
will sanitize BPF program code by replacing such references to generic
variants (bpf_probe_read[_str]()). This allows to migrate all BPF programs
into proper -kernel/-user probing helpers, without the fear of breaking them
for old kernels.

With that, update BPF_CORE_READ() and related macros to use
bpf_probe_read_kernel(), as it doesn't make much sense to do CO-RE relocations
against user-space types. And the only class of cases in which BPF program
might read kernel type from user-space are UAPI data structures which by
definition are fixed in their memory layout and don't need relocating. This is
exemplified by test_vmlinux test, which is fixed as part of this patch set as
well. BPF_CORE_READ() is useful for chainingg bpf_probe_read_{kernel,user}()
calls together even without relocation, so we might add user-space variants,
if there is a need.

While at making libbpf more useful for older kernels, also improve handling of
a complete lack of BTF support in kernel by not even attempting to load BTF
info into kernel. This eliminates annoying warning about lack of BTF support
in the kernel and map creation retry without BTF. If user is using features
that require kernel BTF support, it will still fail, of course.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-08-18 17:20:31 -07:00
Andrii Nakryiko
68b08647c7 libbpf: Detect minimal BTF support and skip BTF loading, if missing
Detect whether a kernel supports any BTF at all, and if not, don't even
attempt loading BTF to avoid unnecessary log messages like:

  libbpf: Error loading BTF: Invalid argument(22)
  libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818213356.2629020-8-andriin@fb.com
2020-08-18 17:16:15 -07:00
Andrii Nakryiko
70785cfb19 libbpf: Switch tracing and CO-RE helper macros to bpf_probe_read_kernel()
Now that libbpf can automatically fallback to bpf_probe_read() on old kernels
not yet supporting bpf_probe_read_kernel(), switch libbpf BPF-side helper
macros to use appropriate BPF helper for reading kernel data.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20200818213356.2629020-7-andriin@fb.com
2020-08-18 17:16:15 -07:00
Andrii Nakryiko
02f47faa25 selftests/bpf: Fix test_vmlinux test to use bpf_probe_read_user()
The test is reading UAPI kernel structure from user-space. So it doesn't need
CO-RE relocations and has to use bpf_probe_read_user().

Fixes: acbd06206b ("selftests/bpf: Add vmlinux.h selftest exercising tracing of syscalls")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818213356.2629020-6-andriin@fb.com
2020-08-18 17:16:15 -07:00
Andrii Nakryiko
109cea5a59 libbpf: Sanitize BPF program code for bpf_probe_read_{kernel, user}[_str]
Add BPF program code sanitization pass, replacing calls to BPF
bpf_probe_read_{kernel,user}[_str]() helpers with bpf_probe_read[_str](), if
libbpf detects that kernel doesn't support new variants.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818213356.2629020-5-andriin@fb.com
2020-08-18 17:16:15 -07:00
Andrii Nakryiko
bb180fb240 libbpf: Factor out common logic of testing and closing FD
Factor out common piece of logic that detects support for a feature based on
successfully created FD. Also take care of closing FD, if it was created.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818213356.2629020-4-andriin@fb.com
2020-08-18 17:16:15 -07:00
Andrii Nakryiko
47b6cb4d0a libbpf: Make kernel feature probing lazy
Turn libbpf's kernel feature probing into lazily-performed checks. This allows
to skip performing unnecessary feature checks, if a given BPF application
doesn't rely on a particular kernel feature. As we grow number of feature
probes, libbpf might perform less unnecessary syscalls and scale better with
number of feature probes long-term.

By decoupling feature checks from bpf_object, it's also possible to perform
feature probing from libbpf static helpers and low-level APIs, if necessary.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818213356.2629020-3-andriin@fb.com
2020-08-18 17:16:15 -07:00
Andrii Nakryiko
8d70823605 libbpf: Disable -Wswitch-enum compiler warning
That compilation warning is more annoying, than helpful.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200818213356.2629020-2-andriin@fb.com
2020-08-18 17:16:15 -07:00
Xu Wang
65bb2e0fc5 libbpf: Convert comma to semicolon
Replace a comma between expression statements by a semicolon.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200818071611.21923-1-vulab@iscas.ac.cn
2020-08-18 17:11:00 -07:00
Daniel T. Lee
2bf8c7e735 samples: bpf: Fix broken bpf programs due to removed symbol
>From commit f1394b7988 ("block: mark blk_account_io_completion
static") symbol blk_account_io_completion() has been marked as static,
which makes it no longer possible to attach kprobe to this event.
Currently, there are broken samples due to this reason.

As a solution to this, attach kprobe events to blk_account_io_done()
to modify them to perform the same behavior as before.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200818051641.21724-1-danieltimlee@gmail.com
2020-08-18 17:10:03 -07:00
Colin Ian King
ad6641189c net: ipv4: remove duplicate "the the" phrase in Kconfig text
The Kconfig help text contains the phrase "the the" in the help
text. Fix this and reformat the block of help text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18 16:02:16 -07:00
Colin Ian King
17340552ce net: mscc: ocelot: remove duplicate "the the" phrase in Kconfig text
The Kconfig help text contains the phrase "the the" in the help
text. Fix this.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18 16:02:03 -07:00
David S. Miller
0df55a0336 Merge branch 'ethtool-netlink-bug-fixes'
Maxim Mikityanskiy says:

====================
ethtool-netlink bug fixes

This series contains a few bug fixes for ethtool-netlink. These bugs are
specific for the netlink interface, and the legacy ioctl interface is
not affected. These patches aim to have the same behavior in
ethtool-netlink as in the legacy ethtool.

Please also see the sibling series for the userspace tool.

v2 changes: Added Fixes tags.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-18 16:00:24 -07:00