Commit Graph

973754 Commits

Author SHA1 Message Date
Suren Baghdasaryan
6a0a705473 ANDROID: mm: hide get_each_object_track declaration when CONFIG_SLUB=n
struct track and enum track_item are undefined when CONFIG_SLUB=n.
get_each_object_track which uses these types should not be compiled
in this configuration. Add missing ifdefs to prevent compilation errors.

Fixes: ee8d2c7884 ("ANDROID: mm: add get_each_object_track function")
Bug: 177377077
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9ad15e6ef1572ba8f69b746ab837f051614c017c
2021-02-22 17:40:01 -08:00
Vlastimil Babka
10aaa1d5c7 FROMGIT: mm, compaction: make fast_isolate_freepages() stay within zone
Compaction always operates on pages from a single given zone when
isolating both pages to migrate and freepages.  Pageblock boundaries are
intersected with zone boundaries to be safe in case zone starts or ends in
the middle of pageblock.  The use of pageblock_pfn_to_page() protects
against non-contiguous pageblocks.

The functions fast_isolate_freepages() and fast_isolate_around() don't
currently protect the fast freepage isolation thoroughly enough against
these corner cases, and can result in freepage isolation operating outside
of zone boundaries:

- in fast_isolate_freepages() if we get a pfn from the first pageblock
  of a zone that starts in the middle of that pageblock, 'highest' can be
  a pfn outside of the zone.  If we fail to isolate anything in this
  function, we may then call fast_isolate_around() on a pfn outside of the
  zone and there effectively do a set_pageblock_skip(page_to_pfn(highest))
  which may currently hit a VM_BUG_ON() in some configurations

- fast_isolate_around() checks only the zone end boundary and not
  beginning, nor that the pageblock is contiguous (with
  pageblock_pfn_to_page()) so it's possible that we end up calling
  isolate_freepages_block() on a range of pfn's from two different zones
  and end up e.g.  isolating freepages under the wrong zone's lock.

This patch should fix the above issues.
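
The boundary handling described above amounts to intersecting a pageblock's pfn range with the zone's pfn range before isolating anything. The following is a minimal Python sketch of that intersection, not the kernel code itself; the function name, the parameter names, and the example pfn values are illustrative assumptions, and the zone end is treated as exclusive.

```python
def pageblock_range_in_zone(pfn, pageblock_nr_pages, zone_start_pfn, zone_end_pfn):
    """Return the [start, end) pfn range of pfn's pageblock, clamped to the zone.

    Hypothetical helper for illustration; zone_end_pfn is exclusive.
    """
    # Round down to the start of the pageblock that contains pfn.
    block_start = pfn - (pfn % pageblock_nr_pages)
    block_end = block_start + pageblock_nr_pages
    # Intersect with the zone so no pfn outside the zone is ever touched.
    return max(block_start, zone_start_pfn), min(block_end, zone_end_pfn)


if __name__ == "__main__":
    # Zone starts in the middle of the first pageblock (made-up numbers).
    print(pageblock_range_in_zone(1000, 512, 700, 5000))
    # Zone ends in the middle of the last pageblock.
    print(pageblock_range_in_zone(4700, 512, 700, 5000))
```

Clamping both ends covers the two corner cases listed above: a zone that starts mid-pageblock and a zone that ends mid-pageblock. It does not model the contiguity check done by pageblock_pfn_to_page().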

Link: https://lkml.kernel.org/r/20210217173300.6394-1-vbabka@suse.cz
Fixes: 5a811889de ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Bug: 180548389
Change-Id: I97c31374640255219de3b852097501e3d935c8ce
(cherry picked from https://lore.kernel.org/mm-commits/20210217191436.shTJB%25akpm@linux-foundation.org/)
2021-02-22 22:51:57 +00:00
Quentin Perret
d498075b65 ANDROID: sched: time: Export symbols needed for schedutil module
Export symbols needed to allow building a schedutil-based vendor module
with GKI.

This is a small price to pay to give vendors the flexibility they need,
and avoids littering cpufreq_schedutil.c with many vendor hooks.

Bug: 170511085
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I8ff8bdb32df5d47124236819efba881c1a2a538d
(cherry picked from commit 34cd6916744b8b2d2107d2d5f10cbacb181e4f6c)
(cherry picked from commit 7587bc9dcf)
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-02-22 11:58:23 -08:00
Charan Teja Reddy
0d61a651e4 ANDROID: vmscan: Support multiple kswapd threads per node
Page replacement is handled in the Linux Kernel in one of two ways:

1) Asynchronously via kswapd
2) Synchronously, via direct reclaim

At page allocation time, the allocating task is immediately given a page
from the zone free list, allowing it to go right back to whatever it was
doing, most likely executing business logic directly or indirectly.

Just prior to satisfying the allocation, the free page count is checked to
see if it has reached the zone low watermark and, if so, kswapd is awakened.
Kswapd will start scanning pages looking for inactive pages to evict to
make room for new page allocations. The work of kswapd allows tasks to
continue allocating memory from their respective zone free list without
incurring any delay.

When the demand for free pages exceeds the rate that kswapd tasks can
supply them, page allocation works differently. Once the allocating task
finds that the number of free pages is at or below the zone min watermark,
the task will no longer pull pages from the free list. Instead, the task
will run the same CPU-bound routines as kswapd to satisfy its own
allocation by scanning and evicting pages. This is called a direct reclaim.
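
The two watermark checks described above can be summarized as a small decision function. This is a toy model in Python under simplifying assumptions, not the kernel's allocator logic: the function name, the return labels, and the example numbers are all hypothetical, and real zones have additional watermarks and fallbacks that are ignored here.

```python
def allocation_path(free_pages, wmark_min, wmark_low):
    """Classify what an allocating task does, per the description above.

    Toy model: names and return labels are illustrative, not kernel symbols.
    """
    if free_pages <= wmark_min:
        # At or below the min watermark: the task reclaims pages itself.
        return "direct_reclaim"
    if free_pages <= wmark_low:
        # Low watermark reached: allocation still succeeds, kswapd is woken.
        return "allocate_and_wake_kswapd"
    # Plenty of free pages: take one from the free list and carry on.
    return "allocate"


if __name__ == "__main__":
    for free in (10_000, 4_000, 1_000):
        print(free, allocation_path(free, wmark_min=2_000, wmark_low=5_000))
```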

The time spent performing a direct reclaim can be substantial, ranging
from tens to hundreds of milliseconds for small order-0 allocations to
half a second or more for order-9 huge-page allocations. In fact, kswapd is
not strictly required on a Linux system. It exists for the sole purpose of
optimizing performance by preventing direct reclaims.

When memory shortfall is sufficient to trigger direct reclaims, they can
occur in any task that is running on the system. A single aggressive
memory allocating task can set the stage for collateral damage to occur in
small tasks that rarely allocate additional memory. Consider the impact of
injecting an additional 100ms of latency when nscd allocates memory to
facilitate caching of a DNS query.

The presence of direct reclaims 10 years ago was a fairly reliable
indicator that too much was being asked of a Linux system. Kswapd was
likely wasting time scanning pages that were ineligible for eviction.
Adding RAM or reducing the working set size would usually make the problem
go away. Since then hardware has evolved to bring a new struggle for
kswapd. Storage speeds have increased by orders of magnitude while CPU
clock speeds stayed the same or even slowed down in exchange for more
cores per package. This presents a throughput problem for a single
threaded kswapd that will get worse with each generation of new hardware.

Test Details

NOTE: The tests below were run with shadow entries disabled. See the
associated patch and cover letter for details

The tests below were designed with the assumption that a kswapd bottleneck
is best demonstrated using filesystem reads. This way, the inactive list
will be full of clean pages, simplifying the analysis and allowing kswapd
to achieve the highest possible steal rate. Maximum steal rates for kswapd
are likely to be the same or lower for any other mix of page types on the
system.

Tests were run on a 2U Oracle X7-2L with 52 Intel Xeon Skylake 2GHz cores,
756GB of RAM and 8 x 3.6 TB NVMe Solid State Disk drives. Each drive has
an XFS file system mounted separately as /d0 through /d7. SSD drives
require multiple concurrent streams to show their potential, so I created
eleven 250GB zero-filled files on each drive so that I could test with
parallel reads.

The test script runs in multiple stages. At each stage, the number of dd
tasks run concurrently is increased by 2. I did not include all of the
test output for brevity.

During each stage dd tasks are launched to read from each drive in a round
robin fashion until the specified number of tasks for the stage has been
reached. Then iostat, vmstat and top are started in the background with 10
second intervals. After five minutes, all of the dd tasks are killed and
the iostat, vmstat and top output is parsed in order to report the
following:

CPU consumption
- sy - aggregate kernel mode CPU consumption from vmstat output. The value
       doesn't tend to fluctuate much so I just grab the highest value.
       Each sample is averaged over 10 seconds
- dd_cpu - for all of the dd tasks averaged across the top samples since
           there is a lot of variation.

Throughput
- in Kbytes
- Command is iostat -x -d 10 -g total

This first test performs reads using O_DIRECT in order to show the maximum
throughput that can be obtained using these drives. It also demonstrates
how rapidly throughput scales as the number of dd tasks is increased.

The dd command for this test looks like this:

Command Used: dd iflag=direct if=/d${i}/$n of=/dev/null bs=4M
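
The staged, round-robin launch described earlier can be sketched as a small command planner. This is only a sketch of the scheduling idea, not the author's test script: it builds the dd command lines without running them, and the function names, the integer file naming (the original only shows `$n`), and the first/last stage sizes are assumptions made for illustration.

```python
def dd_commands(num_tasks, num_drives=8, files_per_drive=11, direct=True):
    """Build dd command lines for one stage, spreading tasks across drives.

    The file name (an integer index) is hypothetical; the original shows only $n.
    """
    commands = []
    for task in range(num_tasks):
        drive = task % num_drives  # round-robin over /d0 .. /d7
        file_index = (task // num_drives) % files_per_drive
        flag = "iflag=direct " if direct else ""
        commands.append(f"dd {flag}if=/d{drive}/{file_index} of=/dev/null bs=4M")
    return commands


def stage_sizes(first, last, step=2):
    """Number of concurrent dd tasks at each stage (step of 2, per the text)."""
    return list(range(first, last + 1, step))


if __name__ == "__main__":
    for line in dd_commands(10):
        print(line)
```

Passing `direct=False` drops `iflag=direct`, which corresponds to the buffered-read tests that follow.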

Test #1: Direct IO
dd sy dd_cpu throughput
6  0  2.33   14726026.40
10 1  2.95   19954974.80
16 1  2.63   24419689.30
22 1  2.63   25430303.20
28 1  2.91   26026513.20
34 1  2.53   26178618.00
40 1  2.18   26239229.20
46 1  1.91   26250550.40
52 1  1.69   26251845.60
58 1  1.54   26253205.60
64 1  1.43   26253780.80
70 1  1.31   26254154.80
76 1  1.21   26253660.80
82 1  1.12   26254214.80
88 1  1.07   26253770.00
90 1  1.04   26252406.40

Throughput was close to peak with only 22 dd tasks. As expected, very
little system CPU was consumed, since the drives DMA directly into the user
address space when using direct IO.

In this next test, the iflag=direct option is removed and we only run the
test until the pgscan_kswapd from /proc/vmstat starts to increment. At
that point metrics are parsed and reported and the pagecache contents are
dropped prior to the next test. Lather, rinse, repeat.

Test #2: standard file system IO, no page replacement
dd sy dd_cpu throughput
6  2  28.78  5134316.40
10 3  31.40  8051218.40
16 5  34.73  11438106.80
22 7  33.65  14140596.40
28 8  31.24  16393455.20
34 10 29.88  18219463.60
40 11 28.33  19644159.60
46 11 25.05  20802497.60
52 13 26.92  22092370.00
58 13 23.29  22884881.20
64 14 23.12  23452248.80
70 15 22.40  23916468.00
76 16 22.06  24328737.20
82 17 20.97  24718693.20
88 16 18.57  25149404.40
90 16 18.31  25245565.60

Each read has to pause after the buffer in kernel space is populated while
those pages are added to the pagecache and copied into the user address
space. For this reason, more parallel streams are required to achieve peak
throughput. The copy operation consumes substantially more CPU than direct
IO as expected.

The next test measures throughput after kswapd starts running. This is the
same test, except that we wait for kswapd to wake up before we start
collecting metrics. The script actually keeps track of a few things that were not
mentioned earlier. It tracks direct reclaims and page scans by watching
the metrics in /proc/vmstat. CPU consumption for kswapd is tracked the
same way it is tracked for dd.

Since the test is 100% reads, you can assume that the page steal rate for
kswapd and direct reclaims is almost identical to the scan rate.

Test #3: 1 kswapd thread per node
dd sy dd_cpu kswapd0 kswapd1 throughput  dr    pgscan_kswapd pgscan_direct
10 4  26.07  28.56   27.03   7355924.40  0     459316976     0
16 7  34.94  69.33   69.66   10867895.20 0     872661643     0
22 10 36.03  93.99   99.33   13130613.60 489   1037654473    11268334
28 10 30.34  95.90   98.60   14601509.60 671   1182591373    15429142
34 14 34.77  97.50   99.23   16468012.00 10850 1069005644    249839515
40 17 36.32  91.49   97.11   17335987.60 18903 975417728     434467710
46 19 38.40  90.54   91.61   17705394.40 25369 855737040     582427973
52 22 40.88  83.97   83.70   17607680.40 31250 709532935     724282458
58 25 40.89  82.19   80.14   17976905.60 35060 657796473     804117540
64 28 41.77  73.49   75.20   18001910.00 39073 561813658     895289337
70 33 45.51  63.78   64.39   17061897.20 44523 379465571     1020726436
76 36 46.95  57.96   60.32   16964459.60 47717 291299464     1093172384
82 39 47.16  55.43   56.16   16949956.00 49479 247071062     1134163008
88 42 47.41  53.75   47.62   16930911.20 51521 195449924     1180442208
90 43 47.18  51.40   50.59   16864428.00 51618 190758156     1183203901

In the previous test where kswapd was not involved, the system-wide kernel
mode CPU consumption with 90 dd tasks was 16%. In this test CPU consumption
with 90 tasks is at 43%. With 52 cores and two kswapd tasks (one per NUMA
node), kswapd can account for at most about 4 percentage points of that
increase (2 of 52 cores). The rest is likely caused by 51,618 direct
reclaims that scanned 1.2 billion pages over the five minute time period of
the test.

Same test, more kswapd tasks:

Test #4: 4 kswapd threads per node
dd sy dd_cpu kswapd0 kswapd1 throughput  dr    pgscan_kswapd pgscan_direct
10 5  27.09  16.65   14.17   7842605.60  0     459105291     0
16 10 37.12  26.02   24.85   11352920.40 15    920527796     358515
22 11 36.94  37.13   35.82   13771869.60 0     1132169011    0
28 13 35.23  48.43   46.86   16089746.00 0     1312902070    0
34 15 33.37  53.02   55.69   18314856.40 0     1476169080    0
40 19 35.90  69.60   64.41   19836126.80 0     1629999149    0
46 22 36.82  88.55   57.20   20740216.40 0     1708478106    0
52 24 34.38  93.76   68.34   21758352.00 0     1794055559    0
58 24 30.51  79.20   82.33   22735594.00 0     1872794397    0
64 26 30.21  97.12   76.73   23302203.60 176   1916593721    4206821
70 33 32.92  92.91   92.87   23776588.00 3575  1817685086    85574159
76 37 31.62  91.20   89.83   24308196.80 4752  1812262569    113981763
82 29 25.53  93.23   92.33   24802791.20 306   2032093122    7350704
88 43 37.12  76.18   77.01   25145694.40 20310 1253204719    487048202
90 42 38.56  73.90   74.57   22516787.60 22774 1193637495    545463615

By increasing the number of kswapd threads, throughput increased by ~50%
while kernel mode CPU utilization decreased or stayed the same, likely due
to a decrease in the number of parallel tasks at any given time doing page
replacement.
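
The size of the improvement can be checked against the throughput columns of Test #3 and Test #4. The sketch below copies those columns and computes the relative gain per dd task count and peak against peak; the helper names are illustrative. The message does not say which rows the ~50% figure compares, so the comparison points chosen here are an assumption.

```python
# Throughput columns copied from the Test #3 and Test #4 tables,
# keyed by the number of concurrent dd tasks.
ONE_KSWAPD = {
    10: 7355924.40, 16: 10867895.20, 22: 13130613.60, 28: 14601509.60,
    34: 16468012.00, 40: 17335987.60, 46: 17705394.40, 52: 17607680.40,
    58: 17976905.60, 64: 18001910.00, 70: 17061897.20, 76: 16964459.60,
    82: 16949956.00, 88: 16930911.20, 90: 16864428.00,
}
FOUR_KSWAPD = {
    10: 7842605.60, 16: 11352920.40, 22: 13771869.60, 28: 16089746.00,
    34: 18314856.40, 40: 19836126.80, 46: 20740216.40, 52: 21758352.00,
    58: 22735594.00, 64: 23302203.60, 70: 23776588.00, 76: 24308196.80,
    82: 24802791.20, 88: 25145694.40, 90: 22516787.60,
}


def percent_gain(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def gain_at(tasks):
    """Gain of four kswapd threads per node over one, at a given dd count."""
    return percent_gain(ONE_KSWAPD[tasks], FOUR_KSWAPD[tasks])


def peak_gain():
    """Gain comparing the best throughput of each test, whatever the dd count."""
    return percent_gain(max(ONE_KSWAPD.values()), max(FOUR_KSWAPD.values()))


if __name__ == "__main__":
    for tasks in sorted(ONE_KSWAPD):
        print(f"{tasks:>2} dd tasks: {gain_at(tasks):+.1f}%")
    print(f"peak vs peak: {peak_gain():+.1f}%")
```

With these numbers the gain is small at low task counts and largest near 88 dd tasks, where it comes out a little under 50%.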

Signed-off-by: Buddy Lumpkin <buddy.lumpkin@oracle.com>
Bug: 171351667
Link: https://lore.kernel.org/lkml/1522661062-39745-1-git-send-email-buddy.lumpkin@oracle.com
[charante@codeaurora.org]: Changes made to select number of kswapds through uapi
Change-Id: I8425cab7f40cbeaf65af0ea118c1a9ac7da0930e
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2021-02-22 19:47:44 +00:00
Vijayanand Jitta
ee8d2c7884 ANDROID: mm: add get_each_object_track function
Add and export get_each_object_track, which loops through all the
slab objects of a page and gets the track structure of each object.
Also make enum track_item and struct track public. These will be
used by the minidump module to get slab owner info.

Bug: 177377077
Change-Id: Ic207fd26a122a5f1b014f4929760d064f7af0225
Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
2021-02-22 19:26:26 +00:00
Chiawei Wang
db158b4ae0 ANDROID: mm: Add vendor hook in pagecache_get_page()
Add a vendor hook for pagecache hit/miss and other
vendor specific functions.

Bug: 174088128
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: Ie9f14a69a86b8ed81de766e44e30f2eba1d9bd84
2021-02-19 17:59:52 +00:00
Chiawei Wang
369de37804 ANDROID: mm: Add vendor hook in rmqueue()
Add a vendor hook for costly order page counting
and other vendor specific functions.

Bug: 174521902
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: I89206727a462548cc3500b695d85c83ff003eec7
2021-02-19 17:59:30 +00:00
Alistair Delva
ea15862d66 ANDROID: GKI: Build in VIRTIO_FS
This allows us to keep filesystem symbols non-exported, without bringing
much into GKI.

Bug: 180710829
Change-Id: If3a4a87448be94826390fa23b0e241e2239d9550
Signed-off-by: Alistair Delva <adelva@google.com>
2021-02-19 15:34:47 +00:00
Eric Biggers
537d3bb974 ANDROID: dm: sync inline crypto support with patches going upstream
Replace the following patches with upstream versions
(well, almost upstream; as of 2021-02-12 they are queued for 5.12 at
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next):

	ANDROID-dm-add-support-for-passing-through-inline-crypto-support.patch
	ANDROID-dm-enable-may_passthrough_inline_crypto-on-some-targets.patch
	ANDROID-block-Introduce-passthrough-keyslot-manager.patch

Also, resolve conflicts with the following non-upstream patches for
hardware-wrapped key support.  Notably, we need to handle the field
blk_keyslot_manager::features in a few places:

	ANDROID-block-add-hardware-wrapped-key-support.patch
	ANDROID-dm-add-support-for-passing-through-derive_raw_secret.patch

Finally, update non-upstream device-mapper targets (dm-bow and
dm-default-key) to use the new way of specifying inline crypto
passthrough support (DM_TARGET_PASSES_CRYPTO) rather than the old way
(may_passthrough_inline_crypto).  These changes should be folded into:

	ANDROID-dm-bow-Add-dm-bow-feature.patch
	ANDROID-dm-add-dm-default-key-target-for-metadata-encryption.patch

Test: tested on db845c; verified that inline crypto support gets passed
      through over dm-linear.
Bug: 162257830
Change-Id: I5e3dea1aa09fc1215c90857b5b51d9e3720ef7db
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-19 10:48:51 +00:00
Pavankumar Kondeti
a56f081c5b ANDROID: sched: Add restricted vendor hooks in CFS scheduler
Add restricted vendor hooks in CFS scheduler class to allow
customizations in vendor modules.

Bug: 180668820
Change-Id: I69bd90e11220d7607b075a3aa687059deaa60439
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
2021-02-19 12:22:57 +05:30
John Stultz
cc048ffd7e ANDROID: drm: kirin: Remove dead code that was causing build failures
Somehow in forward porting commit 34ebaf13be ("ANDROID: drm:
kirin: Introduce kirin960"), which moves a bunch of code around
to be shared, a chunk of code removal was dropped (likely due
to some minor collision).

This caused build failures with allmodconfig that I missed.

This patch removes the dead code causing the trouble.

Bug: 180655267
Fixes: 34ebaf13be ("ANDROID: drm: kirin: Introduce kirin960")
Reported-by: Todd Kjos <tkjos@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Change-Id: I79a727115756f213502de0a4406cf3384c393ead
2021-02-19 01:32:16 +00:00
John Stultz
8adfa9950c ANDROID: adv7511: Add poweron delay to allow for EDID probing to work
For some reason, EDID probing on HiKey960 doesn't work
properly unless we delay a bit at poweron.

Bug: 146450171
Signed-off-by: John Stultz <john.stultz@linaro.org>
Change-Id: Id8bcf9158d3060e065a6a9ec06bbe0323b73dc8e
2021-02-19 00:02:01 +00:00
John Stultz
795028f7e7 ANDROID: Add hikey960 build infrastructure file
Adds build.config.hikey960 and android/abi_gki_aarch64_hikey960 files

Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: Ice445cf09780b16059e5e4ef624ac30e300c6500
2021-02-18 23:59:34 +00:00
John Stultz
baae9497e8 ANDROID: Add hikey960 GKI config fragment
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: Ie6fe68f692ed57fd28cfca9f339d6392c33e9ff8
2021-02-18 23:52:29 +00:00
Youlin Wang
538e9699eb ANDROID: arm64: dts: hi3660-hikey960: Add i2s & sound device
Signed-off-by: Youlin Wang <wangyoulin1@hisilicon.com>
Signed-off-by: Tanglei Han <hantanglei@huawei.com>
Signed-off-by: Guangke Ji <jiguangke@huawei.com>
Signed-off-by: Feng Chen <puck.chen@hisilicon.com>
Signed-off-by: Kaihua Zhong <zhongkaihua@huawei.com>
Signed-off-by: Jun Chen <chenjun14@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Yiping Xu <xuyiping@hisilicon.com>
Signed-off-by: Pengcheng Li <lipengcheng8@huawei.com>
Cc: Wei Xu <xuwei5@hisilicon.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I746999c06398779a5380c043a1fc7349f373694c
2021-02-18 23:49:43 +00:00
Youlin Wang
8da70a67f3 ANDROID: ASoC: add hikey960-i2s DT bindings
Adds DT bindings documentation for
the hikey960-i2s driver and audio card device.

Signed-off-by: Youlin Wang <wangyoulin1@hisilicon.com>
Signed-off-by: Tanglei Han <hantanglei@huawei.com>
Signed-off-by: Guangke Ji <jiguangke@huawei.com>
Signed-off-by: Feng Chen <puck.chen@hisilicon.com>
Signed-off-by: Kaihua Zhong <zhongkaihua@huawei.com>
Signed-off-by: Jun Chen <chenjun14@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Yiping Xu <xuyiping@hisilicon.com>
Signed-off-by: Pengcheng Li <lipengcheng8@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: If9f8ce65c78e1e1e6b2e0439b590e7558763d631
2021-02-18 23:44:26 +00:00
Youlin Wang
b4b11198ed ANDROID: sound: Add hikey960 i2s audio driver
Add i2s driver for hisi3660 soc found on the hikey960 board.
Add compile line in Makefile.
Technical support by Guangke Ji.

Signed-off-by: Youlin Wang <wangyoulin1@hisilicon.com>
Signed-off-by: Tanglei Han <hantanglei@huawei.com>
Signed-off-by: Guangke Ji <jiguangke@huawei.com>
Signed-off-by: Feng Chen <puck.chen@hisilicon.com>
Signed-off-by: Kaihua Zhong <zhongkaihua@huawei.com>
Signed-off-by: Jun Chen <chenjun14@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Yiping Xu <xuyiping@hisilicon.com>
Signed-off-by: Pengcheng Li <lipengcheng8@huawei.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I90e66ff47ebd2cd42fa233a4c4dbac6e74cd01d3
2021-02-18 23:41:44 +00:00
Shihui Zhao
121e60b7b1 ANDROID: arm64: dts: hi3660: enable gpu
Enable the GPU in the hi3660/hikey960 dts.

Bug: 146450171
Signed-off-by: Shihui Zhao <zhaoshihui3@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Change-Id: I5e72a7dc14e760f442db5b9e55b0df26597cfbcd
2021-02-18 23:39:07 +00:00
cailiwei
ea6449d48a ANDROID: arm64: dts: hi3660: add display driver dts
Signed-off-by: Liwei Cai <cailiwei@hisilicon.com>
Signed-off-by: Xiubin Zhang <zhangxiubin1@huawei.com>
[jstultz: Cleanup unused hisifb bits]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: If9df111ad81475caec22fe75168544b010477e04
2021-02-18 23:36:49 +00:00
John Stultz
99ae6d076a ANDROID: arm64: dts: hikey960: Add CMA entry for DMA-BUF Heap/framebuffers
Add CMA entry, as the DRM driver requires physically contiguous
memory for framebuffers. These are normally allocated
out of DMA-BUF Heaps by gralloc for the graphics
framebuffer.

NOTE: On the 4GB boards, if we don't specify the address,
the CMA region we use for the framebuffer and ion heap
may be placed in memory mapped above 32 bits.

Due to addresses being masked to 32 bits for DMA, this
resulted in crashes on newer 4GB boards.

Thus this patch forces CMA address to be lower in memory.
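
The failure mode described above can be illustrated with plain integer arithmetic: once a physical address above 4 GiB is masked to 32 bits, the device sees a different address than the one the CPU allocated. This is a minimal Python sketch; the helper names and the example base address and size are hypothetical and are not taken from the HiKey960 device tree.

```python
DMA_MASK_32 = (1 << 32) - 1  # a 32-bit DMA address mask


def masked_dma_address(phys_addr):
    """Address the device ends up using if the upper bits are masked off."""
    return phys_addr & DMA_MASK_32


def region_is_dma_safe(base, size):
    """True if the whole region [base, base + size) fits below 4 GiB."""
    return base + size - 1 <= DMA_MASK_32


if __name__ == "__main__":
    high_base = 0x1_2000_0000  # made-up CMA base above the 32-bit boundary
    print(hex(masked_dma_address(high_base)))        # differs from high_base
    print(region_is_dma_safe(high_base, 0x400_0000))
    print(region_is_dma_safe(0x2000_0000, 0x400_0000))
```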

I've picked an address just past a previous CMA reserved
chunk, so we shouldn't be adding to any fragmentation.

Extra thanks to Ryan Grachek <ryan@edited.us> who
helped isolate the issue with the CMA buffers being over
the 32bit barrier.

Bug: 146450171
Change-Id: Ibb7ff1d85a9b93e41c440ffacf6a1ccf6aecb1ca
Signed-off-by: John Stultz <john.stultz@linaro.org>
2021-02-18 23:34:41 +00:00
John Stultz
a002be6ff0 ANDROID: drm: kirin960: Remove one mode-line that seems to be causing trouble
Alessio Balsini reported issues with his GeeekPi 7" monitor
after we added the wider mode support.

While this mode may work ok on HiKey, I suspect it's just a bit
too far off for HiKey960.

So let's remove it for now.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I54221c44f2ee6af16d3772831a24c2872a6f738c
2021-02-18 23:32:31 +00:00
Vincent Donnefort
d586305741 ANDROID: drm: kirin: remove wait for VACTIVE IRQ
For each display cycle, the Kirin960 display IP generates a VACTIVE
interrupt followed by a VBLANK. During an FBIOPAN ioctl, the driver waits
for the first interrupt and then for the second. This is an issue when the
CPU load is too low: the wait_event() function might trigger a transition
to a deep sleep state, and waking up from that state can take too long to
catch the VBLANK interrupt in time, since the two interrupts are only
60 us apart.

  * Ideal case:                   ACT                VBL
                                   +                  +
                                   v                  v
                    ---> wait(ACT) +------> wait(VBL) +-->

  * Our case:                     ACT VBL        ACT VBL
                                   +   +          +   +
                                   v   v          v   v
                    ---> wait(ACT) +------> wait(VBL) +-->

The wait for VACTIVE IRQ can safely be removed: there is no hardware access
performed between the VACTIVE and the VBLANK IRQs.

This behavior was introduced in 4.11 by the following patch:

    a3fbb53f4 drm/atomic: Wait for vblank whenever a plane is added to state.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I66e276c08f04257135c3d05483ce70c58d5070b6
2021-02-18 23:30:28 +00:00
Xu YiPing
4dd668f97d ANDROID: drm: kirin: Add kirin960 dpe driver support
add kirin960 dpe driver support

Signed-off-by: Xu YiPing <xuyiping@hisilicon.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: Ifdf3ef522f929c23f7da49b3c692550a3b3d6c49
2021-02-18 23:28:15 +00:00
John Stultz
34ebaf13be ANDROID: drm: kirin: Introduce kirin960
Add initial kirin960 support files.

Signed-off-by: Xu YiPing <xuyiping@hisilicon.com>
[jstultz: Fold in some minor cleanups]
[jstultz: Folded in a export symbol fix by Greg Kroah-Hartman]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I934004f4a15e1bba341807f6c63c13bf6c661698
2021-02-18 23:25:56 +00:00
John Stultz
5cf9a844f6 ANDROID: dts: hi3660-hikey960: Add usb mux hub for hikey960
Add usb mux hub support to hikey960 dts

Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: Ia17f1177e0d92b1edd2304b5621f103d5757033d
2021-02-18 23:23:35 +00:00
Yu Chen
550c348963 ANDROID: dt-bindings: misc: Add bindings for HiSilicon usb hub and data role switch functionality on HiKey960
This patch adds binding documentation to support usb hub and usb
data role switch of Hisilicon HiKey960 Board.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
CC: ShuFan Lee <shufan_lee@richtek.com>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
Cc: Yu Chen <chenyu56@huawei.com>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Jun Li <lijun.kernel@gmail.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Guillaume Gardet <Guillaume.Gardet@arm.com>
Cc: Jack Pham <jackp@codeaurora.org>
Cc: linux-usb@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Yu Chen <chenyu56@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I3f111b39b7a982b3489549076412a2f7c3c3d008
2021-02-18 23:21:21 +00:00
Siddharth Gupta
7bdc26c595 UPSTREAM: remoteproc: coredump: Add minidump functionality
This change adds a new kind of core dump mechanism which, instead of dumping
entire program segments of the firmware, dumps sections of the remoteproc
memory which are sufficient to allow debugging the firmware. This function
thus uses section headers instead of program headers during creation of the
core dump ELF.

Co-developed-by: Rishabh Bhatnagar <rishabhb@codeaurora.org>
Signed-off-by: Rishabh Bhatnagar <rishabhb@codeaurora.org>
Signed-off-by: Siddharth Gupta <sidgup@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Bug: 180426943
Change-Id: I4aecc8e9f0961294f087b521ab64adb78c344457
(cherry picked from commit abc72b6460)
Signed-off-by: Siddharth Gupta <quic_sidgup@quicinc.com>
2021-02-18 20:10:00 +00:00
Siddharth Gupta
abb52e261b UPSTREAM: remoteproc: core: Add ops to enable custom coredump functionality
Each remoteproc might have different requirements for coredumps and might
want to choose the type of dumps it wants to collect. This change allows
remoteproc drivers to specify their own custom dump function to be executed
in place of rproc_coredump. If the coredump op is not specified by the
remoteproc driver it will be set to rproc_coredump by default.
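
The fallback behavior described above is the usual "optional callback with a default" pattern. The sketch below shows that pattern in Python rather than the kernel's C ops structure; apart from the name rproc_coredump, which appears in the message, every name here is hypothetical and the return strings are placeholders.

```python
def rproc_coredump(name):
    """Stand-in for the default dump routine named in the message."""
    return f"{name}: default coredump"


def resolve_coredump_op(custom_op=None):
    """Use the driver's own dump function if given, else the default."""
    return custom_op if custom_op is not None else rproc_coredump


def minidump(name):
    """Hypothetical driver-specific dump function."""
    return f"{name}: minidump"


if __name__ == "__main__":
    print(resolve_coredump_op()("modem"))
    print(resolve_coredump_op(minidump)("modem"))
```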

Signed-off-by: Siddharth Gupta <sidgup@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Bug: 180426943
Change-Id: I0ce93a43298172e8b5b9ebcd9c51b9beaa1d5eeb
(cherry picked from commit adf60a870e)
Signed-off-by: Siddharth Gupta <quic_sidgup@quicinc.com>
2021-02-18 20:04:36 +00:00
Neeraj Upadhyay
f52f343587 ANDROID: gic-v3: Update vendor hook to set affinity in GIC v3
The GIC provides implementation-specific registers to configure
the affinity of an SPI. Update the existing affinity hook to allow
vendors to configure those implementation-defined settings.

Bug: 180471389
Change-Id: I273035da65eaeb346c0d8b303a722f4d8d7918d6
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
2021-02-18 16:37:57 +00:00
Stephen Dickey
9393bb52f8 ANDROID: cpuhp/aarch32: keep last 32bit cpu active
It is possible that all the 32 bit CPUs are paused in
the system, which is not ideal for quickly launching
32 bit apps.

Detect if a pause operation is about to pause the
last 32 bit CPU, and prevent it from happening.
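
The detection step can be expressed as a set computation: a pause request is a problem only if some 32-bit capable CPU is active now and none would remain active afterwards. This is a minimal Python sketch of that check, assuming CPUs are identified by integer ids; the function name, its parameters, and the example CPU layout are hypothetical, and how the kernel reacts once the condition is detected is not modeled.

```python
def would_pause_last_32bit_cpu(active_cpus, aarch32_cpus, cpus_to_pause):
    """True if pausing cpus_to_pause leaves no active 32-bit capable CPU."""
    active_32bit = set(active_cpus) & set(aarch32_cpus)
    remaining_32bit = active_32bit - set(cpus_to_pause)
    # Only a problem if there was an active 32-bit CPU to begin with.
    return bool(active_32bit) and not remaining_32bit


if __name__ == "__main__":
    active = {0, 1, 2, 3, 4, 5}
    aarch32 = {0, 1}  # made-up layout: only CPUs 0 and 1 run 32-bit apps
    print(would_pause_last_32bit_cpu(active, aarch32, {1}))
    print(would_pause_last_32bit_cpu(active, aarch32, {0, 1}))
```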

Bug: 175896474
Change-Id: I21b4dad7ba9f3ef9be460137098e6fb2c0e336e6
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
2021-02-18 13:11:30 +00:00
Greg Kroah-Hartman
b129c98dc6 Merge 5.10.17 into android12-5.10
Changes in 5.10.17
	objtool: Fix seg fault with Clang non-section symbols
	Revert "dts: phy: add GPIO number and active state used for phy reset"
	gpio: mxs: GPIO_MXS should not default to y unconditionally
	gpio: ep93xx: fix BUG_ON port F usage
	gpio: ep93xx: Fix single irqchip with multi gpiochips
	tracing: Do not count ftrace events in top level enable output
	tracing: Check length before giving out the filter buffer
	drm/i915: Fix overlay frontbuffer tracking
	arm/xen: Don't probe xenbus as part of an early initcall
	cgroup: fix psi monitor for root cgroup
	Revert "drm/amd/display: Update NV1x SR latency values"
	drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing it
	drm/dp_mst: Don't report ports connected if nothing is attached to them
	dmaengine: move channel device_node deletion to driver
	tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
	tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
	soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1
	arm64: dts: rockchip: Fix PCIe DT properties on rk3399
	arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc
	ARM: OMAP2+: Fix suspcious RCU usage splats for omap_enter_idle_coupled
	arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node
	platform/x86: hp-wmi: Disable tablet-mode reporting by default
	arm64: dts: rockchip: Disable display for NanoPi R2S
	ovl: perform vfs_getxattr() with mounter creds
	cap: fix conversions on getxattr
	ovl: skip getxattr of security labels
	scsi: lpfc: Fix EEH encountering oops with NVMe traffic
	x86/split_lock: Enable the split lock feature on another Alder Lake CPU
	nvme-pci: ignore the subsysem NQN on Phison E16
	drm/amd/display: Fix DPCD translation for LTTPR AUX_RD_INTERVAL
	drm/amd/display: Add more Clock Sources to DCN2.1
	drm/amd/display: Release DSC before acquiring
	drm/amd/display: Fix dc_sink kref count in emulated_link_detect
	drm/amd/display: Free atomic state after drm_atomic_commit
	drm/amd/display: Decrement refcount of dc_sink before reassignment
	riscv: virt_addr_valid must check the address belongs to linear mapping
	bfq-iosched: Revert "bfq: Fix computation of shallow depth"
	ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL
	kallsyms: fix nonconverging kallsyms table with lld
	ARM: ensure the signal page contains defined contents
	ARM: kexec: fix oops after TLB are invalidated
	ubsan: implement __ubsan_handle_alignment_assumption
	Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs"
	x86/efi: Remove EFI PGD build time checks
	lkdtm: don't move ctors to .rodata
	KVM: x86: cleanup CR3 reserved bits checks
	cgroup-v1: add disabled controller check in cgroup1_parse_param()
	dmaengine: idxd: fix misc interrupt completion
	ath9k: fix build error with LEDS_CLASS=m
	mt76: dma: fix a possible memory leak in mt76_add_fragment()
	drm/vc4: hvs: Fix buffer overflow with the dlist handling
	dmaengine: idxd: check device state before issue command
	bpf: Unbreak BPF_PROG_TYPE_KPROBE when kprobe is called via do_int3
	bpf: Check for integer overflow when using roundup_pow_of_two()
	netfilter: xt_recent: Fix attempt to update deleted entry
	selftests: netfilter: fix current year
	netfilter: nftables: fix possible UAF over chains from packet path in netns
	netfilter: flowtable: fix tcp and udp header checksum update
	xen/netback: avoid race in xenvif_rx_ring_slots_available()
	net: hdlc_x25: Return meaningful error code in x25_open
	net: ipa: set error code in gsi_channel_setup()
	hv_netvsc: Reset the RSC count if NVSP_STAT_FAIL in netvsc_receive()
	net: enetc: initialize the RFS and RSS memories
	selftests: txtimestamp: fix compilation issue
	net: stmmac: set TxQ mode back to DCB after disabling CBS
	ibmvnic: Clear failover_pending if unable to schedule
	netfilter: conntrack: skip identical origin tuple in same zone only
	scsi: scsi_debug: Fix a memory leak
	x86/build: Disable CET instrumentation in the kernel for 32-bit too
	net: dsa: felix: implement port flushing on .phylink_mac_link_down
	net: hns3: add a check for queue_id in hclge_reset_vf_queue()
	net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()
	net: hns3: add a check for index in hclge_get_rss_key()
	firmware_loader: align .builtin_fw to 8
	drm/sun4i: tcon: set sync polarity for tcon1 channel
	drm/sun4i: dw-hdmi: always set clock rate
	drm/sun4i: Fix H6 HDMI PHY configuration
	drm/sun4i: dw-hdmi: Fix max. frequency for H6
	clk: sunxi-ng: mp: fix parent rate change flag check
	i2c: stm32f7: fix configuration of the digital filter
	h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
	scripts: set proper OpenSSL include dir also for sign-file
	x86/pci: Create PCI/MSI irqdomain after x86_init.pci.arch_init()
	arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page
	rxrpc: Fix clearance of Tx/Rx ring when releasing a call
	udp: fix skb_copy_and_csum_datagram with odd segment sizes
	net: dsa: call teardown method on probe failure
	cpufreq: ACPI: Extend frequency tables to cover boost frequencies
	cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there
	net: gro: do not keep too many GRO packets in napi->rx_list
	net: fix iteration for sctp transport seq_files
	net/vmw_vsock: fix NULL pointer dereference
	net/vmw_vsock: improve locking in vsock_connect_timeout()
	net: watchdog: hold device global xmit lock during tx disable
	bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state
	switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT
	vsock/virtio: update credit only if socket is not closed
	vsock: fix locking in vsock_shutdown()
	net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
	net/qrtr: restrict user-controlled length in qrtr_tun_write_iter()
	ovl: expand warning in ovl_d_real()
	kcov, usb: only collect coverage from __usb_hcd_giveback_urb in softirq
	Linux 5.10.17

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id0300681f52b51d3f466f1e66ec3a6c25f65f4d3
2021-02-18 11:21:01 +01:00
Vinayak Menon
c3cbea9229 ANDROID: mm: avoid writing to read-only elements
Refactor speculative page fault handler to avoid
assigning to read-only elements of vmfault struct.

Change-Id: I1b5072d880a485948fad9591d7e8dc20e47d73a4
Bug: 171278850
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2021-02-17 22:35:38 +00:00
Greg Kroah-Hartman
13b6016e96 Linux 5.10.17
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Jason Self <jason@bluehome.net>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Ross Schmidt <ross.schm.dev@gmail.com>
Link: https://lore.kernel.org/r/20210215152719.459796636@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:30 +01:00
Andrey Konovalov
90ac1981ac kcov, usb: only collect coverage from __usb_hcd_giveback_urb in softirq
commit aee9ddb1d3 upstream.

Currently there's a KCOV remote coverage collection section in
__usb_hcd_giveback_urb(). Initially that section was added based on the
assumption that usb_hcd_giveback_urb() can only be called in interrupt
context as indicated by a comment before it. This is what happens when
syzkaller is fuzzing the USB stack via the dummy_hcd driver.

As it turns out, it's actually valid to call usb_hcd_giveback_urb() in task
context, provided that the caller turned off the interrupts; USB/IP does
exactly that. This can lead to nested KCOV remote coverage collection
sections, both trying to collect coverage in task context. This isn't
supported by KCOV, and leads to a WARNING.

Change __usb_hcd_giveback_urb() to only call kcov_remote_*() callbacks
when it's being executed in a softirq. As a result, the coverage from
USB/IP related usb_hcd_giveback_urb() calls won't be collected, but the
WARNING is fixed.

A potential future improvement would be to support nested remote coverage
collection sections, but this patch doesn't address that.

Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/f3a7a153f0719cb53ec385b16e912798bd3e4cf9.1602856358.git.andreyknvl@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:30 +01:00
Miklos Szeredi
e5c376c41a ovl: expand warning in ovl_d_real()
commit cef4cbff06 upstream.

There was a syzbot report with this warning but insufficient information...

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:30 +01:00
Sabyrzhan Tasbolatov
5706880025 net/qrtr: restrict user-controlled length in qrtr_tun_write_iter()
commit 2a80c15812 upstream.

syzbot found WARNING in qrtr_tun_write_iter [1] when write_iter length
exceeds KMALLOC_MAX_SIZE causing order >= MAX_ORDER condition.

Additionally, there is no check for 0 length write.
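
A minimal user-space sketch of the kind of length check described above.
SKETCH_KMALLOC_MAX_SIZE, check_write_len() and the chosen error codes are
illustrative assumptions, not the code of the patch:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's KMALLOC_MAX_SIZE (value assumed). */
#define SKETCH_KMALLOC_MAX_SIZE ((size_t)1 << 22)

/*
 * Validate a user-controlled write length before allocating a buffer.
 * Returns 0 if the length is usable, a negative errno otherwise.
 * The error codes are an assumption made for this sketch.
 */
static int check_write_len(size_t len)
{
	if (len == 0)
		return -EINVAL;	/* zero-length write: nothing to allocate */
	if (len > SKETCH_KMALLOC_MAX_SIZE)
		return -ENOMEM;	/* would exceed the allocator limit */
	return 0;
}
```

Rejecting the length before the allocation keeps the request from ever
reaching the page allocator with an oversized order.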

[1]
WARNING: mm/page_alloc.c:5011
[..]
Call Trace:
 alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267
 alloc_pages include/linux/gfp.h:547 [inline]
 kmalloc_order+0x2e/0xb0 mm/slab_common.c:837
 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853
 kmalloc include/linux/slab.h:557 [inline]
 kzalloc include/linux/slab.h:682 [inline]
 qrtr_tun_write_iter+0x8a/0x180 net/qrtr/tun.c:83
 call_write_iter include/linux/fs.h:1901 [inline]

Reported-by: syzbot+c2a7e5c5211605a90865@syzkaller.appspotmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Link: https://lore.kernel.org/r/20210202092059.1361381-1-snovitoll@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:30 +01:00
Sabyrzhan Tasbolatov
862d1c0edd net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
commit a11148e6fc upstream.

syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS
control message is passed with user-controlled
0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition.

The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400.
So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10
is the closest limit, with 0x10 leftover.

Same condition is currently done in rds_cmsg_rdma_args().
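
The bound can be illustrated with a small user-space sketch. The two
constants repeat the values quoted above; iovec_array_bytes() and its
error code are hypothetical names chosen for the illustration:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UIO_MAXIOV 0x400u /* UIO_MAXIOV value quoted above */
#define SKETCH_IOVEC_SIZE 0x10u  /* sizeof(struct rds_iovec) quoted above */

/*
 * Compute the size of the iovec array for nr_local entries, refusing
 * counts above the UIO_MAXIOV-style limit. The error code is an
 * assumption made for this sketch.
 */
static int iovec_array_bytes(uint64_t nr_local, size_t *bytes)
{
	if (nr_local > SKETCH_UIO_MAXIOV)
		return -EINVAL;
	*bytes = (size_t)nr_local * SKETCH_IOVEC_SIZE;
	return 0;
}
```

With these values the largest accepted array is 0x400 * 0x10 = 0x4000
bytes, while a count of 0x40001 is rejected before any allocation.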

[1] WARNING: mm/page_alloc.c:5011
[..]
Call Trace:
 alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267
 alloc_pages include/linux/gfp.h:547 [inline]
 kmalloc_order+0x2e/0xb0 mm/slab_common.c:837
 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853
 kmalloc_array include/linux/slab.h:592 [inline]
 kcalloc include/linux/slab.h:621 [inline]
 rds_rdma_extra_size+0xb2/0x3b0 net/rds/rdma.c:568
 rds_rm_size net/rds/send.c:928 [inline]

Reported-by: syzbot+1bd2b07f93745fa38425@syzkaller.appspotmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Link: https://lore.kernel.org/r/20210201203233.1324704-1-snovitoll@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:30 +01:00
Stefano Garzarella
69e9fd9de1 vsock: fix locking in vsock_shutdown()
commit 1c5fae9c9a upstream.

In vsock_shutdown() we touched some socket fields without holding the
socket lock, such as 'state' and 'sk_flags'.

Also, after the introduction of multi-transport, we are accessing
'vsk->transport' in vsock_send_shutdown() without holding the lock
and this call can be made while the connection is in progress, so
the transport can change in the meantime.

To avoid issues, we hold the socket lock when we enter
vsock_shutdown() and release it when we leave.

Among the transports that implement the 'shutdown' callback, only
hyperv_transport acquired the lock. Since the caller now holds it,
we no longer take it.

Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:30 +01:00
Stefano Garzarella
afe3170160 vsock/virtio: update credit only if socket is not closed
commit ce7536bc73 upstream.

If the socket is closed or is being released, some resources used by
virtio_transport_space_update() such as 'vsk->trans' may be released.

To avoid a use after free bug we should only update the available credit
when we are sure the socket is still open and we have the lock held.

Fixes: 06a8fc7836 ("VSOCK: Introduce virtio_vsock_common.ko")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210208144454.84438-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Horatiu Vultur
ba3bcb35d7 switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT
commit 059d2a1004 upstream.

Now that MRP also uses SWITCHDEV_ATTR_ID_PORT_STP_STATE to notify HW,
SWITCHDEV_ATTR_ID_MRP_PORT_STAT is not used anywhere else, so we can
remove it.

Fixes: c284b54590 ("switchdev: mrp: Extend switchdev API to offload MRP")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Horatiu Vultur
55ad30cb7f bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state
commit b2bdba1cbc upstream.

The function br_mrp_port_switchdev_set_state was called both with MRP
port state and STP port state, which is an issue because they don't
match exactly.

Therefore, update the function to be used only with STP port state and
use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE.

STP is chosen over MRP because the drivers already implement
SWITCHDEV_ATTR_ID_PORT_STP_STATE and the port STP state is already
updated in SW.

Fixes: 9a9f26e8f7 ("bridge: mrp: Connect MRP API with the switchdev API")
Fixes: fadd409136 ("bridge: switchdev: mrp: Implement MRP API for switchdev")
Fixes: 2f1a11ae11 ("bridge: mrp: Add MRP interface.")
Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Edwin Peer
e22b963d3e net: watchdog: hold device global xmit lock during tx disable
commit 3aa6bce9af upstream.

Prevent netif_tx_disable() running concurrently with dev_watchdog() by
taking the device global xmit lock. Otherwise, the recommended:

	netif_carrier_off(dev);
	netif_tx_disable(dev);

driver shutdown sequence can happen after the watchdog has already
checked carrier, resulting in possible false alarms. This is because
netif_tx_lock() only sets the frozen bit without maintaining the locks
on the individual queues.

Fixes: c3f26a269c ("netdev: Fix lockdep warnings in multiqueue configurations.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Norbert Slusarek
bc21a88465 net/vmw_vsock: improve locking in vsock_connect_timeout()
commit 3d0bc44d39 upstream.

A possible locking issue in vsock_connect_timeout() was recognized by
Eric Dumazet which might cause a null pointer dereference in
vsock_transport_cancel_pkt(). This patch ensures that
vsock_transport_cancel_pkt() will be called within the lock, so a race
condition that could result in vsk->transport being set to NULL won't occur.

Fixes: 380feae0de ("vsock: cancel packets when failing to connect")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/trinity-f8e0937a-cf0e-4d80-a76e-d9a958ba3ef1-1612535522360@3c-app-gmx-bap12
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Norbert Slusarek
fcee53dc03 net/vmw_vsock: fix NULL pointer dereference
commit 5d1cbcc990 upstream.

In vsock_stream_connect(), a thread will enter schedule_timeout().
While being scheduled out, another thread can enter vsock_stream_connect()
as well and set vsk->transport to NULL. In case a signal was sent, the
first thread can leave schedule_timeout() and vsock_transport_cancel_pkt()
will be called right after. Inside vsock_transport_cancel_pkt(), a null
dereference will happen on transport->cancel_pkt.

Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/trinity-c2d6cede-bfb1-44e2-85af-1fbc7f541715-1612535117028@3c-app-gmx-bap12
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
NeilBrown
c901381341 net: fix iteration for sctp transport seq_files
commit af8085f3a4 upstream.

The sctp transport seq_file iterators take a reference to the transport
in the ->start and ->next functions and release the reference in the
->show function.  The preferred handling for such resources is to
release them in the subsequent ->next or ->stop function call.

Since Commit 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration
code and interface") there is no guarantee that ->show will be called
after ->next, so this function can now leak references.

So move the sctp_transport_put() call to ->next and ->stop.
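
The reference handling can be sketched in user space as follows. This is
a simplified model, not the sctp code: struct obj, struct iter and the
iter_*() helpers are hypothetical, and the reference is a plain counter:

```c
#include <stddef.h>

struct obj {
	int refs;		/* stand-in for a real reference count */
};

struct iter {
	struct obj *objs;	/* objects being iterated */
	size_t n;		/* number of objects */
	size_t pos;		/* current position */
};

/* Take a reference on the object at the current position, if any. */
static struct obj *iter_get(struct iter *it)
{
	if (it->pos >= it->n)
		return NULL;
	it->objs[it->pos].refs++;
	return &it->objs[it->pos];
}

/* ->start: return the first object with a reference held. */
static struct obj *iter_start(struct iter *it)
{
	return iter_get(it);
}

/* ->next: drop the reference on the previous object, then advance. */
static struct obj *iter_next(struct iter *it, struct obj *cur)
{
	if (cur)
		cur->refs--;
	it->pos++;
	return iter_get(it);
}

/* ->stop: drop the reference on whatever object is still held. */
static void iter_stop(struct obj *cur)
{
	if (cur)
		cur->refs--;
}
```

Because no release happens in a ->show step, the counters balance even
when iteration stops without ->show having been called for the last
object returned.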

Fixes: 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code and interface")
Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Eric Dumazet
9e6ce473e9 net: gro: do not keep too many GRO packets in napi->rx_list
commit 8dc1c444df upstream.

Commit c80794323e ("net: Fix packet reordering caused by GRO and
listified RX cooperation") had the unfortunate effect of adding
latencies in common workloads.

Before the patch, GRO packets were immediately passed to
upper stacks.

After the patch, we can accumulate quite a lot of GRO
packets (depending on NAPI budget).

My fix is counting in napi->rx_count number of segments
instead of number of logical packets.
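
A user-space sketch of the accounting change, under stated assumptions:
struct sketch_napi, sketch_gro_normal_one() and the batch size of 8 are
hypothetical stand-ins, and "flushing" is reduced to a counter:

```c
/* Hypothetical batch threshold; the value is an assumption of the sketch. */
#define SKETCH_BATCH 8u

struct sketch_napi {
	unsigned int rx_count;	/* segments queued since the last flush */
	unsigned int flushes;	/* how many times the list was passed up */
};

/*
 * Queue one GRO packet made of 'segs' segments. Counting segments
 * rather than packets makes a few large aggregated packets reach the
 * threshold as quickly as many small ones.
 */
static void sketch_gro_normal_one(struct sketch_napi *napi, unsigned int segs)
{
	napi->rx_count += segs;
	if (napi->rx_count >= SKETCH_BATCH) {
		napi->flushes++;	/* pass the list to the upper stack */
		napi->rx_count = 0;
	}
}
```

In this model two packets of four segments each trigger a flush, whereas
counting logical packets would have waited for eight of them.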

Fixes: c80794323e ("net: Fix packet reordering caused by GRO and listified RX cooperation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: John Sperbeck <jsperbeck@google.com>
Tested-by: Jian Yang <jianyang@google.com>
Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210204213146.4192368-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Rafael J. Wysocki
18193e0983 cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there
commit d11a1d08a0 upstream.

If the maximum performance level taken for computing the
arch_max_freq_ratio value used in the x86 scale-invariance code is
higher than the one corresponding to the cpuinfo.max_freq value
coming from the acpi_cpufreq driver, the scale-invariant utilization
falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly
faster, which causes the schedutil governor to select a frequency
below cpuinfo.max_freq.  That frequency corresponds to a frequency
table entry below the maximum performance level necessary to get to
the "boost" range of CPU frequencies which prevents "boost"
frequencies from being used in some workloads.

While this issue is related to scale-invariance, it may be amplified
by commit db865272d9 ("cpufreq: Avoid configuring old governors as
default with intel_pstate") from the 5.10 development cycle which
made it extremely easy to default to schedutil even if the preferred
driver is acpi_cpufreq as long as intel_pstate is built too, because
the mere presence of the latter effectively removes the ondemand
governor from the defaults.  Distro kernels are likely to include
both intel_pstate and acpi_cpufreq on x86, so their users who cannot
use intel_pstate or choose to use acpi_cpufreq may easily be
affected by this issue.

If CPPC is available, it can be used to address this issue by
extending the frequency tables created by acpi_cpufreq to cover the
entire available frequency range (including "boost" frequencies) for
each CPU, but if CPPC is not there, acpi_cpufreq has no idea what
the maximum "boost" frequency is and the frequency tables created by
it cannot be extended in a meaningful way, so in that case make it
ask the arch scale-invariance code to use the "nominal" performance
level for CPU utilization scaling in order to avoid the issue at hand.

Fixes: db865272d9 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Rafael J. Wysocki
8a3fc32b32 cpufreq: ACPI: Extend frequency tables to cover boost frequencies
commit 3c55e94c0a upstream.

A severe performance regression on AMD EPYC processors when using
the schedutil scaling governor was discovered by Phoronix.com and
attributed to the following commits:

  41ea667227 ("x86, sched: Calculate frequency invariance for AMD
  systems")

  976df7e573 ("x86, sched: Use midpoint of max_boost and max_P for
  frequency invariance on AMD EPYC")

The source of the problem is that the maximum performance level taken
for computing the arch_max_freq_ratio value used in the x86 scale-
invariance code is higher than the one corresponding to the
cpuinfo.max_freq value coming from the acpi_cpufreq driver.

This effectively causes the scale-invariant utilization to fall below
100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so
the schedutil governor selects a frequency below cpuinfo.max_freq
then.  That frequency corresponds to a frequency table entry below
the maximum performance level necessary to get to the "boost" range
of CPU frequencies.
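
The effect can be illustrated with a rough numerical sketch. The helper
names and the example frequencies (2.0 GHz nominal, 3.4 GHz reference
maximum) are assumptions made for the illustration; the model only
mirrors the general shape of scale-invariant utilization and a governor
that adds 25% headroom:

```c
#include <stdint.h>

#define SKETCH_CAPACITY 1024u	/* full-scale utilization */

/*
 * Utilization of a fully busy CPU running at cur_khz, scaled against
 * the reference maximum used by the scale-invariance code.
 */
static uint32_t sketch_util(uint32_t cur_khz, uint32_t ref_max_khz)
{
	return (uint32_t)((uint64_t)SKETCH_CAPACITY * cur_khz / ref_max_khz);
}

/*
 * Frequency a governor would request: 1.25 * max * util / capacity,
 * clamped to the maximum the driver reports.
 */
static uint32_t sketch_next_freq(uint32_t max_khz, uint32_t util)
{
	uint64_t f = (uint64_t)(max_khz + (max_khz >> 2)) * util /
		     SKETCH_CAPACITY;

	return f > max_khz ? max_khz : (uint32_t)f;
}
```

With a reference maximum of 3400000 kHz and a reported maximum of
2000000 kHz, a fully busy CPU at 2000000 kHz gets a utilization of about
59%, and the requested frequency stays below 2000000 kHz. When both
maxima agree, the same CPU requests the full 2000000 kHz.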

However, if the cpuinfo.max_freq value coming from acpi_cpufreq was
higher, the schedutil governor would select higher frequencies which
in turn would allow acpi_cpufreq to set more adequate performance
levels and to get to the "boost" range of CPU frequencies more often.

This issue affects any systems where acpi_cpufreq is used and the
"boost" (or "turbo") frequencies are enabled, not just AMD EPYC.
Moreover, commit db865272d9 ("cpufreq: Avoid configuring old
governors as default with intel_pstate") from the 5.10 development
cycle made it extremely easy to default to schedutil even if the
preferred driver is acpi_cpufreq as long as intel_pstate is built
too, because the mere presence of the latter effectively removes the
ondemand governor from the defaults.  Distro kernels are likely to
include both intel_pstate and acpi_cpufreq on x86, so their users
who cannot use intel_pstate or choose to use acpi_cpufreq may
easily be affected by this issue.

To address this issue, extend the frequency table constructed by
acpi_cpufreq for each CPU to cover the entire range of available
frequencies (including the "boost" ones) if CPPC is available and
indicates that "boost" (or "turbo") frequencies are enabled.  That
causes cpuinfo.max_freq to become the maximum "boost" frequency of
the given CPU (instead of the maximum frequency returned by the ACPI
_PSS object that corresponds to the "nominal" performance level).

Fixes: 41ea667227 ("x86, sched: Calculate frequency invariance for AMD systems")
Fixes: 976df7e573 ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC")
Fixes: db865272d9 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1
Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/
Reported-by: Michael Larabel <Michael@phoronix.com>
Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Tested-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:29 +01:00
Vladimir Oltean
c930943a36 net: dsa: call teardown method on probe failure
commit 8fd54a73b7 upstream.

Since teardown is supposed to undo the effects of the setup method, it
should be called in the error path for dsa_switch_setup, not just in
dsa_switch_teardown.
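
The error-path pattern can be sketched as follows. All names here are
hypothetical and the "switch" is a plain struct; the sketch only shows
teardown being called when a later step of the setup fails:

```c
struct sketch_switch {
	int setup_done;		/* set by setup, cleared by teardown */
	int torn_down;		/* number of teardown calls */
	int fail_after_setup;	/* test knob: make the later step fail */
};

/* Driver setup callback (hypothetical). */
static int sketch_setup(struct sketch_switch *ds)
{
	ds->setup_done = 1;
	return 0;
}

/* Driver teardown callback: undoes the effects of sketch_setup(). */
static void sketch_teardown(struct sketch_switch *ds)
{
	ds->setup_done = 0;
	ds->torn_down++;
}

/* A later step of the probe that may fail (hypothetical). */
static int sketch_register(struct sketch_switch *ds)
{
	return ds->fail_after_setup ? -1 : 0;
}

static int sketch_switch_setup(struct sketch_switch *ds)
{
	int err;

	err = sketch_setup(ds);
	if (err)
		return err;

	err = sketch_register(ds);
	if (err)
		goto teardown;

	return 0;

teardown:
	/* Undo setup on the error path, not only on normal removal. */
	sketch_teardown(ds);
	return err;
}
```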

Fixes: 5e3f847a02 ("net: dsa: Add teardown callback for drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210204163351.2929670-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:28 +01:00
Willem de Bruijn
46a831d1cc udp: fix skb_copy_and_csum_datagram with odd segment sizes
commit 52cbd23a11 upstream.

When iteratively computing a checksum with csum_block_add, track the
offset "pos" to correctly rotate in csum_block_add when offset is odd.

The open coded implementation of skb_copy_and_csum_datagram did this.
With the switch to __skb_datagram_iter calling csum_and_copy_to_iter,
pos was reinitialized to 0 on each call.

Bring back the pos by passing it along with the csum to the callback.
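
Why the offset matters can be shown with a user-space sketch of the
Internet checksum. The sketch_*() helpers are hypothetical and work on
folded 16-bit sums for clarity; they are not the kernel helpers:

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t sketch_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum a buffer as big-endian 16-bit words, zero-padding an odd tail. */
static uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sketch_fold(sum);
}

/*
 * Add the checksum of a block that starts at 'offset' in the overall
 * data. A block starting at an odd offset has its bytes in the opposite
 * halves of each word, so its sum is byte-swapped first.
 */
static uint16_t sketch_block_add(uint16_t csum, uint16_t csum2, size_t offset)
{
	if (offset & 1)
		csum2 = (uint16_t)((csum2 >> 8) | (csum2 << 8));
	return sketch_fold((uint32_t)csum + csum2);
}
```

If the offset passed for each block were always 0, a block following an
odd-sized segment would be added without the swap and the combined
checksum would differ from the checksum of the whole buffer.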

Changes v1->v2
  - pass csum value, instead of csump pointer (Alexander Duyck)

Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
Fixes: 950fcaecd5 ("datagram: consolidate datagram copy to iter helpers")
Reported-by: Oliver Graute <oliver.graute@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:28 +01:00