linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-08 20:07:46 +09:00

Author	SHA1	Message	Date
Tao Huang	2cf392ecb9	Merge tag 'android13-5.10-2023-02_r1' of https://android.googlesource.com/kernel/common android13-5.10 February 2023 release 1 Artifacts: https://ci.android.com/builds/submitted/9611411/kernel_aarch64/latest * tag 'android13-5.10-2023-02_r1': (5234 commits) ANDROID: GKI: rockchip: add symbols for drm hdcp BACKPORT: PCI: dwc: Support multiple ATU memory regions ANDROID: cpuidle-psci: Fix suspicious RCU usage ANDROID: Update the ABI representation ANDROID: fix up struct task_struct ABI change in 5.10.162 ANDROID: struct io_uring ABI preservation hack for 5.10.162 changes ANDROID: add flags variable back to struct proto_ops UPSTREAM: io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups UPSTREAM: eventfd: provide a eventfd_signal_mask() helper UPSTREAM: eventpoll: add EPOLL_URING_WAKE poll wakeup flag UPSTREAM: Revert "proc: don't allow async path resolution of /proc/self components" UPSTREAM: Revert "proc: don't allow async path resolution of /proc/thread-self components" UPSTREAM: net: remove cmsg restriction from io_uring based send/recvmsg calls UPSTREAM: task_work: unconditionally run task_work from get_signal() UPSTREAM: signal: kill JOBCTL_TASK_WORK UPSTREAM: io_uring: import 5.15-stable io_uring UPSTREAM: task_work: add helper for more targeted task_work canceling UPSTREAM: kernel: don't call do_exit() for PF_IO_WORKER threads UPSTREAM: kernel: stop masking signals in create_io_thread() UPSTREAM: x86/process: setup io_threads more like normal user space threads ... 
Change-Id: I470528ec3aa69693220e7c0e5b52077b60aebe81 Conflicts: Makefile drivers/gpu/drm/bridge/analogix/analogix_dp_core.c drivers/gpu/drm/bridge/synopsys/dw-hdmi.c drivers/gpu/drm/rockchip/analogix_dp-rockchip.c drivers/gpu/drm/rockchip/rockchip_drm_vop.c drivers/media/usb/uvc/uvc_driver.c drivers/media/usb/uvc/uvcvideo.h drivers/mmc/core/mmc.c drivers/pinctrl/pinctrl-rockchip.c drivers/regulator/core.c drivers/usb/dwc3/core.h drivers/usb/dwc3/gadget.c drivers/usb/gadget/function/f_hid.c drivers/usb/gadget/function/f_uvc.c drivers/usb/gadget/function/uvc.h drivers/usb/gadget/function/uvc_configfs.c drivers/usb/gadget/function/uvc_queue.c drivers/usb/gadget/function/uvc_v4l2.c drivers/usb/gadget/function/uvc_video.c drivers/usb/host/xhci.h drivers/usb/storage/unusual_uas.h drivers/usb/typec/altmodes/displayport.c mm/cma.c sound/core/pcm_dmaengine.c include/linux/stmmac.h sound/drivers/aloop.c	2023-03-23 21:08:05 +08:00
Linus Torvalds	82c8448b4e	UPSTREAM: proc: avoid integer type confusion in get_proc_long commit `e6cfaf34be` upstream. proc_get_long() is passed a size_t, but then assigns it to an 'int' variable for the length. Let's not do that, even if our IO paths are limited to MAX_RW_COUNT (exactly because of these kinds of type errors). So do the proper test in the right type. Bug: 261488859 Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: Icb7df4e5d9061d8a2c854b3f7cccaa753d6ea540 Signed-off-by: Lee Jones <joneslee@google.com>	2023-01-25 08:55:46 +00:00
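The ordering problem described in the commit above can be illustrated outside the kernel. The following is a minimal sketch in Python, not the kernel code: it assumes a 64-bit size_t and a 32-bit int, and `LIMIT` is a hypothetical stand-in for the small bound the length is clamped to. It only shows why the comparison should happen in the wide type before the value is narrowed.

```python
import ctypes

LIMIT = 21  # hypothetical clamp bound, not taken from the kernel source


def clamp_after_narrowing(size: int) -> int:
    """Problematic order: narrow the size_t to a 32-bit int, then clamp."""
    # ctypes does no overflow checking, which models the usual wraparound
    # seen when a large unsigned value is stored in a 32-bit int.
    length = ctypes.c_int32(size).value
    return LIMIT if length > LIMIT else length


def clamp_before_narrowing(size: int) -> int:
    """Safer order: compare in the wide type, then narrow the small result."""
    wide = ctypes.c_uint64(size).value  # model a 64-bit size_t
    return ctypes.c_int32(min(wide, LIMIT)).value


if __name__ == "__main__":
    for size in (10, 2**31, 2**32 + 5):
        print(size, clamp_after_narrowing(size), clamp_before_narrowing(size))
```

With the narrowing done first, a very large size can come out negative or as a small, plausible-looking length; clamping in the wide type always yields a value between 0 and `LIMIT`.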
Linus Torvalds	62445f069c	UPSTREAM: proc: proc_skip_spaces() shouldn't think it is working on C strings commit `bce9332220` upstream. proc_skip_spaces() seems to think it is working on C strings, and ends up being just a wrapper around skip_spaces() with a really odd calling convention. Instead of basing it on skip_spaces(), it should have looked more like proc_skip_char(), which really is the exact same function (except it skips a particular character, rather than whitespace). So use that as inspiration, odd coding and all. Now the calling convention actually makes sense and works for the intended purpose. Bug: 261488859 Reported-and-tested-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: Idda5e84344778ff8794bc21c981ba3da01e6a63b Signed-off-by: Lee Jones <joneslee@google.com>	2023-01-25 08:55:46 +00:00
Greg Kroah-Hartman	42f48afb44	Merge branch 'android13-5.10' into branch 'android13-5.10-lts' Sync up with android13-5.10 for the following commits: `800870b6d4` ANDROID: Update the ABI representation `b6a23be181` ANDROID: Fix for kernelci !CONFIG_SMP break-breaks `9bc66fe57c` ANDROID: fuse-bpf: set error_in to ENOENT in negative lookup `92fc848ef5` ANDROID: fuse-bpf: Add ability to run ranges of tests to fuse_test `cd9914280a` BACKPORT: NFC: netlink: fix sleep in atomic bug when firmware download timeout `e56825d048` ANDROID: KVM: arm64: Initialize ptr auth in protected mode `ab9c52146f` ANDROID: cgroup: Add vendor hook for rebuild_root_domains_bypass `8015dd49c0` FROMGIT: KVM: arm64: Ignore kvm-arm.mode if !is_hyp_mode_available() `5495c19c30` ANDROID: Update the ABI symbol list and xml `9c24cb8704` UPSTREAM: wifi: mac80211_hwsim: use 32-bit skb cookie `80c59100da` UPSTREAM: wifi: mac80211_hwsim: add back erroneously removed cast `9fafd34f1d` UPSTREAM: wifi: mac80211_hwsim: fix race condition in pending packet `d91e7b80d8` ANDROID: Update the ABI representation `14e1028389` ANDROID: sched: Fix off-by-one with cpupri MAX_RT_PRIO evaluation `7a6ea55aa0` Revert "ANDROID: workqueue: add vendor hook for wq lockup information" `7b19b0064b` UPSTREAM: kernel/irq: export irq_gc_set_wake `1856a68952` ANDROID: Update the ABI representation `1bd5344779` ANDROID: fuse-bpf: Add test for lookup postfilter `494e7075c9` ANDROID: fuse-bpf: readddir postfilter fixes `8483cc3a75` ANDROID: Enable BUILD_GKI_CERTIFICATION_TOOLS for x86_64 GKI `f813694424` ANDROID: force struct cgroup_taskset to be defined in KMI `3dc6e416a1` ANDROID: force struct blk_mq_alloc_data to be defined in KMI `af4d4153ca` BACKPORT: erofs: fix use-after-free of on-stack io[] `aec8f79a0f` ANDROID: GKI: db845c: Update symbols list and ABI `7b87b9ddb4` ANDROID: kleaf: Explicit list of ABI files. 
`d25aa0dbae` FROMLIST: f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT `5625e355a9` ANDROID: incfs: Add check for ATTR_KILL_SUID and ATTR_MODE in incfs_setattr `0cf7d9ce9f` Revert "UPSTREAM: scsi: ufs: core: Reduce the power mode change timeout" `1d61c5b5a0` Revert "FROMLIST: scsi: ufs: Fix deadlocks between power management and error handler" `dd18c291f9` BACKPORT: UPSTREAM: kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22} `f68d040c31` FROMLIST: scsi: ufs: Fix deadlocks between power management and error handler `32934b542c` UPSTREAM: ASoC: hdmi-codec: make hdmi_codec_controls static `46a21348d6` UPSTREAM: ASoC: hdmi-codec: Add a prepare hook `21e97dfa19` UPSTREAM: ASoC: hdmi-codec: Add iec958 controls `9e9d26699d` UPSTREAM: ASoC: hdmi-codec: Rework to support more controls `8de9ae8605` UPSTREAM: ALSA: iec958: Split status creation and fill `92c209708a` UPSTREAM: ALSA: doc: Clarify IEC958 controls iface `6cc06d03bf` UPSTREAM: ASoC: hdmi-codec: remove unused spk_mask member `004a44b913` UPSTREAM: ASoC: hdmi-codec: remove useless initialization `a7633aa2d9` UPSTREAM: ASoC: codec: hdmi-codec: Support IEC958 encoded PCM format `c584eb99bb` UPSTREAM: ASoC: hdmi-codec: Fix return value in hdmi_codec_set_jack() `f5a1cb7fd6` UPSTREAM: ASoC: hdmi-codec: Add RX support `6140082c11` UPSTREAM: ASoC: hdmi-codec: Get ELD in before reporting plugged event `185f60f2bb` ANDROID: add forward declaration vm_unmapped_area_info `d1b29856ca` ANDROID: net: export symbol for tracepoint_consume_skb `952141fb92` BACKPORT: dm verity: set DM_TARGET_IMMUTABLE feature flag `b5fe8c470e` BACKPORT: pipe: Fix missing lock in pipe_resize_ring() `790fa51f7b` BACKPORT: KVM: x86: avoid calling x86 emulator without a decoded instruction `ee742bccf6` ANDROID: power: fix vendor hook in suspend.c `7108d9d0e5` ANDROID: remove inclusions from hook definition headers `d6ab8e3ba2` Revert "ANDROID: arm64: smp: fix Lockdep warning: RCU used illegally from idle CPU." 
`aa381a5c71` ANDROID: remove CONFIG_TRACEPOINTS from hook definition headers `aff2309034` BACKPORT: watchqueue: make sure to serialize 'wqueue->defunct' properly `66047fb431` ANDROID: Update the ABI representation `d451b4eee2` ANDROID: Update the ABI representation `3d35c6b91d` UPSTREAM: scsi: ufs: core: Reduce the power mode change timeout `5c6d73ac2c` BACKPORT: scsi: ufs: core: Increase fDeviceInit poll frequency `2208908824` FROMGIT: f2fs: increase the limit for reserve_root `7af4b3ca30` FROMGIT: f2fs: complete checkpoints during remount `7a04671177` FROMGIT: f2fs: flush pending checkpoints when freezing super `f18d40369c` FROMGIT: f2fs: remove gc_urgent_high_limited for cleanup `68f703b19f` FROMGIT: f2fs: fix wrong continue condition in GC `8ecc3b8d53` BACKPORT: f2fs: handle decompress only post processing in softirq `23d664773f` BACKPORT: f2fs: introduce memory mode `1dd8074b61` ANDROID: Update the ABI representation `9bc5a118ef` Revert "ANDROID: usb: host: export additional xhci symbols for ring management" `3743e36578` Revert "ANDROID: GKI: signal: Export for __lock_task_sighand" `7219ca326a` Revert "ANDROID: Sched: Add restricted vendor hooks for scheduler" `4e709a85e5` ANDROID: fix kernelci issue for allnoconfig builds `909d582d3a` ANDROID: sched: Introducing PELT multiplier `9cfe2646f7` Revert "ANDROID: vendor_hooks: FPSIMD save/restore by using vendor_hooks" `c7afbeb17e` Revert "ANDROID: mm: export zone_watermark_ok" `e09aff6074` ANDROID: softirq: Add EXPORT_SYMBOL_GPL for softirq and tasklet `dd04e189df` ANDROID: Update the ABI representation `e3b7e41f06` ANDROID: vendor_hooks:vendor hook for __alloc_pages_slowpath. 
`b5bf2997c3` FROMLIST: xfrm: Ensure policy checked for nested ESP tunnels `970e02667c` FROMLIST: xfrm: Skip checking of already-verified secpath entries `039f38f9aa` Revert "ANDROID: mm: add vendor hook for vmpressure" `fc6f47b6fc` Revert "ANDROID: module: Add vendor hook" `f509b285d7` ANDROID: Update the ABI representation `b8762fa265` BACKPORT: mm: don't be stuck to rmap lock on reclaim path `737a5314c9` ANDROID: power: Add vendor hook for suspend `19b9be6d35` ANDROID: vendor_hooks:vendor hook for mmput `4a84a59cb8` ANDROID: vendor_hooks:vendor hook for pidfd_open `571f9fff87` ANDROID: Update the ABI representation `a48ad117ec` BACKPORT: f2fs: do not set compression bit if kernel doesn't support `406e9b3d0b` BACKPORT: f2fs: do not count ENOENT for error case `0d59b2578a` BACKPORT: f2fs: avoid infinite loop to flush node pages `6d2d344c5f` BACKPORT: f2fs: replace congestion_wait() calls with io_schedule_timeout() `ffe2cbbff9` BACKPORT: f2fs: fix wrong condition check when failing metapage read `9f4fae40a9` UPSTREAM: arm64: perf: Support new DT compatibles `be08fd28ca` UPSTREAM: arm64: perf: Simplify registration boilerplate `96dc76e1b1` UPSTREAM: arm64: perf: Support Denver and Carmel PMUs `5ac3e909a4` UPSTREAM: arm64: perf: add support for Cortex-A78 `913113f05f` UPSTREAM: binder: fix redefinition of seq_file attributes `0c79c40888` BACKPORT: drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu `0a21a3eb9f` BACKPORT: usb: gadget: rndis: prevent integer overflow in rndis_set_response() `d9d8680e9f` BACKPORT: KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID `2f9fed9ce8` BACKPORT: Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put `bc80ea8a42` BACKPORT: io_uring: always grab file table for deferred statx `0380da7fd6` FROMGIT: io_uring: Use original task for req identity in io_identity_cow() `19bb609b45` FROMLIST: binder: fix UAF of ref->proc caused by race condition `999976097d` ANDROID: binder: fix pending prio state for early exit 
`b5a6bcf9dc` ANDROID: Remove all but top-level OWNERS `254dfc7e98` ANDROID: Update the ABI representation `feb89f3850` ANDROID: fix kernelci error in fs/fuse/dir.c `3821e5b25c` ANDROID: power: add a vendor hook to log unfrozen tasks `f2cf53322f` ANDROID: fuse-bpf: Fix RCU/reference issue `1f44e4411f` UPSTREAM: exfat: reduce block requests when zeroing a cluster `885349f53d` FROMGIT: arm64: fix oops in concurrently setting insn_emulation sysctls `eb4344203d` FROMLIST: scsi: ufs: Fix a race condition related to device management commands `561c270725` ANDROID: vendor_hooks: tune reclaim scan type for specified mem_cgroup `a6b9536c10` ANDROID: KVM: arm64: Increase size of FF-A buffer `094905c877` ANDROID: fuse-bpf: Always call revalidate for backing `a8b1cff534` ANDROID: fuse-bpf: Adjust backing handle funcs `a06f77a0dd` ANDROID: fuse-bpf: Fix revalidate error path and backing handling `329650e3b9` ANDROID: fuse: Don't use readdirplus w/ nodeid 0 `55f267ee04` ANDROID: fuse-bpf: Fix use of get_fuse_inode `81a1ae6b43` ANDROID: mm: unlock the page on speculative fault retry `2957657ac3` ANDROID: power: Add vendor hook for suspend `ace01eaf6b` FROMGIT: Binder: add TF_UPDATE_TXN to replace outdated txn `f6acdedf61` ANDROID: GKI: forward declare struct tcpci_data in vendor hooks `037c2b81ac` ANDROID: Fix warning for undeclared struct acr_info `825e1059b5` ANDROID: KVM: arm64: Free shadow data vCPUs memcache And track more new symbols that were added to the 'android13-5.10' branch: 25 symbol(s) added 'GKI_struct_blk_mq_alloc_data' 'GKI_struct_cgroup_taskset' '__bitmap_xor' '__traceiter_android_vh_early_resume_begin' '__traceiter_android_vh_resume_end' '__traceiter_android_vh_try_to_freeze_todo_logging' '__tracepoint_android_vh_early_resume_begin' '__tracepoint_android_vh_resume_end' '__tracepoint_android_vh_try_to_freeze_todo_logging' '__xa_insert' 'dev_base_lock' 'devm_fwnode_gpiod_get_index' 'devm_gpiod_get_array_optional' 'drm_atomic_bridge_chain_disable' 
'drm_mode_parse_command_line_for_connector' 'init_user_ns' 'iommu_dma_enable_best_fit_algo' 'kobject_rename' 'nf_register_net_hooks' 'nf_unregister_net_hooks' 'ns_capable_noaudit' 'regulator_set_active_discharge_regmap' 'snd_pcm_create_iec958_consumer_default' 'snd_pcm_fill_iec958_consumer' 'snd_pcm_fill_iec958_consumer_hw_params' Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifb26d1c3de6f1ad3c3e6afc37d8acd61bdd8ac14	2022-10-05 15:42:29 +02:00
Tao Huang	8e8c0e1ce1	Merge commit 'a81422efff2f' It contains the following 82894 commits: UPSTREAM: usb: dwc3: gadget: Move null pinter check to proper place UPSTREAM: dma-buf: call dma_buf_stats_setup after dmabuf is in valid list clk: rockchip: rk3588: Add CLK_SET_RATE_PARENT for i2s5/6 frac clk media: rockchip: isp: fix vicap fast stream on and off dt-bindings: update SPDX-License-Identifier for rockchip clock header dt-bindings: update SPDX-License-Identifier for rockchip power header ASoC: es7202: fix es7202 read & write error media: i2c: sc430cs support get real fps media: i2c: sc430cs fixed compile error media: i2c: sc4238 support get channel info media: i2c: sc4238 support get real fps media: i2c: sc4238 fixed compile error media: i2c: sc2310 support get channel info media: i2c: sc2310 support get real fps media: i2c: sc2310 fixed compile error media: i2c: sc2239 support get real fps media: i2c: sc2239 fixed compile error media: i2c: sc2232 support get real fps media: i2c: sc2232 fixed compile error media: i2c: sc210iot support get real fps ... Conflicts: drivers/android/Kconfig drivers/dma-buf/dma-buf.c drivers/irqchip/irq-gic-v3-its.c drivers/usb/dwc3/gadget.c sound/soc/rockchip/rockchip_i2s.c Signed-off-by: Tao Huang <huangtao@rock-chips.com> Change-Id: I0aa6721d035488a1368205c0437ea2c6452c1bb0	2022-09-08 16:17:54 +08:00
JianMin Liu	909d582d3a	ANDROID: sched: Introducing PELT multiplier The new sysctl sched_pelt_multiplier allows a user to set a clock multiplier of x2 or x4 (x1 being the default). This clock multiplier artificially speeds up PELT ramp up/down, similarly to a faster half-life. Indeed, if we write PELT as a first order filter: y(t) = G * (1 - exp(-t/tau)) Then we can see that multiplying the time by a constant X is the same as dividing the time constant tau by X: y(t) = G * (1 - exp(-(t*X)/tau)) y(t) = G * (1 - exp(-t/(tau/X))) Tau being half-life/ln(2), multiplying the PELT time is the same as dividing the half-life: - x1: 32ms half-life - x2: 16ms half-life - x4: 8ms half-life Internally, a new clock is created: rq->clock_task_mult. It sits in the clock hierarchy between rq->clock_task and rq->clock_pelt. Bug: 177593580 Bug: 237219700 Change-Id: I67e6ca7994bebea22bf75732ee11d2b10e0d6b7e Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: JianMin Liu <jian-min.liu@mediatek.com> (cherry picked from commit `4442801a43`)	2022-08-31 16:45:35 +00:00
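The half-life arithmetic in the commit above can be checked numerically. This is a minimal sketch, not scheduler code: it models the ramp as a plain first-order response expressed directly in terms of the half-life, with a gain of 1, and the function names `pelt_ramp` and `effective_half_life` are hypothetical. It shows that scaling time by the multiplier has the same effect as dividing the 32ms half-life.

```python
def pelt_ramp(t_ms: float, multiplier: int = 1, half_life_ms: float = 32.0) -> float:
    """Fraction of the final value reached after t_ms of continuous activity.

    Time is scaled by the multiplier before it reaches the filter, which
    mirrors the idea of a faster clock feeding an unchanged filter.
    """
    scaled_t = t_ms * multiplier
    return 1.0 - 0.5 ** (scaled_t / half_life_ms)


def effective_half_life(multiplier: int, half_life_ms: float = 32.0) -> float:
    """Half-life observed in wall-clock time for a given clock multiplier."""
    return half_life_ms / multiplier


if __name__ == "__main__":
    for mult in (1, 2, 4):
        hl = effective_half_life(mult)
        # After one effective half-life the ramp has covered half the distance.
        print(f"x{mult}: half-life {hl} ms, ramp at that time = {pelt_ramp(hl, mult)}")
```

Under these assumptions the x1, x2 and x4 settings reach the halfway point after 32ms, 16ms and 8ms respectively, matching the list in the commit message.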
Greg Kroah-Hartman	d28684281b	Merge 5.10.132 into android13-5.10-lts Changes in 5.10.132 ALSA: hda - Add fixup for Dell Latitidue E5430 ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model ALSA: hda/realtek: Fix headset mic for Acer SF313-51 ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671 ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221 ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue fix race between exit_itimers() and /proc/pid/timers mm: split huge PUD on wp_huge_pud fallback tracing/histograms: Fix memory leak problem net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer ip: fix dflt addr selection for connected nexthop ARM: 9213/1: Print message about disabled Spectre workarounds only once ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction wifi: mac80211: fix queue selection for mesh/OCB interfaces cgroup: Use separate src/dst nodes when preloading css_sets for migration btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents drm/panfrost: Put mapping instead of shmem obj on panfrost_mmu_map_fault_addr() error drm/panfrost: Fix shrinker list corruption by madvise IOCTL fs/remap: constrain dedupe of EOF blocks nilfs2: fix incorrect masking of permission flags for symlinks sh: convert nommu io{re,un}map() to static inline functions Revert "evm: Fix memleak in init_desc" ext4: fix race condition between ext4_write and ext4_convert_inline_data ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count spi: amd: Limit max transfer and message size ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle ARM: 9210/1: Mark the FDT_FIXED sections as shareable net/mlx5e: kTLS, Fix build time constant test in TX net/mlx5e: kTLS, Fix build time constant test in RX net/mlx5e: Fix capability check for updating vnic env counters drm/i915: 
fix a possible refcount leak in intel_dp_add_mst_connector() ima: Fix a potential integer overflow in ima_appraise_measurement ASoC: sgtl5000: Fix noise on shutdown/remove ASoC: tas2764: Add post reset delays ASoC: tas2764: Fix and extend FSYNC polarity handling ASoC: tas2764: Correct playback volume range ASoC: tas2764: Fix amp gain register offset & default ASoC: Intel: Skylake: Correct the ssp rate discovery in skl_get_ssp_clks() ASoC: Intel: Skylake: Correct the handling of fmt_config flexible array net: stmmac: dwc-qos: Disable split header for Tegra194 sysctl: Fix data races in proc_dointvec(). sysctl: Fix data races in proc_douintvec(). sysctl: Fix data races in proc_dointvec_minmax(). sysctl: Fix data races in proc_douintvec_minmax(). sysctl: Fix data races in proc_doulongvec_minmax(). sysctl: Fix data races in proc_dointvec_jiffies(). tcp: Fix a data-race around sysctl_tcp_max_orphans. inetpeer: Fix data-races around sysctl. net: Fix data-races around sysctl_mem. cipso: Fix data-races around sysctl. icmp: Fix data-races around sysctl. ipv4: Fix a data-race around sysctl_fib_sync_mem. ARM: dts: at91: sama5d2: Fix typo in i2s1 node ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero drm/i915/selftests: fix a couple IS_ERR() vs NULL tests drm/i915/gt: Serialize TLB invalidates with GT resets sysctl: Fix data-races in proc_dointvec_ms_jiffies(). icmp: Fix a data-race around sysctl_icmp_ratelimit. icmp: Fix a data-race around sysctl_icmp_ratemask. raw: Fix a data-race around sysctl_raw_l3mdev_accept. ipv4: Fix data-races around sysctl_ip_dynaddr. nexthop: Fix data-races around nexthop_compat_mode. 
net: ftgmac100: Hold reference returned by of_get_child_by_name() ima: force signature verification when CONFIG_KEXEC_SIG is configured ima: Fix potential memory leak in ima_init_crypto() sfc: fix use after free when disabling sriov seg6: fix skb checksum evaluation in SRH encapsulation/insertion seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors seg6: bpf: fix skb checksum in bpf_push_seg6_encap() sfc: fix kernel panic when creating VF net: atlantic: remove deep parameter on suspend/resume functions net: atlantic: remove aq_nic_deinit() when resume KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() net/tls: Check for errors in tls_device_init mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE virtio_mmio: Add missing PM calls to freeze/restore virtio_mmio: Restore guest page size on resume netfilter: br_netfilter: do not skip all hooks with 0 priority scsi: hisi_sas: Limit max hw sectors for v3 HW cpufreq: pmac32-cpufreq: Fix refcount leak bug platform/x86: hp-wmi: Ignore Sanitization Mode event net: tipc: fix possible refcount leak in tipc_sk_create() NFC: nxp-nci: don't print header length mismatch on i2c error nvme-tcp: always fail a request when sending it failed nvme: fix regression when disconnect a recovering ctrl net: sfp: fix memory leak in sfp_probe() ASoC: ops: Fix off by one in range control validation pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux() ASoC: SOF: Intel: hda-loader: Clarify the cl_dsp_init() flow ASoC: wm5110: Fix DRE control ASoC: dapm: Initialise kcontrol data for mux/demux controls ASoC: cs47l15: Fix event generation for low power mux control ASoC: madera: Fix event generation for OUT1 demux ASoC: madera: Fix event generation for rate controls irqchip: or1k-pic: Undefine mask_ack for level triggered hardware x86: Clear .brk area at early boot soc: ixp4xx/npe: Fix unused match warning ARM: dts: stm32: use the correct clock source for CEC on stm32mp151 Revert "can: 
xilinx_can: Limit CANFD brp to 2" nvme-pci: phison e16 has bogus namespace ids signal handling: don't use BUG_ON() for debugging USB: serial: ftdi_sio: add Belimo device ids usb: typec: add missing uevent when partner support PD usb: dwc3: gadget: Fix event pending check tty: serial: samsung_tty: set dma burst_size to 1 vt: fix memory overlapping when deleting chars in the buffer serial: 8250: fix return error code in serial8250_request_std_resource() serial: stm32: Clear prev values before setting RTS delays serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle serial: 8250: Fix PM usage_count for console handover x86/pat: Fix x86_has_pat_wp() Linux 5.10.132 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I03edd74e43a2343810b3fbc8c551176640bad04a	2022-08-10 17:53:04 +02:00
Liang Chen	f2f9c7b160	sched/pelt: add sysctl node to set pelt halflife period The default is 32ms, and only 8ms and 32ms are supported: echo 8 > /proc/sys/kernel/sched_pelt_period echo 32 > /proc/sys/kernel/sched_pelt_period Change-Id: I04bfa571c4f8bf6b8a16075b7553caf275cd9586 Signed-off-by: Liang Chen <cl@rock-chips.com>	2022-07-28 14:24:59 +08:00
Muchun Song	31e16a5e11	mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE [ Upstream commit `43b5240ca6` ] "numa_stat" should not be included in the scope of CONFIG_HUGETLB_PAGE: if CONFIG_HUGETLB_PAGE is not configured, then even if CONFIG_NUMA is configured, "numa_stat" is missing from /proc. Move it out of CONFIG_HUGETLB_PAGE to fix it. Fixes: `4518085e12` ("mm, sysctl: make NUMA stats configurable") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:20:13 +02:00
Kuniyuki Iwashima	b8871d9186	sysctl: Fix data-races in proc_dointvec_ms_jiffies(). [ Upstream commit `7d1025e559` ] A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec_ms_jiffies() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec_ms_jiffies() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:20:09 +02:00
Kuniyuki Iwashima	609ce7ff75	sysctl: Fix data races in proc_dointvec_jiffies(). [ Upstream commit `e877820877` ] A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec_jiffies() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec_jiffies() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:20:07 +02:00
Kuniyuki Iwashima	a5ee448d38	sysctl: Fix data races in proc_doulongvec_minmax(). [ Upstream commit `c31bcc8fb8` ] A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_doulongvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_doulongvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:20:07 +02:00
Kuniyuki Iwashima	e3a2144b3b	sysctl: Fix data races in proc_douintvec_minmax(). [ Upstream commit `2d3b559df3` ] A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_douintvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_douintvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: `61d9b56a89` ("sysctl: add unsigned int range support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:20:07 +02:00
Kuniyuki Iwashima	71ddde27c2	sysctl: Fix data races in proc_dointvec_minmax(). [ Upstream commit `f613d86d01` ] A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:20:06 +02:00
Kuniyuki Iwashima	d5d54714e3	sysctl: Fix data races in proc_douintvec(). [ Upstream commit `4762b532ec` ] A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_douintvec() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_douintvec() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: `e7d316a02f` ("sysctl: handle error writing UINT_MAX to u32 fields") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:20:06 +02:00
Kuniyuki Iwashima	80cc28a4b4	sysctl: Fix data races in proc_dointvec(). [ Upstream commit `1f1be04b4d` ] A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:20:06 +02:00
Greg Kroah-Hartman	b2a024ac7f	Merge `d04937ae94` ("x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT") into android13-5.10 Steps on the way to 5.10.105 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie2249241cd53eea62126d9ee140a3d4e5a9012d8	2022-03-16 13:24:43 +01:00
Josh Poimboeuf	bd02dc4329	UPSTREAM: x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting commit `44a3918c82` upstream. With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable to Spectre v2 BHB-based attacks. When both are enabled, print a warning message and report it in the 'spectre_v2' sysfs vulnerabilities file. Bug: 215557547 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [fllinden@amazon.com: backported to 5.10] Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie9ee8c137729aadb4f9ef2be346a86d71eca363b	2022-03-14 14:43:53 +01:00
Daniel Borkmann	f27f62fecd	UPSTREAM: bpf: Add kconfig knob for disabling unpriv bpf by default commit `08389d8882` upstream. Add a kconfig knob which allows for unprivileged bpf to be disabled by default. If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2. This still allows a transition of 2 -> {0,1} through an admin. Similarly, this also still keeps 1 -> {1} behavior intact, so that once set to permanently disabled, it cannot be undone aside from a reboot. We've also added extra2 with max of 2 for the procfs handler, so that an admin still has a chance to toggle between 0 <-> 2. Either way, as an additional alternative, applications can make use of CAP_BPF that we added a while ago. Bug: 215557547 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net Cc: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `8c15bfb36a`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6d80853f0bd2c8618d956d967681c97b931a6137	2022-03-14 14:42:39 +01:00
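The transitions listed in the commit above (2 -> {0,1}, 1 -> {1}, and toggling 0 <-> 2) can be written down as a small state check. This is a sketch of the behavior as the commit message describes it, not the kernel's sysctl handler; the name `write_allowed` is hypothetical, and the assumption that 0 may also move to 1 reflects the pre-existing "disable until reboot" semantics.

```python
VALID_VALUES = (0, 1, 2)


def write_allowed(current: int, new: int) -> bool:
    """Return True if an admin write of `new` is accepted in state `current`."""
    if new not in VALID_VALUES:
        return False  # outside the 0..2 range described in the commit
    if current == 1:
        return new == 1  # permanently disabled: only a no-op write succeeds
    return True  # states 0 and 2 may move to any valid value


if __name__ == "__main__":
    for current in VALID_VALUES:
        targets = [new for new in VALID_VALUES if write_allowed(current, new)]
        print(f"{current} -> {targets}")
```

In this model, state 1 is the only sink: once written, it cannot be left without a reboot, while 0 and 2 remain freely switchable.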
Josh Poimboeuf	afc2d635b5	x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting commit `44a3918c82` upstream. With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable to Spectre v2 BHB-based attacks. When both are enabled, print a warning message and report it in the 'spectre_v2' sysfs vulnerabilities file. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [fllinden@amazon.com: backported to 5.10] Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-11 12:11:49 +01:00
Suren Baghdasaryan	c730e7dbba	ANDROID: Fix "one_thousand" defined but not used warning Fix the following warning issued when CONFIG_PERF_EVENTS is not defined: kernel/sysctl.c:124:12: error: ‘one_thousand’ defined but not used [-Werror=unused-variable] These definitions have been changed upstream [1], so the issue does not exist there. [1] https://lore.kernel.org/all/20211124220801.ip01WsWPQ%25akpm@linux-foundation.org/ Fixes: `8fded571e7` ("FROMGIT: mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%") Bug: 194652782 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I5539a2d0d27a126f7405455a8cf08c23b80d2e0b	2022-01-06 22:49:35 +00:00
Greg Kroah-Hartman	42c469a083	Merge 5.10.90 into android13-5.10 Changes in 5.10.90 Input: i8042 - add deferred probe support Input: i8042 - enable deferred probe quirk for ASUS UM325UA tomoyo: Check exceeded quota early in tomoyo_domain_quota_is_ok(). tomoyo: use hwight16() in tomoyo_domain_quota_is_ok() parisc: Clear stale IIR value on instruction access rights trap platform/x86: apple-gmux: use resource_size() with res memblock: fix memblock_phys_alloc() section mismatch error recordmcount.pl: fix typo in s390 mcount regex selinux: initialize proto variable in selinux_ip_postroute_compat() scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write() net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources net/mlx5e: Wrap the tx reporter dump callback to extract the sq net/mlx5e: Fix ICOSQ recovery flow for XSK udp: using datalen to cap ipv6 udp max gso segments selftests: Calculate udpgso segment count without header adjustment sctp: use call_rcu to free endpoint net/smc: fix using of uninitialized completions net: usb: pegasus: Do not drop long Ethernet frames net: ag71xx: Fix a potential double free in error handling paths net: lantiq_xrx200: fix statistics of received bytes NFC: st21nfca: Fix memory leak in device probe and remove net/smc: improved fix wait on already cleared link net/smc: don't send CDC/LLC message if link not ready net/smc: fix kernel panic caused by race of smc_sock igc: Fix TX timestamp support for non-MSI-X platforms ionic: Initialize the 'lif->dbid_inuse' bitmap net/mlx5e: Fix wrong features assignment in case of error selftests/net: udpgso_bench_tx: fix dst ip argument net/ncsi: check for error return from call to nla_put_u32 fsl/fman: Fix missing put_device() call in fman_port_probe i2c: validate user data in compat ioctl nfc: uapi: use kernel size_t to fix user-space builds uapi: fix linux/nfc.h userspace compilation errors drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled drm/amdgpu: 
add support for IP discovery gc_info table v2 xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set. usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear. usb: mtu3: add memory barrier before set GPD's HWO usb: mtu3: fix list_head check warning usb: mtu3: set interval of FS intr and isoc endpoint binder: fix async_free_space accounting for empty parcels scsi: vmw_pvscsi: Set residual data length conditionally Input: appletouch - initialize work before device registration Input: spaceball - fix parsing of movement data packets net: fix use-after-free in tw_timer_handler perf script: Fix CPU filtering of a script's switch events bpf: Add kconfig knob for disabling unpriv bpf by default Linux 5.10.90 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I277de1626e4275a3c3b1294a2a106e473a62de6c	2022-01-06 08:31:22 +01:00
Suren Baghdasaryan	8fded571e7	FROMGIT: mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30% For embedded systems with low total memory, having to run applications with relatively large memory requirements, 10% max limitation for watermark_scale_factor poses an issue of triggering direct reclaim every time such application is started. This results in slow application startup times and bad end-user experience. By increasing watermark_scale_factor max limit we allow vendors more flexibility to choose the right level of kswapd aggressiveness for their device and workload requirements. Link: https://lkml.kernel.org/r/20211124193604.2758863-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Antti Palosaari <crope@iki.fi> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: Fengfei Xi <xi.fengfei@h3c.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 4e36dc369cc7581ac19a7523303e682a53e52e59 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master) Bug: 194652782 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I3e926c8b222933a10c79068d22a1407ff3181824	2022-01-05 18:30:26 +00:00
Suren Baghdasaryan	8ecc974abe	Revert "ANDROID: add extra free kbytes tunable" This reverts commit `92501cb670`. Removing out-of-tree patch. Bug: 109664768 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I31d831dad0716381b1e477f1f11c758135a5fde5	2022-01-05 18:30:19 +00:00
Daniel Borkmann	8c15bfb36a	bpf: Add kconfig knob for disabling unpriv bpf by default commit `08389d8882` upstream. Add a kconfig knob which allows for unprivileged bpf to be disabled by default. If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2. This still allows a transition of 2 -> {0,1} through an admin. Similarly, this also still keeps 1 -> {1} behavior intact, so that once set to permanently disabled, it cannot be undone aside from a reboot. We've also added extra2 with max of 2 for the procfs handler, so that an admin still has a chance to toggle between 0 <-> 2. Either way, as an additional alternative, applications can make use of CAP_BPF that we added a while ago. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net Cc: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-05 12:40:34 +01:00
Charan Teja Reddy	71fdbce075	FROMLIST: mm: compaction: support triggering of proactive compaction by user Proactive compaction [1] is triggered every 500 msec and runs compaction on the node for COMPACTION_HPAGE_ORDER (usually order-9) pages, based on the value of sysctl.compaction_proactiveness. Triggering compaction every 500 msec in search of COMPACTION_HPAGE_ORDER pages is not needed for all applications, especially in embedded use cases, which may have only a few MB of RAM. Enabling proactive compaction in its current state ends up running it almost constantly on such systems. On the other hand, proactive compaction can still be very useful for obtaining a set of higher-order pages in a controllable manner (controlled through sysctl.compaction_proactiveness). So on systems where always-on proactive compaction may prove unnecessary, it can instead be triggered from user space by a write to its sysctl interface. For example, say an app launcher decides to launch a memory-heavy application, which can be launched quickly if it gets more higher-order pages; the launcher can prepare the system in advance by triggering proactive compaction from user space. Proactive compaction is triggered on a user write to sysctl.compaction_proactiveness. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=facdaa917c4d5a376d09d25865f5a863f906234a Bug: 186387247 Link: https://lore.kernel.org/patchwork/patch/1438211/ Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> Change-Id: Ie5208e274b9d7e7354471bb98ff1f10becf93595	2021-06-17 14:15:58 -07:00
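The user-space trigger described in the commit above amounts to a write to the sysctl file. The following is a rough sketch of what a launcher might do, assuming a kernel that carries this patch and a caller with permission to write the file; the helper name `request_proactive_compaction` is hypothetical, and the [0, 100] range check reflects the range documented for the tunable rather than anything this commit adds.

```python
from pathlib import Path

# procfs location of the vm.compaction_proactiveness sysctl.
PROACTIVENESS_SYSCTL = Path("/proc/sys/vm/compaction_proactiveness")


def request_proactive_compaction(
    proactiveness: int, sysctl_path: Path = PROACTIVENESS_SYSCTL
) -> None:
    """Write a proactiveness value to the sysctl file (hypothetical helper).

    On a kernel with the patch above, the write itself is what triggers
    a round of proactive compaction.
    """
    if not 0 <= proactiveness <= 100:
        raise ValueError("proactiveness must be in the range [0, 100]")
    sysctl_path.write_text(f"{proactiveness}\n")
```

The path is a parameter only so the helper can be exercised against an ordinary file; on a real system it would be called as `request_proactive_compaction(20)` shortly before launching the memory-heavy application.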
Greg Kroah-Hartman	3ccfc59f82	Merge 5.10.24 into android12-5.10-lts Changes in 5.10.24 uapi: nfnetlink_cthelper.h: fix userspace compilation error powerpc/perf: Fix handling of privilege level checks in perf interrupt context powerpc/pseries: Don't enforce MSI affinity with kdump ethernet: alx: fix order of calls on resume crypto: mips/poly1305 - enable for all MIPS processors ath9k: fix transmitting to stations in dynamic SMPS mode net: Fix gro aggregation for udp encaps with zero csum net: check if protocol extracted by virtio_net_hdr_set_proto is correct net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 net: l2tp: reduce log level of messages in receive path, add counter instead can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership can: flexcan: assert FRZ bit in flexcan_chip_freeze() can: flexcan: enable RX FIFO after FRZ/HALT valid can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE) tcp: add sanity tests to TCP_QUEUE_SEQ netfilter: nf_nat: undo erroneous tcp edemux lookup netfilter: x_tables: gpf inside xt_find_revision() net: always use icmp{,v6}_ndo_send from ndo_start_xmit net: phy: fix save wrong speed and duplex problem if autoneg is on selftests/bpf: Use the last page in test_snprintf_btf on s390 selftests/bpf: No need to drop the packet when there is no geneve opt selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier samples, bpf: Add missing munmap in xdpsock libbpf: Clear map_info before each bpf_obj_get_info_by_fd ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning. 
ibmvnic: always store valid MAC address mt76: dma: do not report truncated frames to mac80211 powerpc/603: Fix protection of user pages mapped with PROT_NONE mount: fix mounting of detached mounts onto targets that reside on shared mounts cifs: return proper error code in statfs(2) Revert "mm, slub: consider rest of partial list if acquire_slab() fails" docs: networking: drop special stable handling net: dsa: tag_rtl4_a: fix egress tags sh_eth: fix TRSCER mask for SH771x net: enetc: don't overwrite the RSS indirection table when initializing net: enetc: take the MDIO lock only once per NAPI poll cycle net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets net: enetc: don't disable VLAN filtering in IFF_PROMISC mode net: enetc: force the RGMII speed and duplex instead of operating in inband mode net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr net: enetc: keep RX ring consumer index in sync with hardware net: ethernet: mtk-star-emac: fix wrong unmap in RX handling net/mlx4_en: update moderation when config reset net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10 nexthop: Do not flush blackhole nexthops when loopback goes down net: sched: avoid duplicates in classes dump net: mscc: ocelot: properly reject destination IP keys in VCAP IS1 net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10 net: usb: qmi_wwan: allow qmimux add/del with master up netdevsim: init u64 stats for 32bit hardware cipso,calipso: resolve a number of problems with the DOI refcounts net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII stmmac: intel: Fixes clock registration error seen for multiple interfaces net: lapbether: Remove netif_start_queue / netif_stop_queue net: davicom: Fix regulator not turned off on failed probe net: davicom: Fix regulator not turned off on driver removal net: enetc: allow hardware timestamping on TX queues with tc-etf enabled net: qrtr: fix error return code of 
qrtr_sendmsg() s390/qeth: fix memory leak after failed TX Buffer allocation r8169: fix r8168fp_adjust_ocp_cmd function ixgbe: fail to create xfrm offload of IPsec tunnel mode SA tools/resolve_btfids: Fix build error with older host toolchains perf build: Fix ccache usage in $(CC) when generating arch errno table net: stmmac: stop each tx channel independently net: stmmac: fix watchdog timeout during suspend/resume stress test net: stmmac: fix wrongly set buffer2 valid when sph unsupport ethtool: fix the check logic of at least one channel for RX/TX net: phy: make mdio_bus_phy_suspend/resume as __maybe_unused selftests: forwarding: Fix race condition in mirror installation mlxsw: spectrum_ethtool: Add an external speed to PTYS register perf traceevent: Ensure read cmdlines are null terminated. perf report: Fix -F for branch & mem modes net: hns3: fix query vlan mask value error for flow director net: hns3: fix bug when calculating the TCAM table info s390/cio: return -EFAULT if copy_to_user() fails again bnxt_en: reliably allocate IRQ table on reset to avoid crash gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk gpiolib: acpi: Allow to find GpioInt() resource by name and index gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2 gpio: fix gpio-device list corruption drm/compat: Clear bounce structures drm/amd/display: Add a backlight module option drm/amdgpu/display: use GFP_ATOMIC in dcn21_validate_bandwidth_fp() drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth() drm/amd/pm: bug fix for pcie dpm drm/amdgpu/display: simplify backlight setting drm/amdgpu/display: don't assert in set backlight function drm/amdgpu/display: handle aux backlight in backlight_get_brightness drm/shmem-helper: Check for purged buffers in fault handler drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff drm: Use USB controller's DMA mask when importing dmabufs drm: meson_drv add shutdown function drm/shmem-helpers: vunmap: Don't put pages for 
dma-buf drm/i915: Wedge the GPU if command parser setup fails s390/cio: return -EFAULT if copy_to_user() fails s390/crypto: return -EFAULT if copy_to_user() fails qxl: Fix uninitialised struct field head.surface_id sh_eth: fix TRSCER mask for R7S9210 media: usbtv: Fix deadlock on suspend media: rkisp1: params: fix wrong bits settings media: v4l: vsp1: Fix uif null pointer access media: v4l: vsp1: Fix bru null pointer access media: rc: compile rc-cec.c into rc-core cifs: fix credit accounting for extra channel net: hns3: fix error mask definition of flow director s390/qeth: don't replace a fully completed async TX buffer s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state s390/qeth: improve completion of pending TX buffers s390/qeth: fix notification for pending buffers during teardown net: dsa: implement a central TX reallocation procedure net: dsa: tag_ksz: don't allocate additional memory for padding/tagging net: dsa: trailer: don't allocate additional memory for padding/tagging net: dsa: tag_qca: let DSA core deal with TX reallocation net: dsa: tag_ocelot: let DSA core deal with TX reallocation net: dsa: tag_mtk: let DSA core deal with TX reallocation net: dsa: tag_lan9303: let DSA core deal with TX reallocation net: dsa: tag_edsa: let DSA core deal with TX reallocation net: dsa: tag_brcm: let DSA core deal with TX reallocation net: dsa: tag_dsa: let DSA core deal with TX reallocation net: dsa: tag_gswip: let DSA core deal with TX reallocation net: dsa: tag_ar9331: let DSA core deal with TX reallocation net: dsa: tag_mtk: fix 802.1ad VLAN egress enetc: Fix unused var build warning for CONFIG_OF net: enetc: initialize RFS/RSS memories for unused ports too ath11k: peer delete synchronization with firmware ath11k: start vdev if a bss peer is already created ath11k: fix AP mode for QCA6390 i2c: rcar: faster irq code to minimize HW race condition i2c: rcar: optimize cacheline to minimize HW race condition scsi: ufs: WB is only available on LUN #0 to #7 udf: fix 
silent AED tagLocation corruption iommu/vt-d: Clear PRQ overflow only when PRQ is empty mmc: mxs-mmc: Fix a resource leak in an error handling path in 'mxs_mmc_probe()' mmc: mediatek: fix race condition between msdc_request_timeout and irq mmc: sdhci-iproc: Add ACPI bindings for the RPi Platform: OLPC: Fix probe error handling powerpc/pci: Add ppc_md.discover_phbs() spi: stm32: make spurious and overrun interrupts visible powerpc: improve handling of unrecoverable system reset powerpc/perf: Record counter overflow always if SAMPLE_IP is unset HID: logitech-dj: add support for the new lightspeed connection iteration powerpc/64: Fix stack trace not displaying final frame iommu/amd: Fix performance counter initialization clk: qcom: gdsc: Implement NO_RET_PERIPH flag sparc32: Limit memblock allocation to low memory sparc64: Use arch_validate_flags() to validate ADI flag Input: applespi - don't wait for responses to commands indefinitely. PCI: xgene-msi: Fix race in installing chained irq handler PCI: mediatek: Add missing of_node_put() to fix reference leak drivers/base: build kunit tests without structleak plugin PCI/LINK: Remove bandwidth notification ext4: don't try to processed freed blocks until mballoc is initialized kbuild: clamp SUBLEVEL to 255 PCI: Fix pci_register_io_range() memory leak i40e: Fix memory leak in i40e_probe kasan: fix memory corruption in kasan_bitops_tags test s390/smp: __smp_rescan_cpus() - move cpumask away from stack drivers/base/memory: don't store phys_device in memory blocks sysctl.c: fix underflow value setting risk in vm_table scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling scsi: target: core: Add cmd length set before cmd complete scsi: target: core: Prevent underflow for service actions clk: qcom: gpucc-msm8998: Add resets, cxc, fix flags on gpu_gx_gdsc mmc: sdhci: Update firmware interface API ARM: 9029/1: Make iwmmxt.S support Clang's integrated assembler ARM: assembler: introduce adr_l, ldr_l and str_l macros ARM: 
efistub: replace adrl pseudo-op with adr_l macro invocation ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk ALSA: hda/hdmi: Cancel pending works before suspend ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5 ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support ALSA: hda: Drop the BATCH workaround for AMD controllers ALSA: hda: Flush pending unsolicited events before suspend ALSA: hda: Avoid spurious unsol event handling during S3/S4 ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar ALSA: usb-audio: Apply the control quirk to Plantronics headsets ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend() ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe ALSA: usb-audio: fix use after free in usb_audio_disconnect Revert `95ebabde38` ("capabilities: Don't allow writing ambiguous v3 file capabilities") block: Discard page cache of zone reset target range block: Try to handle busy underlying device on discard arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL arm64: mte: Map hotplugged memory as Normal Tagged arm64: perf: Fix 64-bit event counter read truncation s390/dasd: fix hanging DASD driver unbind s390/dasd: fix hanging IO request during DASD driver unbind software node: Fix node registration xen/events: reset affinity of 2-level event when tearing it down mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants mmc: core: Fix partition switch time for eMMC mmc: cqhci: Fix random crash when remove mmc module/card cifs: do not send close in compound create+close requests Goodix Fingerprint device is not a modem USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe() USB: gadget: u_ether: Fix a configfs return code usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot usb: gadget: f_uac1: stop playback on function disable usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement usb: dwc3: qcom: add URS Host support for 
sdm845 ACPI boot usb: dwc3: qcom: add ACPI device id for sc8180x usb: dwc3: qcom: Honor wakeup enabled/disabled state USB: usblp: fix a hang in poll() if disconnected usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM usb: xhci: do not perform Soft Retry for some xHCI hosts xhci: Improve detection of device initiated wake signal. usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state USB: serial: io_edgeport: fix memory leak in edge_startup USB: serial: ch341: add new Product ID USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter USB: serial: cp210x: add some more GE USB IDs usbip: fix stub_dev to check for stream socket usbip: fix vhci_hcd to check for stream socket usbip: fix vudc to check for stream socket usbip: fix stub_dev usbip_sockfd_store() races leading to gpf usbip: fix vhci_hcd attach_store() races leading to gpf usbip: fix vudc usbip_sockfd_store races leading to gpf Revert "serial: max310x: rework RX interrupt handling" misc/pvpanic: Export module FDT device table misc: fastrpc: restrict user apps from sending kernel RPC messages staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan() staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan() staging: rtl8712: unterminated string leads to read overflow staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data() staging: ks7010: prevent buffer overflow in ks_wlan_set_scan() staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd staging: rtl8192e: Fix possible buffer overflow in _rtl92e_wx_set_scan staging: comedi: addi_apci_1032: Fix endian problem for COS sample staging: comedi: addi_apci_1500: Fix endian problem for command sample staging: comedi: adv_pci1710: Fix endian problem for AI command data staging: comedi: das6402: Fix endian problem for AI command data staging: comedi: das800: Fix endian problem for AI command data staging: 
comedi: dmm32at: Fix endian problem for AI command data staging: comedi: me4000: Fix endian problem for AI command data staging: comedi: pcl711: Fix endian problem for AI command data staging: comedi: pcl818: Fix endian problem for AI command data sh_eth: fix TRSCER mask for R7S72100 cpufreq: qcom-hw: fix dereferencing freed memory 'data' cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init() arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory SUNRPC: Set memalloc_nofs_save() for sync tasks NFS: Don't revalidate the directory permissions on a lookup failure NFS: Don't gratuitously clear the inode cache when lookup failed NFSv4.2: fix return value of _nfs4_get_security_label() block: rsxx: fix error return code of rsxx_pci_probe() nvme-fc: fix racing controller reset and create association configfs: fix a use-after-free in __configfs_open_file arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds perf/core: Flush PMU internal buffers for per-CPU events perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event() powerpc/64s/exception: Clean up a missed SRR specifier seqlock,lockdep: Fix seqcount_latch_init() stop_machine: mark helpers __always_inline include/linux/sched/mm.h: use rcu_dereference in in_vfork() zram: fix return value on writeback_store linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP* sched/membarrier: fix missing local execution of ipi_sync_rq_state() efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table powerpc/64s: Fix instruction encoding for lis in ppc_function_entry() powerpc: Fix inverted SET_FULL_REGS bitop powerpc: Fix missing declaration of [en/dis]able_kernel_vsx() binfmt_misc: fix possible deadlock in bm_register_write x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2 x86/sev-es: Introduce ip_within_syscall_gap() helper x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack 
x86/entry: Move nmi entry/exit into common code x86/sev-es: Correctly track IRQ states in runtime #VC handler x86/sev-es: Use __copy_from_user_inatomic() x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls KVM: x86: Ensure deadline timer has truly expired before posting its IRQ KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged KVM: arm64: Fix range alignment when walking page tables KVM: arm64: Avoid corrupting vCPU context register in guest exit KVM: arm64: nvhe: Save the SPE context early KVM: arm64: Reject VM creation when the default IPA size is unsupported KVM: arm64: Fix exclusive limit for IPA size mm/userfaultfd: fix memory corruption due to writeprotect mm/madvise: replace ptrace attach requirement for process_madvise KVM: arm64: Ensure I-cache isolation between vcpus of a same VM mm/page_alloc.c: refactor initialization of struct page for holes in memory layout xen/events: don't unmask an event channel when an eoi is pending xen/events: avoid handling the same event on two cpus at the same time KVM: arm64: Fix nVHE hyp panic host context restore RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size Linux 5.10.24 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie53a3c1963066a18d41357b6be41cff00690bd40	2021-03-19 09:42:56 +01:00
Lin Feng	f49bdac3e7	sysctl.c: fix underflow value setting risk in vm_table [ Upstream commit `3b3376f222` ] Apart from subsystem-specific .proc_handler handlers, all ctl_tables with extra1 and extra2 members set should use proc_dointvec_minmax instead of proc_dointvec; otherwise the limits set in extra* never take effect, and echoing underflow values (negative numbers) is likely to make the system unstable. In particular, -1 is clearly not a valid value for vfs_cache_pressure or zone_reclaim_mode, yet it can be written to them, and the kernel may then crash. # echo -1 > /proc/sys/vm/vfs_cache_pressure Link: https://lkml.kernel.org/r/20201223105535.2875-1-linf@wangsu.com Signed-off-by: Lin Feng <linf@wangsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:25 +01:00
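The difference between the two handlers named in the commit above can be illustrated with a toy model. This sketch models only the bounds check, not the kernel's parsing or storage; the function names are hypothetical, the lower bound of 0 used for vfs_cache_pressure in the usage note is an assumption for illustration, and returning -EINVAL on an out-of-range write is assumed to match the kernel helper.

```python
import errno
from typing import Optional


def write_int_unchecked(
    new_value: int, minimum: Optional[int], maximum: Optional[int]
) -> int:
    """Toy stand-in for proc_dointvec: the extra1/extra2 bounds are ignored."""
    return 0  # every integer is accepted, including negative ones


def write_int_minmax(
    new_value: int, minimum: Optional[int], maximum: Optional[int]
) -> int:
    """Toy stand-in for proc_dointvec_minmax: the bounds are enforced."""
    if minimum is not None and new_value < minimum:
        return -errno.EINVAL
    if maximum is not None and new_value > maximum:
        return -errno.EINVAL
    return 0
```

Under these assumptions, `write_int_unchecked(-1, 0, None)` accepts the bad value, which is the situation the commit describes, while `write_int_minmax(-1, 0, None)` rejects it.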
Greg Kroah-Hartman	ee385f5df9	Merge 5.9-rc6 into android-mainline Linux 5.9-rc6 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I3bccdbb773bfc2c604742e6ff5983bf0b61ba0b5	2020-09-21 12:13:45 +02:00
Linus Torvalds	5ef64cc898	mm: allow a controlled amount of unfairness in the page lock Commit `2a9127fcf2` ("mm: rewrite wait_on_page_bit_common() logic") made the page locking entirely fair, in that if a waiter came in while the lock was held, the lock would be transferred to the lockers strictly in order. That was intended to finally get rid of the long-reported watchdog failures that involved the page lock under extreme load, where a process could end up waiting essentially forever, as other page lockers stole the lock from under it. It also improved some benchmarks, but it ended up causing huge performance regressions on others, simply because fair lock behavior doesn't end up giving out the lock as aggressively, causing better worst-case latency, but potentially much worse average latencies and throughput. Instead of reverting that change entirely, this introduces a controlled amount of unfairness, with a sysctl knob to tune it if somebody needs to. But the default value should hopefully be good for any normal load, allowing a few rounds of lock stealing, but enforcing the strict ordering before the lock has been stolen too many times. There is also a hint from Matthieu Baerts that the fair page coloring may end up exposing an ABBA deadlock that is hidden by the usual optimistic lock stealing, and while the unfairness doesn't fix the fundamental issue (and I'm still looking at that), it avoids it in practice. The amount of unfairness can be modified by writing a new value to the 'sysctl_page_lock_unfairness' variable (default value of 5, exposed through /proc/sys/vm/page_lock_unfairness), but that is hopefully something we'd use mainly for debugging rather than being necessary for any deep system tuning. This whole issue has exposed just how critical the page lock can be, and how contended it gets under certain locks. 
And the main contention doesn't really seem to be anything related to IO (which was the origin of this lock), but for things like just verifying that the page file mapping is stable while faulting in the page into a page table. Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/ Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1 Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/ Reported-and-tested-by: Michael Larabel <Michael@michaellarabel.com> Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-17 10:26:41 -07:00
Greg Kroah-Hartman	3d3ef2a059	Merge 5.9-rc4 into android-mainline Linux 5.9-rc4 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I3d041935cae5e8f3421edcdee4892f17e2c776ad	2020-09-07 09:24:58 +02:00
Tobias Klauser	7787b6fc93	bpf, sysctl: Let bpf_stats_handler take a kernel pointer buffer Commit `32927393dc` ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of bpf_stats_handler to match ctl_table.proc_handler which fixes the following sparse warning: kernel/sysctl.c:226:49: warning: incorrect type in argument 3 (different address spaces) kernel/sysctl.c:226:49: expected void * kernel/sysctl.c:226:49: got void [noderef] __user buffer kernel/sysctl.c:2640:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces)) kernel/sysctl.c:2640:35: expected int ( [usertype] proc_handler )( ... ) kernel/sysctl.c:2640:35: got int ( * )( ... ) Fixes: `32927393dc` ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20200824142047.22043-1-tklauser@distanz.ch	2020-08-24 21:11:40 -07:00
Greg Kroah-Hartman	418b4bd4a0	Merge `dc06fe51d2` ("Merge tag 'rtc-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux") into android-mainline Steps on the way to 5.9-rc1. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iceded779988ff472863b7e1c54e22a9fa6383a30	2020-08-13 09:09:55 +02:00
Nitin Gupta	d34c0a7599	mm: use unsigned types for fragmentation score Proactive compaction uses per-node/zone "fragmentation score" which is always in range [0, 100], so use unsigned type of these scores as well as for related constants. Signed-off-by: Nitin Gupta <nigupta@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200618010319.13159-1-nigupta@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:56 -07:00
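The fragmentation score mentioned above is compared against two thresholds derived from vm.compaction_proactiveness. The proactive compaction commit in this log gives the translation as low = 100 - proactiveness and high = low + 10; the sketch below follows that description. The function name is hypothetical, and capping the high threshold at 100 is an assumption made here so the result stays inside the score's [0, 100] range.

```python
from typing import Tuple


def fragmentation_thresholds(proactiveness: int) -> Tuple[int, int]:
    """Return (low, high) fragmentation-score thresholds (hypothetical helper)."""
    if not 0 <= proactiveness <= 100:
        raise ValueError("proactiveness must be in the range [0, 100]")
    low = 100 - proactiveness
    # Assumption: cap at 100 so the threshold stays a valid score.
    high = min(low + 10, 100)
    return low, high
```

For the default proactiveness of 20 this yields (80, 90), matching the values quoted in the commit message: compaction starts when a node's score exceeds 90 and runs until the score drops to 80.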
Nitin Gupta	facdaa917c	mm: proactive compaction For some applications, we need to allocate almost all memory as hugepages. However, on a running system, higher-order allocations can fail if the memory is fragmented. Linux kernel currently does on-demand compaction as we request more hugepages, but this style of compaction incurs very high latency. Experiments with one-time full memory compaction (followed by hugepage allocations) show that kernel is able to restore a highly fragmented memory state to a fairly compacted memory state within <1 sec for a 32G system. Such data suggests that a more proactive compaction can help us allocate a large fraction of memory as hugepages keeping allocation latencies low. For a more proactive compaction, the approach taken here is to define a new sysctl called 'vm.compaction_proactiveness' which dictates bounds for external fragmentation which kcompactd tries to maintain. The tunable takes a value in range [0, 100], with a default of 20. Note that a previous version of this patch [1] was found to introduce too many tunables (per-order extfrag{low, high}), but this one reduces them to just one sysctl. Also, the new tunable is an opaque value instead of asking for specific bounds of "external fragmentation", which would have been difficult to estimate. The internal interpretation of this opaque value allows for future fine-tuning. Currently, we use a simple translation from this tunable to [low, high] "fragmentation score" thresholds (low=100-proactiveness, high=low+10%). The score for a node is defined as weighted mean of per-zone external fragmentation. A zone's present_pages determines its weight. To periodically check per-node score, we reuse per-node kcompactd threads, which are woken up every 500 milliseconds to check the same. If a node's score exceeds its high threshold (as derived from user-provided proactiveness value), proactive compaction is started until its score reaches its low threshold value. 
By default, proactiveness is set to 20, which implies threshold values of low=80 and high=90. This patch is largely based on ideas from Michal Hocko [2]. See also the LWN article [3].

Performance data
================

System: x86_64, 1T RAM, 80 CPU threads.
Kernel: 5.6.0-rc3 + this patch

echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

Before starting the driver, the system was fragmented from a userspace program that allocates all memory and then for each 2M aligned section, frees 3/4 of base pages using munmap. The workload is mainly anonymous userspace pages, which are easy to move around. I intentionally avoided unmovable pages in this test to see how much latency we incur when hugepage allocations hit direct compaction.

1. Kernel hugepage allocation latencies

With the system in such a fragmented state, a kernel driver then allocates as many hugepages as possible and measures allocation latency (all latency values are in microseconds):

- With vanilla 5.6.0-rc3

  percentile latency
  ---------- -------
           5    7894
          10    9496
          25   12561
          30   15295
          40   18244
          50   21229
          60   27556
          75   30147
          80   31047
          90   32859
          95   33799

  Total 2M hugepages allocated = 383859 (749G worth of hugepages out of 762G total free => 98% of free memory could be allocated as hugepages)

- With 5.6.0-rc3 + this patch, with proactiveness=20

  sysctl -w vm.compaction_proactiveness=20

  percentile latency
  ---------- -------
           5       2
          10       2
          25       3
          30       3
          40       3
          50       4
          60       4
          75       4
          80       4
          90       5
          95     429

  Total 2M hugepages allocated = 384105 (750G worth of hugepages out of 762G total free => 98% of free memory could be allocated as hugepages)

2. JAVA heap allocation

In this test, we first fragment memory using the same method as for (1). Then, we start a Java process with a heap size set to 700G and request the heap to be allocated with THP hugepages. We also set THP to madvise to allow hugepage backing of this heap.
/usr/bin/time java -Xms700G -Xmx700G -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch The above command allocates 700G of Java heap using hugepages. - With vanilla 5.6.0-rc3 17.39user 1666.48system 27:37.89elapsed - With 5.6.0-rc3 + this patch, with proactiveness=20 8.35user 194.58system 3:19.62elapsed Elapsed time remains around 3:15, as proactiveness is further increased. Note that proactive compaction happens throughout the runtime of these workloads. The situation of one-time compaction, sufficient to supply hugepages for following allocation stream, can probably happen for more extreme proactiveness values, like 80 or 90. In the above Java workload, proactiveness is set to 20. The test starts with a node's score of 80 or higher, depending on the delay between the fragmentation step and starting the benchmark, which gives more-or-less time for the initial round of compaction. As the benchmark consumes hugepages, node's score quickly rises above the high threshold (90) and proactive compaction starts again, which brings down the score to the low threshold level (80). Repeat. bpftrace also confirms proactive compaction running 20+ times during the runtime of this Java benchmark. kcompactd threads consume 100% of one of the CPUs while it tries to bring a node's score within thresholds. Backoff behavior ================ Above workloads produce a memory state which is easy to compact. However, if memory is filled with unmovable pages, proactive compaction should essentially back off. To test this aspect: - Created a kernel driver that allocates almost all memory as hugepages followed by freeing first 3/4 of each hugepage. - Set proactiveness=40 - Note that proactive_compact_node() is deferred maximum number of times with HPAGE_FRAG_CHECK_INTERVAL_MSEC of wait between each check (=> ~30 seconds between retries). 
[1] https://patchwork.kernel.org/patch/11098289/ [2] https://lore.kernel.org/linux-mm/20161230131412.GI13301@dhcp22.suse.cz/ [3] https://lwn.net/Articles/817905/ Signed-off-by: Nitin Gupta <nigupta@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Nitin Gupta <ngupta@nitingupta.dev> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Link: http://lkml.kernel.org/r/20200616204527.19185-1-nigupta@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:56 -07:00
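The translation from the tunable to thresholds described in the proactive compaction message above (low=100-proactiveness, high=low+10) can be sketched in plain C. This is a userspace illustration only, not the kernel's code: the struct and function names are hypothetical, and the clamping of out-of-range values is an assumption added so the sketch stays inside the [0, 100] score range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two thresholds named in the commit message. */
struct frag_thresholds {
	unsigned int low;   /* compaction stops once the score falls to this */
	unsigned int high;  /* compaction starts once the score exceeds this */
};

/* low = 100 - proactiveness, high = low + 10, both kept inside [0, 100]. */
struct frag_thresholds thresholds_from_proactiveness(unsigned int proactiveness)
{
	struct frag_thresholds t;

	if (proactiveness > 100)
		proactiveness = 100;	/* assumption: clamp out-of-range input */

	t.low = 100 - proactiveness;
	t.high = t.low + 10;
	if (t.high > 100)
		t.high = 100;		/* assumption: scores never exceed 100 */
	return t;
}

/* A node whose score exceeds the high threshold triggers proactive compaction. */
bool compaction_should_start(unsigned int score, struct frag_thresholds t)
{
	return score > t.high;
}

/* Compaction runs until the score has come down to the low threshold. */
bool compaction_should_stop(unsigned int score, struct frag_thresholds t)
{
	return score <= t.low;
}

/* Checks the default quoted in the commit message: 20 -> low=80, high=90. */
void check_default_thresholds(void)
{
	struct frag_thresholds t = thresholds_from_proactiveness(20);

	assert(t.low == 80 && t.high == 90);
}
```

With the default of 20, a node at score 91 would start compacting and keep going until it is back at 80, which matches the start/stop cycle the message reports for the Java benchmark.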
Greg Kroah-Hartman	a17a563d16	Merge `449dc8c970` ("Merge tag 'for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply") into android-mainline Merges along the way to 5.9-rc1 resolves conflicts in: Documentation/ABI/testing/sysfs-class-power drivers/power/supply/power_supply_sysfs.c fs/crypto/inline_crypt.c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia087834f54fb4e5269d68c3c404747ceed240701	2020-08-08 13:07:20 +02:00
Feng Tang	56f3547bfa	mm: adjust vm_committed_as_batch according to vm overcommit policy When checking a performance change for will-it-scale scalability mmap test [1], we found very high lock contention for spinlock of percpu counter 'vm_committed_as': 94.14% 0.35% [kernel.kallsyms] [k] _raw_spin_lock_irqsave 48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap; 45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap; Actually this heavy lock contention is not always necessary. The 'vm_committed_as' needs to be very precise when the strict OVERCOMMIT_NEVER policy is set, which requires a rather small batch number for the percpu counter. So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies. Also add a sysctl handler to adjust it when the policy is reconfigured. Benchmark with the same testcase in [1] shows 53% improvement on an 8C/16T desktop, and 2097%(20X) on a 4S/72C/144T server. We tested with test platforms in 0day (server, desktop and laptop), and 80%+ of the platforms show improvements with that test. Whether a platform shows improvements depends on whether the test mmap size is bigger than the batch number computed. If the lift is 16X, 1/3 of the platforms will show improvements, though it should help the mmap/unmap usage generally, as Michal Hocko mentioned: : I believe that there are non-synthetic workloads which would benefit from : a larger batch. E.g. large in memory databases which do large mmaps : during startups from multiple threads. 
[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/ Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Qian Cai <cai@lca.pw> Cc: Kees Cook <keescook@chromium.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: kernel test robot <rong.a.chen@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-07 11:33:26 -07:00
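The policy-dependent batch described in the vm_committed_as_batch message above can be pictured as follows. This is a minimal sketch under stated assumptions, not the kernel's batch computation: the function name and the `strict_batch` parameter (the unchanged batch used under OVERCOMMIT_NEVER) are hypothetical, and the saturation at INT32_MAX is added only to keep the arithmetic well defined. The enum values follow the kernel's overcommit policy numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Overcommit policies, numbered as in the kernel's vm.overcommit_memory sysctl. */
enum overcommit_policy {
	OVERCOMMIT_GUESS  = 0,
	OVERCOMMIT_ALWAYS = 1,
	OVERCOMMIT_NEVER  = 2,
};

/*
 * Hypothetical helper: keep the batch unchanged for the strict policy and
 * lift it 64X for the other two, as the commit message describes.
 */
int32_t committed_as_batch(int32_t strict_batch, enum overcommit_policy policy)
{
	int64_t lifted;

	if (policy == OVERCOMMIT_NEVER)
		return strict_batch;	/* precise accounting needs a small batch */

	lifted = (int64_t)strict_batch * 64;
	if (lifted > INT32_MAX)
		lifted = INT32_MAX;	/* assumption: saturate instead of overflowing */
	return (int32_t)lifted;
}

/* Example: a strict batch of 32 becomes 2048 under the relaxed policies. */
void check_batch_lift(void)
{
	assert(committed_as_batch(32, OVERCOMMIT_NEVER) == 32);
	assert(committed_as_batch(32, OVERCOMMIT_GUESS) == 2048);
}
```

A larger batch lets each CPU accumulate more of the counter locally before taking the shared spinlock, which is where the contention in the quoted profile comes from.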
Greg Kroah-Hartman	00d6a8a7ee	Merge `e4cbce4d13` ("Merge tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline Baby steps for 5.9-rc1 Resolves some kernel/sched/ merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I88cf5411ac7251f9795d9c50cb18b0df5bf0bcd6	2020-08-07 14:17:39 +02:00
Qais Yousef	13685c4a08	sched/uclamp: Add a new sysctl to control RT default boost value RT tasks by default run at the highest capacity/performance level. When uclamp is selected this default behavior is retained by enforcing the requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum value. This is also referred to as 'the default boost value of RT tasks'. See commit `1a00d99997` ("sched/uclamp: Set default clamps for RT tasks"). On battery powered devices, it is desired to control this default (currently hardcoded) behavior at runtime to reduce energy consumed by RT tasks. For example, on mobile devices, where the big.LITTLE architecture is dominant, the performance of the little cores varies across SoCs, and on high end ones the big cores could be too power hungry. Given the diversity of SoCs, the new knob allows manufacturers to tune the best performance/power for RT tasks for the particular hardware they run on. They could opt to further tune the value when the user selects a different power saving mode or when the device is actively charging. The runtime aspect of it further helps in creating a single kernel image that can be run on multiple devices that require different tuning. Keep in mind that a lot of RT tasks in the system are created by the kernel. On Android for instance I can see over 50 RT tasks, only a handful of which are created by the Android framework. To let system admins and device integrators control the default behavior globally, introduce the new sysctl_sched_uclamp_util_min_rt_default to change the default boost value of the RT tasks. I anticipate this to be mostly in the form of modifying the init script of a particular device. To avoid polluting the fast path with unnecessary code, the approach taken is to synchronously do the update by traversing all the existing tasks in the system. 
This could race with a concurrent fork(), which is dealt with by introducing sched_post_fork() function which will ensure the racy fork will get the right update applied. Tested on Juno-r2 in combination with the RT capacity awareness [1]. By default an RT task will go to the highest capacity CPU and run at the maximum frequency, which is particularly energy inefficient on high end mobile devices because the biggest core[s] are 'huge' and power hungry. With this patch the RT task can be controlled to run anywhere by default, and doesn't cause the frequency to be maximum all the time. Yet any task that really needs to be boosted can easily escape this default behavior by modifying its requested uclamp.min value (p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall. [1] `804d402fb6`: ("sched/rt: Make RT capacity-aware") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com	2020-07-29 13:51:47 +02:00
Greg Kroah-Hartman	a253db8915	Merge `ad57a1022f` ("Merge tag 'exfat-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat") into android-mainline Steps on the way to 5.8-rc1. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I4bc42f572167ea2f815688b4d1eb6124b6d260d4	2020-06-24 17:54:12 +02:00
Greg Kroah-Hartman	1ec3464acb	Merge `ee01c4d72a` ("Merge branch 'akpm' (patches from Andrew)") into android-mainline Steps along the way to 5.8-rc1. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6cca4fa48322228c8182201d68dc05f9b72cfc50	2020-06-22 15:13:57 +02:00
Greg Kroah-Hartman	8a8d41512f	Merge `cb8e59cc87` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") into android-mainline Steps along the way to 5.8-rc1. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I280c0a50b5e137596b1c327759c6a18675908179	2020-06-22 14:58:18 +02:00
Greg Kroah-Hartman	035f08016d	Merge `039aeb9deb` ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm") into android-mainline Baby steps on the way to 5.8-rc1. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I5962e12546d3d215c73c3d74b00ad6263d96f64e	2020-06-20 09:49:29 +02:00
Peter Zijlstra	b4098bfc5e	sched/deadline: Impose global limits on sched_attr::sched_period Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726161357.397880775@infradead.org	2020-06-15 14:10:04 +02:00
Rafael Aquini	e77132e758	kernel/sysctl.c: ignore out-of-range taint bits introduced via kernel.tainted Users with SYS_ADMIN capability can add arbitrary taint flags to the running kernel by writing to /proc/sys/kernel/tainted or issuing the command 'sysctl -w kernel.tainted=...'. This interface, however, is open for any integer value and this might cause an invalid set of flags being committed to the tainted_mask bitset. This patch introduces a simple way for proc_taint() to ignore any eventual invalid bit coming from the user input before committing those bits to the kernel tainted_mask. Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Link: http://lkml.kernel.org/r/20200512223946.888020-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-08 11:05:56 -07:00
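The idea of ignoring out-of-range taint bits, as described in the kernel.tainted message above, amounts to masking the user-supplied value before it is committed. The sketch below is an illustration only: the function name is hypothetical, and the number of valid flags is passed in as a parameter rather than taken from the kernel, since the commit message does not state it.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: return only those bits of `requested` that name one
 * of the first `nr_flags` taint flags; anything above that is dropped.
 */
unsigned long valid_taint_bits(unsigned long requested, unsigned int nr_flags)
{
	unsigned long mask;

	/* Shifting by the full width of the type is undefined, so special-case it. */
	if (nr_flags >= sizeof(unsigned long) * CHAR_BIT)
		mask = ~0UL;
	else
		mask = (1UL << nr_flags) - 1;

	return requested & mask;
}

/* With 18 valid flags (an assumed count), bit 20 is silently ignored. */
void check_taint_mask(void)
{
	assert(valid_taint_bits(1UL << 20, 18) == 0);
	assert(valid_taint_bits(0x20UL, 18) == 0x20UL);
}
```

A write such as `sysctl -w kernel.tainted=...` with stray high bits would then only set the flags that actually exist.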
Guilherme G. Piccoli	60c958d8df	panic: add sysctl to dump all CPUs backtraces on oops event Usually when the kernel reaches an oops condition, it's a point of no return; in case not enough debug information is available in the kernel splat, one of the last resorts would be to collect a kernel crash dump and analyze it. The problem with this approach is that in order to collect the dump, a panic is required (to kexec-load the crash kernel). When in an environment of multiple virtual machines, users may prefer to try living with the oops, at least until being able to properly shutdown their VMs / finish their important tasks. This patch implements a way to collect a bit more debug details when an oops event is reached, by printing all the CPUs backtraces through the usage of NMIs (on architectures that support that). The sysctl added (and documented) here was called "oops_all_cpu_backtrace", and when set will (as the name suggests) dump all CPUs backtraces. Far from ideal, this may be the last option though for users that for some reason cannot panic on oops. Most of times oopses are clear enough to indicate the kernel portion that must be investigated, but in virtual environments it's possible to observe hypervisor/KVM issues that could lead to oopses shown in other guests CPUs (like virtual APIC crashes). This patch hence aims to help debug such complex issues without resorting to kdump. Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-08 11:05:56 -07:00
Guilherme G. Piccoli	0ec9dc9bcb	kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected Commit `401c636a0e` ("kernel/hung_task.c: show all hung tasks before panic") introduced a change in that we started to show all CPUs backtraces when a hung task is detected _and_ the sysctl/kernel parameter "hung_task_panic" is set. The idea is good, because usually when observing deadlocks (that may lead to hung tasks), the culprit is another task holding a lock and not necessarily the task detected as hung. The problem with this approach is that dumping backtraces is a slightly expensive task, specially printing that on console (and specially in many CPU machines, as servers commonly found nowadays). So, users that plan to collect a kdump to investigate the hung tasks and narrow down the deadlock definitely don't need the CPUs backtrace on dmesg/console, which will delay the panic and pollute the log (crash tool would easily grab all CPUs traces with 'bt -a' command). Also, there's the reciprocal scenario: some users may be interested in seeing the CPUs backtraces but not have the system panic when a hung task is detected. The current approach hence is almost as embedding a policy in the kernel, by forcing the CPUs backtraces' dump (only) on hung_task_panic. This patch decouples the panic event on hung task from the CPUs backtraces dump, by creating (and documenting) a new sysctl called "hung_task_all_cpu_backtrace", analog to the approach taken on soft/hard lockups, that have both a panic and an "all_cpu_backtrace" sysctl to allow individual control. The new mechanism for dumping the CPUs backtraces on hung task detection respects "hung_task_warnings" by not dumping the traces in case there's no warnings left. Signed-off-by: Guilherme G. 
Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: http://lkml.kernel.org/r/20200327223646.20779-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-08 11:05:56 -07:00
Rafael Aquini	db38d5c106	kernel: add panic_on_taint Analogously to the introduction of panic_on_warn, this patch introduces a kernel option named panic_on_taint in order to provide a simple and generic way to stop execution and catch a coredump when the kernel gets tainted by any given flag. This is useful for debugging sessions as it avoids having to rebuild the kernel to explicitly add calls to panic() into the code sites that introduce the taint flags of interest. For instance, if one is interested in proceeding with a post-mortem analysis at the point a given code path is hitting a bad page (i.e. unaccount_page_cache_page(), or slab_bug()), a coredump can be collected by rebooting the kernel with 'panic_on_taint=0x20' amended to the command line. Another, perhaps less frequent, use for this option would be as a means for assuring a security policy case where only a subset of taints, or no single taint (in paranoid mode), is allowed for the running system. The optional switch 'nousertaint' is handy in this particular scenario, as it will avoid userspace induced crashes by writes to sysctl interface /proc/sys/kernel/tainted causing false positive hits for such policies. 
[akpm@linux-foundation.org: tweak kernel-parameters.txt wording] Suggested-by: Qian Cai <cai@lca.pw> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Bunk <bunk@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-08 11:05:56 -07:00
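The panic_on_taint behaviour described above can be read as two bitmask checks: one when a taint flag is set, and one when userspace writes to /proc/sys/kernel/tainted with 'nousertaint' active. The following is a sketch of that reading, not the kernel's implementation; both function names are hypothetical, and rejecting the user write outright is an assumption about how the "avoid userspace induced crashes" wording is enforced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: does the current taint state hit a flag of interest? */
bool taint_triggers_panic(unsigned long tainted_mask, unsigned long panic_on_taint)
{
	return (tainted_mask & panic_on_taint) != 0;
}

/*
 * Hypothetical check for writes to /proc/sys/kernel/tainted: with
 * 'nousertaint' set, a write that would set a watched flag is refused
 * (assumption) so userspace cannot provoke the panic by itself.
 */
bool user_taint_write_allowed(unsigned long requested,
			      unsigned long panic_on_taint,
			      bool nousertaint)
{
	if (nousertaint && (requested & panic_on_taint))
		return false;
	return true;
}

/* Example from the commit message: panic_on_taint=0x20 watches one flag. */
void check_panic_on_taint(void)
{
	assert(taint_triggers_panic(0x20UL, 0x20UL));
	assert(!taint_triggers_panic(0x01UL, 0x20UL));
	assert(!user_taint_write_allowed(0x20UL, 0x20UL, true));
}
```

Under this reading, booting with 'panic_on_taint=0x20' panics as soon as the 0x20 flag is raised by the kernel, while the 'nousertaint' switch keeps a plain sysctl write from doing the same.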
Linus Torvalds	ee01c4d72a	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "More mm/ work, plenty more to come Subsystems affected by this patch series: slub, memcg, gup, kasan, pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs, thp, mmap, kconfig" * akpm: (131 commits) arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined riscv: support DEBUG_WX mm: add DEBUG_WX support drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid() powerpc/mm: drop platform defined pmd_mknotpresent() mm: thp: don't need to drain lru cache when splitting and mlocking THP hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs sparc32: register memory occupied by kernel as memblock.memory include/linux/memblock.h: fix minor typo and unclear comment mm, mempolicy: fix up gup usage in lookup_node tools/vm/page_owner_sort.c: filter out unneeded line mm: swap: memcg: fix memcg stats for huge pages mm: swap: fix vmstats for huge pages mm: vmscan: limit the range of LRU type balancing mm: vmscan: reclaim writepage is IO cost mm: vmscan: determine anon/file pressure balance at the reclaim root mm: balance LRU lists based on relative thrashing mm: only count actual rotations as LRU reclaim cost ...	2020-06-03 20:24:15 -07:00
Johannes Weiner	c843966c55	mm: allow swappiness that prefers reclaiming anon over the file workingset With the advent of fast random IO devices (SSDs, PMEM) and in-memory swap devices such as zswap, it's possible for swap to be much faster than filesystems, and for swapping to be preferable over thrashing filesystem caches. Allow setting swappiness - which defines the rough relative IO cost of cache misses between page cache and swap-backed pages - to reflect such situations by making the swap-preferred range configurable. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/20200520232525.798933-4-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:48 -07:00


686 Commits