linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 19:30:30 +09:00

Author	SHA1	Message	Date
Connor O'Brien	fe90216264	ANDROID: proc: Add /proc/uid directory Add support for reporting per-uid information through procfs, roughly following the approach used for per-tid and per-tgid directories in fs/proc/base.c. This also entails some new tracking of which uids have been used, to avoid losing information when the last task with a given uid exits. Bug: 72339335 Bug: 127641090 Test: ls /proc/uid/; compare with UIDs in /proc/uid_time_in_state Change-Id: I0908f0c04438b11ceb673d860e58441bf503d478 Signed-off-by: Connor O'Brien <connoro@google.com> [AmitP: Fix proc_fill_cache() now that upstream commit `0168b9e38c` ("procfs: switch instantiate_t to d_splice_alias()") switched the instantiate() callback to d_splice_alias()] Signed-off-by: Amit Pundir <amit.pundir@linaro.org> [astrachan: Folded 97b7790f505e ("ANDROID: proc: fix undefined behavior in proc_uid_base_readdir") into this change] Signed-off-by: Alistair Strachan <astrachan@google.com>	2019-03-06 15:59:21 +00:00
Connor O'Brien	406d53f0c7	ANDROID: cpufreq: track per-task time in state Add time in state data to task structs, and create /proc/<pid>/time_in_state files to show how long each individual task has run at each frequency. Create a CONFIG_CPU_FREQ_TIMES option to enable/disable this tracking. Bug: 72339335 Bug: 127641090 Test: Read /proc/<pid>/time_in_state Change-Id: Ia6456754f4cb1e83b2bc35efa8fbe9f8696febc8 Signed-off-by: Connor O'Brien <connoro@google.com> [astrachan: Folded the following changes into this patch: a6d3de6a7fba ("ANDROID: Reduce use of #ifdef CONFIG_CPU_FREQ_TIMES") b89ada5d9c09 ("ANDROID: Fix massive cpufreq_times memory leaks")] Signed-off-by: Alistair Strachan <astrachan@google.com>	2019-03-06 15:57:25 +00:00
Greg Kroah-Hartman	36d178b3bc	Merge 4.19.27 into android-4.19 Changes in 4.19.27 irq/matrix: Split out the CPU selection code into a helper irq/matrix: Spread managed interrupts on allocation genirq/matrix: Improve target CPU selection for managed interrupts. mac80211: Change default tx_sk_pacing_shift to 7 scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached drm/msm: Unblock writer if reader closes file ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field ALSA: compress: prevent potential divide by zero bugs ASoC: Variable "val" in function rt274_i2c_probe() could be uninitialized clk: tegra: dfll: Fix a potential Oop in remove() clk: sysfs: fix invalid JSON in clk_dump clk: vc5: Abort clock configuration without upstream clock thermal: int340x_thermal: Fix a NULL vs IS_ERR() check usb: dwc3: gadget: synchronize_irq dwc irq in suspend usb: dwc3: gadget: Fix the uninitialized link_state when udc starts usb: gadget: Potential NULL dereference on allocation error selftests: rtc: rtctest: fix alarm tests selftests: rtc: rtctest: add alarm test on minute boundary genirq: Make sure the initial affinity is not empty x86/mm/mem_encrypt: Fix erroneous sizeof() ASoC: rt5682: Fix PLL source register definitions ASoC: dapm: change snprintf to scnprintf for possible overflow ASoC: imx-audmux: change snprintf to scnprintf for possible overflow selftests/vm/gup_benchmark.c: match gup struct to kernel phy: ath79-usb: Fix the power on error path phy: ath79-usb: Fix the main reset name to match the DT binding selftests: seccomp: use LDLIBS instead of LDFLAGS selftests: gpio-mockup-chardev: Check asprintf() for error irqchip/gic-v3-mbi: Fix uninitialized mbi_lock ARC: fix __ffs return value to avoid build warnings ARC: show_regs: lockdep: avoid page allocator... 
drivers: thermal: int340x_thermal: Fix sysfs race condition staging: rtl8723bs: Fix build error with Clang when inlining is disabled mac80211: fix miscounting of ttl-dropped frames sched/wait: Fix rcuwait_wake_up() ordering sched/wake_q: Fix wakeup ordering for wake_q futex: Fix (possible) missed wakeup locking/rwsem: Fix (possible) missed wakeup drm/amd/powerplay: OD setting fix on Vega10 tty: serial: qcom_geni_serial: Allow mctrl when flow control is disabled serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling drm/sun4i: hdmi: Fix usage of TMDS clock staging: android: ion: Support cpu access during dma_buf_detach direct-io: allow direct writes to empty inodes writeback: synchronize sync(2) against cgroup writeback membership switches scsi: lpfc: nvme: avoid hang / use-after-free when destroying localport scsi: lpfc: nvmet: avoid hang / use-after-free when destroying targetport scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state() net: altera_tse: fix connect_local_phy error path hv_netvsc: Fix ethtool change hash key error hv_netvsc: Refactor assignments of struct netvsc_device_info hv_netvsc: Fix hash key value reset after other ops nvme-rdma: fix timeout handler nvme-multipath: drop optimization for static ANA group IDs drm/msm: Fix A6XX support for opp-level net: usb: asix: ax88772_bind return error when hw_reset fail net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP ibmveth: Do not process frames after calling napi_reschedule mac80211: don't initiate TDLS connection if station is not associated to AP mac80211: Add attribute aligned(2) to struct 'action' cfg80211: extend range deviation for DMG svm: Fix AVIC incomplete IPI emulation KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1 kvm: selftests: Fix region overlap check in kvm_util mmc: spi: Fix card detection during probe mmc: tmio_mmc_core: don't claim spurious interrupts mmc: tmio: fix access width of Block Count Register mmc: core: 
Fix NULL ptr crash from mmc_should_fail_request mmc: cqhci: fix space allocated for transfer descriptor mmc: cqhci: Fix a tiny potential memory leak on error condition mmc: sdhci-esdhc-imx: correct the fix of ERR004536 mm: enforce min addr even if capable() in expand_downwards() drm: Block fb changes for async plane updates hugetlbfs: fix races and page leaks during migration MIPS: fix truncation in __cmpxchg_small for short values MIPS: BCM63XX: provide DMA masks for ethernet devices MIPS: eBPF: Fix icache flush end address x86/uaccess: Don't leak the AC flag into __put_user() value evaluation Linux 4.19.27 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-03-05 18:07:53 +01:00
Xie Yongji	9ad6216e8c	locking/rwsem: Fix (possible) missed wakeup [ Upstream commit `e158488be2` ] Because wake_q_add() can imply an immediate wakeup (cmpxchg failure case), we must not rely on the wakeup being delayed. However, commit: `e38513905e` ("locking/rwsem: Rework zeroing reader waiter->task") relies on exactly that behaviour in that the wakeup must not happen until after we clear waiter->task. [ peterz: Added changelog. ] Signed-off-by: Xie Yongji <xieyongji@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `e38513905e` ("locking/rwsem: Rework zeroing reader waiter->task") Link: https://lkml.kernel.org/r/1543495830-2644-1-git-send-email-xieyongji@baidu.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-03-05 17:58:49 +01:00
Peter Zijlstra	2368e6d3bc	futex: Fix (possible) missed wakeup [ Upstream commit `b061c38bef` ] We must not rely on wake_q_add() to delay the wakeup; in particular commit: `1d0dcb3ad9` ("futex: Implement lockless wakeups") moved wake_q_add() before smp_store_release(&q->lock_ptr, NULL), which could result in futex_wait() waking before observing ->lock_ptr == NULL and going back to sleep again. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `1d0dcb3ad9` ("futex: Implement lockless wakeups") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-03-05 17:58:49 +01:00
Peter Zijlstra	653a1dbcb0	sched/wake_q: Fix wakeup ordering for wake_q [ Upstream commit `4c4e373156` ] Notably, cmpxchg() does not provide ordering when it fails; however, wake_q_add() requires ordering in this specific case too. Without this it would be possible for the concurrent wakeup to not observe our prior state. Andrea Parri provided:

  C wake_up_q-wake_q_add

  {
      int next = 0;
      int y = 0;
  }

  P0(int *next, int *y)
  {
      int r0;

      /* in wake_up_q() */

      WRITE_ONCE(*next, 1);   /* node->next = NULL */
      smp_mb();               /* implied by wake_up_process() */
      r0 = READ_ONCE(*y);
  }

  P1(int *next, int *y)
  {
      int r1;

      /* in wake_q_add() */

      WRITE_ONCE(*y, 1);      /* wake_cond = true */
      smp_mb__before_atomic();
      r1 = cmpxchg_relaxed(next, 1, 2);
  }

  exists (0:r0=0 /\ 1:r1=0)

This "exists" clause cannot be satisfied according to the LKMM:

  Test wake_up_q-wake_q_add Allowed
  States 3
  0:r0=0; 1:r1=1;
  0:r0=1; 1:r1=0;
  0:r0=1; 1:r1=1;
  No
  Witnesses
  Positive: 0 Negative: 3
  Condition exists (0:r0=0 /\ 1:r1=0)
  Observation wake_up_q-wake_q_add Never 0 3

Reported-by: Yongji Xie <elohimes@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-03-05 17:58:49 +01:00
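The litmus test above is a store-buffering pattern: each side stores, issues a full barrier, then reads the other side's variable. A rough userspace analogue in C11 is sketched below, assuming a `memory_order_seq_cst` fence as a stand-in for both `smp_mb()` and the ordering the fix adds before the relaxed cmpxchg. The function names are hypothetical and this is a model, not kernel code. Under the C11 rules for seq_cst fences, the outcome `r0 == 0 && r1 == 0` is forbidden, so `count_forbidden()` should always return 0; a clean run is only evidence, the guarantee comes from the memory model.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model of the wake_up_q-wake_q_add litmus test.
 * Assumption: a C11 seq_cst fence stands in for smp_mb() and for the
 * barrier before the cmpxchg; this is an analogue, not kernel code. */
static atomic_int next;
static atomic_int y;
static int r0, r1;

static void *p0_wake_up_q(void *arg)
{
    (void)arg;
    atomic_store_explicit(&next, 1, memory_order_relaxed); /* node->next = NULL */
    atomic_thread_fence(memory_order_seq_cst);             /* implied by wake_up_process() */
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *p1_wake_q_add(void *arg)
{
    int old = 1;

    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);    /* wake_cond = true */
    atomic_thread_fence(memory_order_seq_cst);             /* ordering the fix requires */
    /* Models cmpxchg_relaxed(next, 1, 2): 'old' ends up holding the value
     * read, whether the exchange succeeds (1) or fails (anything else). */
    atomic_compare_exchange_strong_explicit(&next, &old, 2,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    r1 = old;
    return NULL;
}

/* Runs the test repeatedly and counts the outcome named by the "exists"
 * clause (r0 == 0 && r1 == 0). Returns -1 if a thread cannot be created. */
static int count_forbidden(int iterations)
{
    int forbidden = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t t0, t1;

        atomic_store(&next, 0);
        atomic_store(&y, 0);
        if (pthread_create(&t0, NULL, p0_wake_up_q, NULL) != 0)
            return -1;
        if (pthread_create(&t1, NULL, p1_wake_q_add, NULL) != 0) {
            pthread_join(t0, NULL);
            return -1;
        }
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 0 && r1 == 0)
            forbidden++;
    }
    return forbidden;
}
```

Removing the fence from `p1_wake_q_add()` makes the forbidden outcome permitted by the C11 model; whether a particular machine actually exhibits it is hardware-dependent.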
Prateek Sood	5024f0a29a	sched/wait: Fix rcuwait_wake_up() ordering [ Upstream commit `6dc080eeb2` ] For some peculiar reason rcuwait_wake_up() has the right barrier in the comment, but not in the code. This mistake has been observed to cause a deadlock in the following situation:

  P1                                    P2

  percpu_up_read()                      percpu_down_write()
    rcu_sync_is_idle() // false
                                          rcu_sync_enter()
                                          ...
    __percpu_up_read()
      [S] ,-  __this_cpu_dec(*sem->read_count)
          |   smp_rmb();
      [L] |   task = rcu_dereference(w->task) // NULL
          |
          |                             [S]  w->task = current
          |                                  smp_mb();
          |                             [L]  readers_active_check() // fail
          `-> <store happens here>

Where the smp_rmb() (obviously) fails to constrain the store. [ peterz: Added changelog. ] Signed-off-by: Prateek Sood <prsood@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `8f95c90ceb` ("sched/wait, RCU: Introduce rcuwait machinery") Link: https://lkml.kernel.org/r/1543590656-7157-1-git-send-email-prsood@codeaurora.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-03-05 17:58:49 +01:00
Srinivas Ramana	17fab8914f	genirq: Make sure the initial affinity is not empty [ Upstream commit `bddda606ec` ] If all CPUs in the irq_default_affinity mask are offline when an interrupt is initialized then irq_setup_affinity() can set an empty affinity mask for a newly allocated interrupt. Fix this by falling back to cpu_online_mask in case the resulting affinity mask is zero. Signed-off-by: Srinivas Ramana <sramana@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-msm@vger.kernel.org Link: https://lkml.kernel.org/r/1545312957-8504-1-git-send-email-sramana@codeaurora.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-03-05 17:58:47 +01:00
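The fallback described in this commit can be modelled with CPU masks as 64-bit bitmaps (bit n = CPU n). The sketch below is a simplification under that assumption; `setup_affinity_model()` is a hypothetical name, not the kernel's `irq_setup_affinity()`, and real cpumasks are not limited to 64 CPUs.

```c
#include <stdint.h>

/* Hypothetical model: bit n of each mask represents CPU n. */
static uint64_t setup_affinity_model(uint64_t default_affinity,
                                     uint64_t online_mask)
{
    /* Restrict the requested affinity to CPUs that are currently online. */
    uint64_t mask = default_affinity & online_mask;

    /* Every CPU in the default mask is offline: rather than leaving the
     * new interrupt with an empty mask, fall back to all online CPUs. */
    if (mask == 0)
        mask = online_mask;
    return mask;
}
```

For example, a default affinity of CPUs 2 and 3 (`0x0C`) with only CPUs 0 and 1 online (`0x03`) yields `0x03` instead of an empty mask.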
Long Li	765c30b318	genirq/matrix: Improve target CPU selection for managed interrupts. [ Upstream commit `e8da8794a7` ] On large systems with multiple devices of the same class (e.g. NVMe disks, using managed interrupts), the kernel can affinitize these interrupts to a small subset of CPUs instead of spreading them out evenly.

irq_matrix_alloc_managed() tries to select the CPU in the supplied cpumask of possible target CPUs which has the lowest number of interrupt vectors allocated. This is done by searching the CPU with the highest number of available vectors. While this is correct for non-managed interrupts, it can select the wrong CPU for managed interrupts. Under certain constellations this results in affinitizing the managed interrupts of several devices to a single CPU in a set.

The bookkeeping of available vectors works the following way:

  1) Non-managed interrupts: available is decremented when the interrupt is actually requested by the device driver and a vector is assigned. It's incremented when the interrupt and the vector are freed.

  2) Managed interrupts: Managed interrupts guarantee vector reservation when the MSI/MSI-X functionality of a device is enabled, which is achieved by reserving vectors in the bitmaps of the possible target CPUs. This reservation decrements the available count on each possible target CPU. When the interrupt is requested by the device driver then a vector is allocated from the reserved region. The operation is reversed when the interrupt is freed by the device driver. Neither of these operations affects the available count. The reservation persists up to the point where the MSI/MSI-X functionality is disabled, and only this operation increments the available count again.

For non-managed interrupts the available count is the correct selection criterion because the guaranteed reservations need to be taken into account. Using the allocated counter could lead to a failing allocation in the following situation (total vector space of 10 assumed):

                         CPU0  CPU1
  available:               2     0
  allocated:               5     3   <--- CPU1 is selected, but available space = 0
  managed reserved:        3     7

while available yields the correct result.

For managed interrupts the available count is not the appropriate selection criterion because, as explained above, the available count is not affected by the actual vector allocation. The following example illustrates that. Total vector space of 10 assumed. The starting point is:

                         CPU0  CPU1
  available:               5     4
  allocated:               2     3
  managed reserved:        3     3

Allocating vectors for three non-managed interrupts will result in affinitizing the first two to CPU0 and the third one to CPU1 because the available count is adjusted with each allocation:

                         CPU0  CPU1
  available:               5     4   <- Select CPU0 for 1st allocation
  --> allocated:           3     3

  available:               4     4   <- Select CPU0 for 2nd allocation
  --> allocated:           4     3

  available:               3     4   <- Select CPU1 for 3rd allocation
  --> allocated:           4     4

But the allocation of three managed interrupts starting from the same point will affinitize all of them to CPU0 because the available count is not affected by the allocation (see above). So the end result is:

                         CPU0  CPU1
  available:               5     4
  allocated:               5     3

Introduce a "managed_allocated" field in struct cpumap to track the vector allocation for managed interrupts separately. Use this information to select the target CPU when a vector is allocated for a managed interrupt, which results in more evenly distributed vector assignments. The above example results in the following allocations:

                         CPU0  CPU1
  managed_allocated:       0     0   <- Select CPU0 for 1st allocation
  --> allocated:           3     3

  managed_allocated:       1     0   <- Select CPU1 for 2nd allocation
  --> allocated:           3     4

  managed_allocated:       1     1   <- Select CPU0 for 3rd allocation
  --> allocated:           4     4

The allocation of non-managed interrupts is not affected by this change and still evaluates the available count.

The overall distribution of interrupt vectors for both types of interrupts might still not be perfectly even depending on the number of non-managed and managed interrupts in a system, but due to the reservation guarantee for managed interrupts this cannot be avoided. Expose the new field in debugfs as well. [ tglx: Clarified the background of the problem in the changelog and described it independent of NVME ] Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Kelley <mikelley@microsoft.com> Link: https://lkml.kernel.org/r/20181106040000.27316-1-longli@linuxonhyperv.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-03-05 17:58:45 +01:00
Dou Liyang	8cae7757e8	irq/matrix: Spread managed interrupts on allocation [ Upstream commit `76f99ae5b5` ] Linux spreads out the non-managed interrupts across the possible target CPUs to avoid vector space exhaustion. Managed interrupts are treated differently, as for them the vectors are reserved (with guarantee) when the interrupt descriptors are initialized. When the interrupt is requested a real vector is assigned. The assignment logic uses the first CPU in the affinity mask for assignment. If the interrupt has more than one CPU in the affinity mask, which happens when a multi-queue device has fewer queues than CPUs, then doing the same search as for non-managed interrupts makes sense as it puts the interrupt on the least interrupt-plagued CPU. For single-CPU-affine vectors that's obviously a NOOP. Restructure the matrix allocation code so it does the 'best CPU' search, add the sanity check for an empty affinity mask and adapt the call site in the x86 vector management code. [ tglx: Added the empty mask check to the core and improved change log ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liyang@163.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-03-05 17:58:45 +01:00
Dou Liyang	2948b8875d	irq/matrix: Split out the CPU selection code into a helper [ Upstream commit `8ffe4e61c0` ] Linux finds the CPU which has the lowest vector allocation count to spread out the non managed interrupts across the possible target CPUs, but does not do so for managed interrupts. Split out the CPU selection code into a helper function for reuse. No functional change. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180908175838.14450-1-dou_liyang@163.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-03-05 17:58:45 +01:00
Greg Kroah-Hartman	c97d2b535c	Merge 4.19.26 into android-4.19 Changes in 4.19.26 ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction tracing: Fix number of entries in trace header MIPS: eBPF: Always return sign extended 32b values gpio: MT7621: use a per instance irq_chip structure gpio: pxa: avoid attempting to set pin direction via pinctrl on MMP2 mac80211: Restore vif beacon interval if start ap fails mac80211: Use linked list instead of rhashtable walk for mesh tables mac80211: Free mpath object when rhashtable insertion fails libceph: handle an empty authorize reply ceph: avoid repeatedly adding inode to mdsc->snap_flush_list numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES proc, oom: do not report alien mms when setting oom_score_adj ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5 ALSA: hda/realtek: Disable PC beep in passthrough on alc285 KEYS: allow reaching the keys quotas exactly backlight: pwm_bl: Fix devicetree parsing with auto-generated brightness tables mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells pvcalls-front: read all data before closing the connection pvcalls-front: don't try to free unallocated rings pvcalls-front: properly allocate sk pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read mfd: twl-core: Fix section annotations on {,un}protect_pm_master mfd: db8500-prcmu: Fix some section annotations mfd: mt6397: Do not call irq_domain_remove if PMIC unsupported mfd: ab8500-core: Return zero in get_register_interruptible() mfd: bd9571mwv: Add volatile register to make DVFS work mfd: qcom_rpm: write fw_version to CTRL_REG mfd: wm5110: Add missing ASRC rate register mfd: axp20x: Add AC power supply cell for AXP813 mfd: axp20x: Re-align MFD cell entries mfd: axp20x: Add supported cells for AXP803 mfd: cros_ec_dev: Add missing mfd_remove_devices() call in remove mfd: tps65218: Use devm_regmap_add_irq_chip and clean up error path in probe() mfd: 
mc13xxx: Fix a missing check of a register-read failure xen/pvcalls: remove set but not used variable 'intf' qed: Fix qed_chain_set_prod() for PBL chains with non power of 2 page count qed: Fix qed_ll2_post_rx_buffer_notify_fw() by adding a write memory barrier net: hns: Fix use after free identified by SLUB debug bpf: Fix [::] -> [::1] rewrite in sys_sendmsg selftests/bpf: Test [::] -> [::1] rewrite in sys_sendmsg in test_sock_addr watchdog: mt7621_wdt/rt2880_wdt: Fix compilation problem net/mlx4: Get rid of page operation after dma_alloc_coherent MIPS: ath79: Enable OF serial ports in the default config xprtrdma: Double free in rpcrdma_sendctxs_create() mlxsw: spectrum_acl: Add cleanup after C-TCAM update error condition selftests: forwarding: Add a test for VLAN deletion netfilter: nf_tables: fix leaking object reference count scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param scsi: isci: initialize shost fully before calling scsi_add_host() include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR MIPS: jazz: fix 64bit build netfilter: nft_flow_offload: Fix reverse route lookup bpf: correctly set initial window on active Fast Open sender pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock bpf: fix panic in stack_map_get_build_id() on i386 and arm32 netfilter: nft_flow_offload: fix interaction with vrf slave device RDMA/mthca: Clear QP objects during their allocation powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool. 
acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id() net: stmmac: Fix PCI module removal leak net: stmmac: dwxgmac2: Only clear interrupts that are active net: stmmac: Check if CBS is supported before configuring net: stmmac: Fix the logic of checking if RX Watchdog must be enabled net: stmmac: Prevent RX starvation in stmmac_napi_poll() isdn: i4l: isdn_tty: Fix some concurrency double-free bugs scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes scsi: ufs: Fix system suspend status scsi: qedi: Add ep_state for login completion on un-reachable targets scsi: ufs: Fix geometry descriptor size scsi: cxgb4i: add wait_for_completion() netfilter: nft_flow_offload: fix checking method of conntrack helper always clear the X2APIC_ENABLE bit for PV guest drm/meson: add missing of_node_put drm/amdkfd: Don't assign dGPUs to APU topology devices drm/amd/display: fix PME notification not working in RV desktop vhost: return EINVAL if iovecs size does not match the message size drm/sun4i: backend: add missing of_node_puts pvcalls-front: fix potential null dereference selftests: tc-testing: drop test on missing tunnel key id selftests: tc-testing: fix tunnel_key failure if dst_port is unspecified selftests: tc-testing: fix parsing of ife type afs: Don't set vnode->cb_s_break in afs_validate() afs: Fix key refcounting in file locking code bpf: don't assume build-id length is always 20 bytes bpf: zero out build_id for BPF_STACK_BUILD_ID_IP selftests/bpf: retry tests that expect build-id atm: he: fix sign-extension overflow on large shift hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table leds: lp5523: fix a missing check of return value of lp55xx_read bpf: bpf_setsockopt: reset sock dst on SO_MARK changes dpaa_eth: NETIF_F_LLTX requires to do our own update of trans_start mlxsw: pci: Return error on PCI reset timeout net: bridge: Mark FDB entries that were added by user as such mlxsw: spectrum_switchdev: Do not 
treat static FDB entries as sticky selftests: forwarding: Add a test case for externally learned FDB entries net/mlx5e: Fix wrong (zero) TX drop counter indication for representor isdn: avm: Fix string plus integer warning from Clang batman-adv: fix uninit-value in batadv_interface_tx() inet_diag: fix reporting cgroup classid and fallback to priority ipv6: propagate genlmsg_reply return code net: ena: fix race between link up and device initalization net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames net/mlx5e: Don't overwrite pedit action when multiple pedit used net/packet: fix 4gb buffer limit due to overflow check net: sfp: do not probe SFP module before we're attached sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate team: avoid complex list operations in team_nl_cmd_options_set() Revert "socket: fix struct ifreq size in compat ioctl" Revert "kill dev_ifsioc()" net: socket: fix SIOCGIFNAME in compat net: socket: make bond ioctls go through compat_ifreq_ioctl() geneve: should not call rt6_lookup() when ipv6 was disabled sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() net_sched: fix a race condition in tcindex_destroy() net_sched: fix a memory leak in cls_tcindex net_sched: fix two more memory leaks in cls_tcindex net/mlx5e: XDP, fix redirect resources availability check RDMA/srp: Rework SCSI device reset handling KEYS: user: Align the payload buffer KEYS: always initialize keyring_index_key::desc_len parisc: Fix ptrace syscall number modification ARCv2: Enable unaligned access in early ASM code ARC: U-boot: check arguments paranoidly ARC: define ARCH_SLAB_MINALIGN = 8 drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime drm/i915/fbdev: Actually configure untiled displays drm/amd/display: Fix MST reboot/poweroff sequence mac80211: allocate tailroom for forwarded 
mesh packets kvm: x86: Return LA57 feature based on hardware capability net: validate untrusted gso packets without csum offload net: avoid false positives in untrusted gso validation staging: erofs: fix a bug when appling cache strategy staging: erofs: complete error handing of z_erofs_do_read_page staging: erofs: replace BUG_ON with DBG_BUGON in data.c staging: erofs: drop multiref support temporarily staging: erofs: remove the redundant d_rehash() for the root dentry staging: erofs: atomic_cond_read_relaxed on ref-locked workgroup staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}' staging: erofs: add a full barrier in erofs_workgroup_unfreeze staging: erofs: {dir,inode,super}.c: rectify BUG_ONs staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs Revert "bridge: do not add port to router list when receives query with source 0.0.0.0" netfilter: nf_tables: fix flush after rule deletion in the same batch netfilter: nft_compat: use-after-free when deleting targets netfilter: ipv6: Don't preserve original oif for loopback address netfilter: nfnetlink_osf: add missing fmatch check netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put() udlfb: handle unplug properly pinctrl: max77620: Use define directive for max77620_pinconf_param values net: phylink: avoid resolving link state too early Linux 4.19.26 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-02-27 10:23:18 +01:00
Stanislav Fomichev	5c6fdd877e	bpf: zero out build_id for BPF_STACK_BUILD_ID_IP [ Upstream commit `4af396ae48` ] When returning BPF_STACK_BUILD_ID_IP from stack_map_get_build_id_offset, make sure that the build_id field is empty. Since we are using a percpu free list, there is a possibility that we might reuse some previous bpf_stack_build_id with a non-zero build_id. Fixes: `615755a77b` ("bpf: extend stackmap to save binary_build_id+offset instead of address") Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-27 10:08:56 +01:00
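Why the memset matters can be shown with a userspace mirror of the stack-map entry. The sketch assumes a layout modelled on `struct bpf_stack_build_id` (a 32-bit status, a 20-byte build_id, and an offset/ip union); the type, enum names and `fallback_to_ip()` are local to this illustration, not the kernel's code.

```c
#include <stdint.h>
#include <string.h>

#define BUILD_ID_SIZE 20   /* assumed to mirror BPF_BUILD_ID_SIZE */

/* Local names modelled on the BPF_STACK_BUILD_ID_* status values. */
enum { STACK_BUILD_ID_EMPTY = 0, STACK_BUILD_ID_VALID = 1, STACK_BUILD_ID_IP = 2 };

struct stack_build_id {
    int32_t status;
    unsigned char build_id[BUILD_ID_SIZE];
    union {
        uint64_t offset;
        uint64_t ip;
    };
};

/* Fall back to reporting the raw instruction pointer. Entries come from a
 * reused per-CPU buffer, so clear any build_id left by a previous user. */
static void fallback_to_ip(struct stack_build_id *entry, uint64_t ip)
{
    entry->status = STACK_BUILD_ID_IP;
    memset(entry->build_id, 0, BUILD_ID_SIZE);
    entry->ip = ip;
}
```

Without the `memset()`, an entry that previously held a valid build-id would keep those bytes after being marked as an IP-only entry.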
Stanislav Fomichev	c4555b9f28	bpf: don't assume build-id length is always 20 bytes [ Upstream commit `0b698005a9` ] Build-id length is not fixed to 20; it can be (see `man ld`, --build-id): 128-bit (uuid); 160-bit (sha1); or any length specified in ld --build-id=0xhexstring. To fix the issue of missing BPF_STACK_BUILD_ID_VALID for shorter build-ids, assume that build-id is somewhere in the range of 1 .. 20. Set the remaining bytes to zero. v2: don't introduce new "len = min(BPF_BUILD_ID_SIZE, nhdr->n_descsz)"; we already know that nhdr->n_descsz <= BPF_BUILD_ID_SIZE if we enter this 'if' condition. Fixes: `615755a77b` ("bpf: extend stackmap to save binary_build_id+offset instead of address") Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-27 10:08:56 +01:00
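The length handling this commit describes (accept 1 .. 20 bytes, zero the rest) can be sketched as a standalone copy helper. `copy_build_id()` is hypothetical; the real code operates on an ELF note header inside the stackmap implementation, which is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUILD_ID_SIZE 20   /* assumed to mirror BPF_BUILD_ID_SIZE */

/* Hypothetical helper: copy a build-id note descriptor of 'descsz' bytes
 * into a fixed 20-byte slot. Any length in 1..20 is accepted (e.g. a
 * 16-byte uuid build-id); the unused tail is zeroed so that stale bytes
 * are never reported. Lengths of 0 or above 20 are rejected. */
static bool copy_build_id(unsigned char dst[BUILD_ID_SIZE],
                          const unsigned char *desc, size_t descsz)
{
    if (descsz == 0 || descsz > BUILD_ID_SIZE)
        return false;
    memcpy(dst, desc, descsz);
    memset(dst + descsz, 0, BUILD_ID_SIZE - descsz);
    return true;
}
```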
Song Liu	2f3480e340	bpf: fix panic in stack_map_get_build_id() on i386 and arm32 [ Upstream commit `beaf3d1901` ] As Naresh reported, test_stacktrace_build_id() causes a panic on i386 and arm32 systems. This is caused by page_address() returning NULL in certain cases. This patch fixes this error by using kmap_atomic/kunmap_atomic instead of page_address. Fixes: `615755a77b` ("bpf: extend stackmap to save binary_build_id+offset instead of address") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-27 10:08:54 +01:00
Quentin Perret	b5e57dbb5a	tracing: Fix number of entries in trace header commit `9e7382153f` upstream. The following commit `441dae8f2f` ("tracing: Add support for display of tgid in trace output") removed the call to print_event_info() from print_func_help_header_irq(), which results in the ftrace header not reporting the number of entries written in the buffer. As this wasn't the original intent of the patch, re-introduce the call to print_event_info() to restore the original behaviour. Link: http://lkml.kernel.org/r/20190214152950.4179-1-quentin.perret@arm.com Acked-by: Joel Fernandes <joelaf@google.com> Cc: stable@vger.kernel.org Fixes: `441dae8f2f` ("tracing: Add support for display of tgid in trace output") Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-27 10:08:49 +01:00
Greg Kroah-Hartman	cca7d2df6d	Merge 4.19.24 into android-4.19 Changes in 4.19.24 dt-bindings: eeprom: at24: add "atmel,24c2048" compatible string eeprom: at24: add support for 24c2048 blk-mq: fix a hung issue when fsync ARM: 8789/1: signal: copy registers using __copy_to_user() ARM: 8790/1: signal: always use __copy_to_user to save iwmmxt context ARM: 8791/1: vfp: use __copy_to_user() when saving VFP state ARM: 8792/1: oabi-compat: copy oabi events using __copy_to_user() ARM: 8793/1: signal: replace __put_user_error with __put_user ARM: 8794/1: uaccess: Prevent speculative use of the current addr_limit ARM: 8795/1: spectre-v1.1: use put_user() for __put_user() ARM: 8796/1: spectre-v1,v1.1: provide helpers for address sanitization ARM: 8797/1: spectre-v1.1: harden __copy_to_user ARM: 8810/1: vfp: Fix wrong assignement to ufp_exc ARM: make lookup_processor_type() non-__init ARM: split out processor lookup ARM: clean up per-processor check_bugs method call ARM: add PROC_VTABLE and PROC_TABLE macros ARM: spectre-v2: per-CPU vtables to work around big.Little systems ARM: ensure that processor vtables is not lost after boot ARM: fix the cockup in the previous patch drm/amdgpu/sriov:Correct pfvf exchange logic ACPI: NUMA: Use correct type for printing addresses on i386-PAE perf report: Fix wrong iteration count in --branch-history perf test shell: Use a fallback to get the pathname in vfs_getname tools uapi: fix RISC-V 64-bit support riscv: fix trace_sys_exit hook cpufreq: check if policy is inactive early in __cpufreq_get() drm/bridge: tc358767: add bus flags drm/bridge: tc358767: add defines for DP1_SRCCTRL & PHY_2LANE drm/bridge: tc358767: fix single lane configuration drm/bridge: tc358767: fix initial DP0/1_SRCCTRL value drm/bridge: tc358767: reject modes which require too much BW drm/bridge: tc358767: fix output H/V syncs nvme-pci: use the same attributes when freeing host_mem_desc_bufs. 
nvme-pci: fix out of bounds access in nvme_cqe_pending nvme-multipath: zero out ANA log buffer nvme: pad fake subsys NQN vid and ssvid with zeros drm/amdgpu: set WRITE_BURST_LENGTH to 64B to workaround SDMA1 hang ARM: dts: da850-evm: Correct the audio codec regulators ARM: dts: da850-evm: Correct the sound card name ARM: dts: da850-lcdk: Correct the audio codec regulators ARM: dts: da850-lcdk: Correct the sound card name ARM: dts: kirkwood: Fix polarity of GPIO fan lines gpio: pl061: handle failed allocations drm/nouveau: Don't disable polling in fallback mode drm/nouveau/falcon: avoid touching registers if engine is off cifs: Limit memory used by lock request calls to a page kvm: sev: Fail KVM_SEV_INIT if already initialized CIFS: Do not assume one credit for async responses gpio: mxc: move gpio noirq suspend/resume to syscore phase Revert "Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G" Input: elan_i2c - add ACPI ID for touchpad in Lenovo V330-15ISK ARM: OMAP5+: Fix inverted nirq pin interrupts with irq_set_type perf/core: Fix impossible ring-buffer sizes warning perf/x86: Add check_period PMU callback ALSA: hda - Add quirk for HP EliteBook 840 G5 ALSA: usb-audio: Fix implicit fb endpoint setup by quirk ASoC: hdmi-codec: fix oops on re-probe tools uapi: fix Alpha support riscv: Add pte bit to distinguish swap from invalid x86/kvm/nVMX: read from MSR_IA32_VMX_PROCBASED_CTLS2 only when it is available kvm: vmx: Fix entry number check for add_atomic_switch_msr() mmc: sunxi: Filter out unsupported modes declared in the device tree mmc: block: handle complete_work on separate workqueue Input: bma150 - register input device after setting private data Input: elantech - enable 3rd button support on Fujitsu CELSIUS H780 Revert "nfsd4: return default lease period" Revert "mm: don't reclaim inodes with many attached pages" Revert "mm: slowly shrink slabs with a relatively small number of objects" alpha: fix page fault handling for r16-r18 targets alpha: 
Fix Eiger NR_IRQS to 128 s390/zcrypt: fix specification exception on z196 during ap probe tracing/uprobes: Fix output for multiple string arguments x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls scsi: sd: fix entropy gathering for most rotational disks signal: Restore the stop PTRACE_EVENT_EXIT md/raid1: don't clear bitmap bits on interrupted recovery. x86/a.out: Clear the dump structure initially dm crypt: don't overallocate the integrity tag space dm thin: fix bug where bio that overwrites thin block ignores FUA drm: Use array_size() when creating lease drm/vkms: Fix license inconsistent drm/i915: Block fbdev HPD processing during suspend drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set mm: proc: smaps_rollup: fix pss_locked calculation Linux 4.19.24 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-02-20 10:37:09 +01:00
Eric W. Biederman	a2b3e2c0f5	signal: Restore the stop PTRACE_EVENT_EXIT commit `cf43a757fd` upstream. In the middle of do_exit() there is a call "ptrace_event(PTRACE_EVENT_EXIT, code);". That call places the process in TASK_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for the debugger to release the task or for SIGKILL to be delivered. Skipping past dequeue_signal when we know a fatal signal has already been delivered resulted in SIGKILL remaining pending and TIF_SIGPENDING remaining set. This in turn caused the scheduler to not sleep in PTRACE_EVENT_EXIT, as it figured a fatal signal was pending. This also caused ptrace_freeze_traced in ptrace_check_attach to fail, because it left a per-thread SIGKILL pending, which is what fatal_signal_pending tests for. This difference in signal state caused strace to report "strace: Exit of unknown pid NNNNN ignored". Therefore update the signal handling state like dequeue_signal would when removing a per-thread SIGKILL, by removing SIGKILL from the per-thread pending signal set and clearing TIF_SIGPENDING. Acked-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Ivan Delalande <colona@arista.com> Cc: stable@vger.kernel.org Fixes: `35634ffa17` ("signal: Always notice exiting tasks") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-20 10:25:48 +01:00
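The state change made by this fix can be pictured with a small userspace model of the per-thread signal state. This is a sketch under stated assumptions, not kernel code: `struct thread_sig_model`, `sig_bit()` and the `model_*` functions are hypothetical, the pending set is a plain bitmask using the kernel's "bit (signo - 1)" convention, and the flag recomputation is a simplified stand-in for recalc_sigpending(), which in the real kernel also considers blocked and shared signals.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical model of the per-thread state touched by the fix;
 * these are NOT the kernel's structures. */
struct thread_sig_model {
    unsigned long pending; /* bit (signo - 1) set => signal queued */
    bool tif_sigpending;   /* stands in for TIF_SIGPENDING */
};

static unsigned long sig_bit(int signo)
{
    return 1UL << (signo - 1);
}

/* Same idea as fatal_signal_pending(): the pending flag is set and a
 * per-thread SIGKILL is queued. */
static bool model_fatal_signal_pending(const struct thread_sig_model *t)
{
    return t->tif_sigpending && (t->pending & sig_bit(SIGKILL));
}

/* What the fix does once the group is known to be exiting: take SIGKILL
 * off the per-thread queue and recompute the flag, as dequeue_signal
 * would have done. */
static int model_consume_group_exit_sigkill(struct thread_sig_model *t)
{
    t->pending &= ~sig_bit(SIGKILL);
    t->tif_sigpending = (t->pending != 0); /* simplified recalc_sigpending() */
    return SIGKILL;
}
```

Without the call to `model_consume_group_exit_sigkill()`, `model_fatal_signal_pending()` keeps returning true, which corresponds to the condition that stopped the task from sleeping at PTRACE_EVENT_EXIT and made ptrace_freeze_traced() fail.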
Andreas Ziegler	45649b9996	tracing/uprobes: Fix output for multiple string arguments commit `0722069a53` upstream. When printing multiple uprobe arguments as strings the output for the earlier arguments would also include all later string arguments. This is best explained in an example: Consider adding a uprobe to a function receiving two strings as parameters which is at offset 0xa0 in strlib.so and we want to print both parameters when the uprobe is hit (on x86_64): $ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \ /sys/kernel/debug/tracing/uprobe_events When the function is called as func("foo", "bar") and we hit the probe, the trace file shows a line like the following: [...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar" Note the extra "bar" printed as part of arg1. This behaviour stacks up for additional string arguments. The strings are stored in a dynamically growing part of the uprobe buffer by fetch_store_string() after copying them from userspace via strncpy_from_user(). The return value of strncpy_from_user() is then directly used as the required size for the string. However, this does not take the terminating null byte into account as the documentation for strncpy_from_user() clearly states that it "[...] returns the length of the string (not including the trailing NUL)" even though the null byte will be copied to the destination. Therefore, subsequent calls to fetch_store_string() will overwrite the terminating null byte of the most recently fetched string with the first character of the current string, leading to the "accumulation" of strings in earlier arguments in the output. Fix this by incrementing the return value of strncpy_from_user() by one if we did not hit the maximum buffer size.
Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: `5baaa59ef0` ("tracing/probes: Implement 'memory' fetch method for uprobes") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-20 10:25:48 +01:00
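The accumulation effect described in this commit can be reproduced outside the kernel. The sketch below is a hypothetical model, not the upstream code: `model_strncpy()` mimics the documented return convention of strncpy_from_user() (length without the trailing NUL), and `model_store_string()` is a simplified stand-in for fetch_store_string() that appends into one shared buffer and advances an offset by the returned size. The `fixed` flag switches the "count the NUL" increment on or off.

```c
#include <stddef.h>

/* Stand-in for strncpy_from_user(): copies at most n bytes (including the
 * NUL if it fits) and returns the string length WITHOUT the NUL, or n if
 * no NUL was found within n bytes. */
static long model_strncpy(char *dst, const char *src, long n)
{
    for (long i = 0; i < n; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;
    }
    return n;
}

/* Hypothetical simplified fetch_store_string(): appends src to the dynamic
 * area at offset *used and advances *used by the returned size.
 * fixed != 0 applies the upstream fix (count the trailing NUL). */
static long model_store_string(char *area, size_t cap, size_t *used,
                               const char *src, int fixed)
{
    long maxlen = (long)(cap - *used);
    char *dst = area + *used;
    long ret = model_strncpy(dst, src, maxlen);

    if (ret == maxlen) {
        if (maxlen > 0)
            dst[ret - 1] = '\0'; /* truncated: force termination */
    } else if (fixed) {
        ret++;                   /* count the trailing NUL byte */
    }
    *used += (size_t)ret;
    return ret;
}
```

Storing "foo" and then "bar" with `fixed == 0` places the second string at offset 3, on top of the first string's terminator, so reading the first argument from offset 0 yields "foobar". With `fixed == 1` the offsets are 0 and 4 and each string keeps its own terminator.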
Jiri Olsa	74cbb754d6	perf/x86: Add check_period PMU callback commit `81ec3f3c4c` upstream. Vince (and later on Ravi) reported crashes in the BTS code during fuzzing with the following backtrace: general protection fault: 0000 [#1] SMP PTI ... RIP: 0010:perf_prepare_sample+0x8f/0x510 ... Call Trace: <IRQ> ? intel_pmu_drain_bts_buffer+0x194/0x230 intel_pmu_drain_bts_buffer+0x160/0x230 ? tick_nohz_irq_exit+0x31/0x40 ? smp_call_function_single_interrupt+0x48/0xe0 ? call_function_single_interrupt+0xf/0x20 ? call_function_single_interrupt+0xa/0x20 ? x86_schedule_events+0x1a0/0x2f0 ? x86_pmu_commit_txn+0xb4/0x100 ? find_busiest_group+0x47/0x5d0 ? perf_event_set_state.part.42+0x12/0x50 ? perf_mux_hrtimer_restart+0x40/0xb0 intel_pmu_disable_event+0xae/0x100 ? intel_pmu_disable_event+0xae/0x100 x86_pmu_stop+0x7a/0xb0 x86_pmu_del+0x57/0x120 event_sched_out.isra.101+0x83/0x180 group_sched_out.part.103+0x57/0xe0 ctx_sched_out+0x188/0x240 ctx_resched+0xa8/0xd0 __perf_event_enable+0x193/0x1e0 event_function+0x8e/0xc0 remote_function+0x41/0x50 flush_smp_call_function_queue+0x68/0x100 generic_smp_call_function_single_interrupt+0x13/0x30 smp_call_function_single_interrupt+0x3e/0xe0 call_function_single_interrupt+0xf/0x20 </IRQ> The reason is that while the event init code does several checks for BTS events and rejects several unwanted config bits for a BTS event (like precise_ip), the PERF_EVENT_IOC_PERIOD ioctl allows a BTS event to be created without those checks being done. The following sequence will cause the crash: if we create an 'almost' BTS event with precise_ip and callchains, and then turn it into a BTS event via PERF_EVENT_IOC_PERIOD, it will crash the perf_prepare_sample() function, because precise_ip events are expected to come in with callchain data initialized, but that's not the case for the intel_pmu_drain_bts_buffer() caller. Add a check_period callback to be called before the period is changed via PERF_EVENT_IOC_PERIOD. It will deny the change if the event would become BTS. Also add the limit_period check.
Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-20 10:25:45 +01:00
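A check_period-style hook can be sketched as a validation step that runs before a new period is stored. Everything here is a hypothetical userspace model: `struct model_event`, the `model_*` functions and `MODEL_BTS_EVENT_CODE` are illustrative names (the value 0x00c4 is assumed to match the Intel architectural branch-instructions event), and the BTS test (branch event, not in frequency mode, period of 1) follows the idea described in the commit rather than reproducing the x86 PMU code. The separate limit_period check is not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed event code for "branch instructions"; illustrative only. */
#define MODEL_BTS_EVENT_CODE 0x00c4u

struct model_event {
    uint64_t config;        /* raw hardware event code */
    bool     freq;          /* frequency mode: period is not fixed */
    uint64_t sample_period; /* current fixed sample period */
};

/* The event counts as BTS when it samples branch instructions with a
 * fixed period of 1. */
static bool model_has_bts_period(const struct model_event *e, uint64_t period)
{
    return !e->freq && e->config == MODEL_BTS_EVENT_CODE && period == 1;
}

/* check_period-style hook: refuse a change that would turn the event into
 * a BTS event after the init-time checks have already run. */
static int model_check_period(const struct model_event *e, uint64_t period)
{
    return model_has_bts_period(e, period) ? -EINVAL : 0;
}

/* Simplified PERF_EVENT_IOC_PERIOD handling: validate first, then apply. */
static int model_ioc_period(struct model_event *e, uint64_t period)
{
    if (period == 0)
        return -EINVAL;

    int err = model_check_period(e, period);
    if (err)
        return err;

    e->sample_period = period;
    return 0;
}
```

The design point is the ordering: because the hook runs before `sample_period` is updated, a rejected request leaves the event exactly as it was.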
Ingo Molnar	d10e77c260	perf/core: Fix impossible ring-buffer sizes warning commit `528871b456` upstream. The following commit: `9dff0aa95a` ("perf/core: Don't WARN() for impossible ring-buffer sizes") results in perf recording failures with larger mmap areas: root@skl:/tmp# perf record -g -a failed to mmap with 12 (Cannot allocate memory) The root cause is that the following condition is buggy: if (order_base_2(size) >= MAX_ORDER) goto fail; The problem is that @size is in bytes and MAX_ORDER is in pages, so the right test is: if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER) goto fail; Fix it. Reported-by: "Jin, Yao" <yao.jin@linux.intel.com> Bisected-by: Borislav Petkov <bp@alien8.de> Analyzed-by: Peter Zijlstra <peterz@infradead.org> Cc: Julien Thierry <julien.thierry@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Fixes: `9dff0aa95a` ("perf/core: Don't WARN() for impossible ring-buffer sizes") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-20 10:25:44 +01:00
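The bytes-versus-pages mismatch in this commit is easiest to see with concrete numbers. The sketch below assumes 4 KiB pages (PAGE_SHIFT = 12) and MAX_ORDER = 11, typical for x86-64 on this kernel series but an assumption here; `model_order_base_2()` is a hypothetical loop-based stand-in for the kernel's order_base_2(), which rounds up to the next power of two.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed configuration: 4 KiB pages and MAX_ORDER 11. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_MAX_ORDER  11

/* ceil(log2(n)) for n >= 1, like the kernel's order_base_2(). */
static unsigned int model_order_base_2(size_t n)
{
    unsigned int order = 0;
    while (((size_t)1 << order) < n)
        order++;
    return order;
}

/* Buggy test: compares an order in bytes against an order in pages. */
static bool model_size_rejected_buggy(size_t bytes)
{
    return model_order_base_2(bytes) >= MODEL_MAX_ORDER;
}

/* Fixed test: shift the page-order limit into bytes with PAGE_SHIFT. */
static bool model_size_rejected_fixed(size_t bytes)
{
    return model_order_base_2(bytes) >= MODEL_PAGE_SHIFT + MODEL_MAX_ORDER;
}
```

With these values a 64 KiB buffer (order 16) is rejected by the buggy test, because 16 >= 11, but accepted by the fixed one, because 16 < 23. The fixed test still rejects anything larger than 4 MiB, the size of an order-10 (MAX_ORDER - 1) allocation of 4 KiB pages.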
Greg Kroah-Hartman	0755dc9375	Merge 4.19.22 into android-4.19 Changes in 4.19.22 mtd: Make sure mtd->erasesize is valid even if the partition is of size 0 mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache mtd: spinand: Fix the error/cleanup path in spinand_init() mtd: rawnand: gpmi: fix MX28 bus master lockup problem libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD tools: iio: iio_generic_buffer: make num_loops signed iio: adc: axp288: Fix TS-pin handling iio: chemical: atlas-ph-sensor: correct IIO_TEMP values to millicelsius iio: ti-ads8688: Update buffer allocation for timestamps signal: Always notice exiting tasks signal: Better detection of synchronous signals misc: vexpress: Off by one in vexpress_syscfg_exec() mei: me: add ice lake point device id. samples: mei: use /dev/mei0 instead of /dev/mei debugfs: fix debugfs_rename parameter checking pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller pinctrl: cherryview: fix Strago DMI workaround tracing: uprobes: Fix typo in pr_fmt string mips: cm: reprime error cause MIPS: OCTEON: don't set octeon_dma_bar_type if PCI is disabled MIPS: VDSO: Use same -m%-float cflag as the kernel proper mips: loongson64: remove unreachable(), fix loongson_poweroff(). 
MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds ARM: iop32x/n2100: fix PCI IRQ mapping ARM: tango: Improve ARCH_MULTIPLATFORM compatibility ARM: dts: da850: fix interrupt numbers for clocksource firmware: arm_scmi: provide the mandatory device release callback powerpc/radix: Fix kernel crash with mremap() mic: vop: Fix use-after-free on remove mac80211: ensure that mgmt tx skbs have tailroom for encryption drm/modes: Prevent division by zero htotal drm/amd/powerplay: Fix missing break in switch drm/i915: always return something on DDI clock selection drm/vmwgfx: Fix setting of dma masks drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT xfrm: Make set-mark default behavior backward compatible Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal" libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive() xfrm: refine validation of template and selector families batman-adv: Avoid WARN on net_device without parent in netns batman-adv: Force mac header to start of data on xmit svcrdma: Reduce max_send_sges svcrdma: Remove max_sge check at connect time Linux 4.19.22 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-02-15 09:02:44 +01:00
Andreas Ziegler	7e44aab927	tracing: uprobes: Fix typo in pr_fmt string commit `ea6eb5e7d1` upstream. The subsystem-specific message prefix for uprobes was also "trace_kprobe: " instead of "trace_uprobe: " as described in the original commit message. Link: http://lkml.kernel.org/r/20190117133023.19292-1-andreas.ziegler@fau.de Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: `7257634135` ("tracing/probe: Show subsystem name in messages") Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-15 08:10:11 +01:00
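For context, the prefix comes from the common kernel pr_fmt() convention: a source file defines pr_fmt() before the printk headers are included, and the pr_*() helpers wrap their format string with it. The sketch below imitates that pattern in userspace; `model_pr_format()` is a hypothetical helper that uses snprintf() in place of printk().

```c
#include <stdio.h>

/* Per-file message prefix, defined the way kernel files do it. */
#define pr_fmt(fmt) "trace_uprobe: " fmt

/* Hypothetical stand-in for a pr_*() helper: string literal concatenation
 * prepends the prefix to the caller's format at compile time. */
#define model_pr_format(buf, size, fmt, ...) \
    snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)
```

Because the prefix is pasted in by the preprocessor, a copy-pasted `pr_fmt()` definition silently mislabels every message in the file, which is the kind of slip this commit corrects.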
Eric W. Biederman	959e46afec	signal: Better detection of synchronous signals commit `7146db3317` upstream. Recently syzkaller was able to create unkillable processes by creating a timer that is delivered as a thread-local signal on SIGHUP, and receiving SIGHUP with SA_NODEFER, ultimately causing a loop that fails to deliver SIGHUP but keeps trying. When the stack overflows, delivery of SIGHUP fails and force_sigsegv is called. Unfortunately, because SIGSEGV is numerically higher than SIGHUP, next_signal tries again to deliver a SIGHUP. From a quality of implementation standpoint, attempting to deliver the timer SIGHUP signal is wrong. We should attempt to deliver the synchronous SIGSEGV signal we just forced. We can make that happen in a fairly straightforward manner: instead of just looking at the signal number, we also look at the si_code. In particular, for exceptions (aka synchronous signals) the si_code is always greater than 0. That still has the potential to pick up a number of asynchronous signals, as in a few cases the same si_codes that are used for synchronous signals are also used for asynchronous signals, and SI_KERNEL is also included in the list of possible si_codes. Still, the heuristic is much better and timer signals are definitely excluded, which is enough to prevent all known ways for someone to send a process signals fast enough to cause unexpected and arguably incorrect behavior. Cc: stable@vger.kernel.org Fixes: `a27341cd5f` ("Prioritize synchronous signals over 'normal' signals") Tested-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-15 08:10:11 +01:00
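The selection rule described in this commit can be modelled in a few lines. This is a simplified, hypothetical sketch rather than the kernel's dequeue logic: `struct model_queued_sig` and `model_pick_next()` are illustrative names, the synchronous set is assumed to be SIGSEGV, SIGBUS, SIGILL, SIGTRAP, SIGFPE and SIGSYS, and the si_code constants are hardcoded to their Linux values (SI_USER = 0, SI_TIMER = -2, SI_KERNEL = 0x80, SEGV_MAPERR = 1) so the sketch does not depend on how <signal.h> exposes them.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Linux si_code values, hardcoded for the model (assumption). */
#define MODEL_SI_USER     0
#define MODEL_SI_TIMER    (-2)
#define MODEL_SI_KERNEL   0x80
#define MODEL_SEGV_MAPERR 1

struct model_queued_sig {
    int signo;
    int si_code;
};

/* Assumed set of signals that can be raised synchronously by a fault. */
static bool model_is_sync_signo(int signo)
{
    switch (signo) {
    case SIGSEGV: case SIGBUS: case SIGILL:
    case SIGTRAP: case SIGFPE: case SIGSYS:
        return true;
    default:
        return false;
    }
}

/* Returns the index of the signal to deliver; n must be > 0. */
static size_t model_pick_next(const struct model_queued_sig *q, size_t n)
{
    /* Prefer the earliest queued synchronous signal whose si_code is
     * positive; timer signals (negative si_code) never qualify. */
    for (size_t i = 0; i < n; i++) {
        if (model_is_sync_signo(q[i].signo) && q[i].si_code > MODEL_SI_USER)
            return i;
    }
    /* Otherwise fall back to the lowest-numbered pending signal. */
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (q[i].signo < q[best].signo)
            best = i;
    }
    return best;
}
```

A queue holding a timer-generated SIGHUP followed by a SIGSEGV with si_code SI_KERNEL selects the SIGSEGV. If the SIGSEGV had instead come from a timer, its negative si_code would exclude it and the lower-numbered SIGHUP would be picked.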
Eric W. Biederman	f681f2684f	signal: Always notice exiting tasks commit `35634ffa17` upstream. Recently syzkaller was able to create unkillable processes by creating a timer that is delivered as a thread-local signal on SIGHUP, and receiving SIGHUP with SA_NODEFER, ultimately causing a loop that fails to deliver SIGHUP but keeps trying. Upon examination it turns out part of the problem is actually most of the solution. Since 2.5, signal delivery has found all fatal signals, marked the signal group for death, and queued SIGKILL in every thread's queue, relying on signal->group_exit_code to preserve the information of which was the actual fatal signal. The conversion of all fatal signals to SIGKILL results in the synchronous signal heuristic in next_signal kicking in and preferring SIGHUP to SIGKILL, which is especially problematic as all fatal signals have already been transformed into SIGKILL. Instead of dequeueing signals and depending upon SIGKILL to be the first signal dequeued, first test if the signal group has already been marked for death. This guarantees that nothing in the signal queue can prevent a process that needs to exit from exiting. Cc: stable@vger.kernel.org Tested-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Ref: ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-15 08:10:11 +01:00
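The ordering change in this commit can be shown with an equally small model. `struct model_signal_group` and `model_get_signal()` are hypothetical names: `group_exit` stands in for the "signal group already marked for death" test, and `queued_next` is whatever the normal dequeue heuristics would have returned (0 if nothing is pending).

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical model of the group-wide exit state. */
struct model_signal_group {
    bool group_exit;      /* group already marked for death */
    int  group_exit_code; /* the original fatal signal is preserved here */
};

/* Check the group-exit state BEFORE consulting the queue, so no queued
 * signal (such as a timer SIGHUP) can be chosen ahead of the exit. */
static int model_get_signal(const struct model_signal_group *g, int queued_next)
{
    if (g->group_exit)
        return SIGKILL;
    return queued_next;
}
```

Since the group test comes first, whatever the queue heuristics would prefer no longer matters once the group is exiting, and the real fatal signal is still available from `group_exit_code`.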
Greg Kroah-Hartman	6e0411bdc2	Merge 4.19.21 into android-4.19 Changes in 4.19.21 devres: Align data[] to ARCH_KMALLOC_MINALIGN drm/bufs: Fix Spectre v1 vulnerability staging: iio: adc: ad7280a: handle error from __ad7280_read32() drm/vgem: Fix vgem_init to get drm device available. pinctrl: bcm2835: Use raw spinlock for RT compatibility ASoC: Intel: mrfld: fix uninitialized variable access gpiolib: Fix possible use after free on label drm/sun4i: Initialize registers in tcon-top driver genirq/affinity: Spread IRQs to all available NUMA nodes gpu: ipu-v3: image-convert: Prevent race between run and unprepare nds32: Fix gcc 8.0 compiler option incompatible. wil6210: fix reset flow for Talyn-mb wil6210: fix memory leak in wil_find_tx_bcast_2 ath10k: assign 'n_cipher_suites' for WCN3990 ath9k: dynack: use authentication messages for 'late' ack scsi: lpfc: Correct LCB RJT handling scsi: mpt3sas: Call sas_remove_host before removing the target devices scsi: lpfc: Fix LOGO/PLOGI handling when triggerd by ABTS Timeout event ARM: 8808/1: kexec:offline panic_smp_self_stop CPU clk: boston: fix possible memory leak in clk_boston_setup() dlm: Don't swamp the CPU with callbacks queued during recovery x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux) powerpc/pseries: add of_node_put() in dlpar_detach_node() crypto: aes_ti - disable interrupts while accessing S-box drm/vc4: ->x_scaling[1] should never be set to VC4_SCALING_NONE serial: fsl_lpuart: clear parity enable bit when disable parity ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl MIPS: Boston: Disable EG20T prefetch dpaa2-ptp: defer probe when portal allocation failed iwlwifi: fw: do not set sgi bits for HE connection staging:iio:ad2s90: Make probe handle spi_setup failure fpga: altera-cvp: Fix registration for CvP incapable devices Tools: hv: kvp: Fix a warning of buffer overflow with gcc 8.0.1 fpga: altera-cvp: fix 'bad IO access' on x86_64 vbox: fix link error with 'gcc -Og' platform/chrome: don't 
report EC_MKBP_EVENT_SENSOR_FIFO as wakeup i40e: prevent overlapping tx_timeout recover scsi: hisi_sas: change the time of SAS SSP connection staging: iio: ad7780: update voltage on read usbnet: smsc95xx: fix rx packet alignment drm/rockchip: fix for mailbox read size ARM: OMAP2+: hwmod: Fix some section annotations drm/amd/display: fix gamma not being applied correctly drm/amd/display: calculate stream->phy_pix_clk before clock mapping bpf: libbpf: retry map creation without the name net/mlx5: EQ, Use the right place to store/read IRQ affinity hint modpost: validate symbol names also in find_elf_symbol perf tools: Add Hygon Dhyana support soc/tegra: Don't leak device tree node reference media: rc: ensure close() is called on rc_unregister_device media: video-i2c: avoid accessing released memory area when removing driver media: mtk-vcodec: Release device nodes in mtk_vcodec_init_enc_pm() staging: erofs: fix the definition of DBG_BUGON clk: meson: meson8b: do not use cpu_div3 for cpu_scale_out_sel clk: meson: meson8b: fix the width of the cpu_scale_div clock clk: meson: meson8b: mark the CPU clock as CLK_IS_CRITICAL ptp: Fix pass zero to ERR_PTR() in ptp_clock_register dmaengine: xilinx_dma: Remove __aligned attribute on zynqmp_dma_desc_ll powerpc/32: Add .data..Lubsan_data/.data..Lubsan_type sections explicitly iio: adc: meson-saradc: check for devm_kasprintf failure iio: adc: meson-saradc: fix internal clock names iio: accel: kxcjk1013: Add KIOX010A ACPI Hardware-ID media: adv*/tc358743/ths8200: fill in min width/height/pixelclock ACPI: SPCR: Consider baud rate 0 as preconfigured state staging: pi433: fix potential null dereference f2fs: move dir data flush to write checkpoint process f2fs: fix race between write_checkpoint and write_begin f2fs: fix wrong return value of f2fs_acl_create i2c: sh_mobile: add support for r8a77990 (R-Car E3) arm64: io: Ensure calls to delay routines are ordered against prior readX() net: aquantia: return 'err' if set MPI_DEINIT state 
fails sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN soc: bcm: brcmstb: Don't leak device tree node reference nfsd4: fix crash on writing v4_end_grace before nfsd startup drm: Clear state->acquire_ctx before leaving drm_atomic_helper_commit_duplicated_state() perf: arm_spe: handle devm_kasprintf() failure arm64: io: Ensure value passed to __iormb() is held in a 64-bit register Thermal: do not clear passive state during system sleep thermal: Fix locking in cooling device sysfs update cur_state firmware/efi: Add NULL pointer checks in efivars API functions s390/zcrypt: improve special ap message cmd handling mt76x0: dfs: fix IBI_R11 configuration on non-radar channels arm64: ftrace: don't adjust the LR value drm/v3d: Fix prime imports of buffers from other drivers. ARM: dts: mmp2: fix TWSI2 ARM: dts: aspeed: add missing memory unit-address x86/fpu: Add might_fault() to user_insn() media: i2c: TDA1997x: select CONFIG_HDMI media: DaVinci-VPBE: fix error handling in vpbe_initialize() smack: fix access permissions for keyring xtensa: xtfpga.dtsi: fix dtc warnings about SPI usb: dwc3: Correct the logic for checking TRB full in __dwc3_prepare_one_trb() usb: dwc2: Disable power down feature on Samsung SoCs usb: hub: delay hub autosuspend if USB3 port is still link training timekeeping: Use proper seqcount initializer usb: mtu3: fix the issue about SetFeature(U1/U2_Enable) clk: sunxi-ng: a33: Set CLK_SET_RATE_PARENT for all audio module clocks media: imx274: select REGMAP_I2C drm/amdgpu/powerplay: fix clock stretcher limits on polaris (v2) tipc: fix node keep alive interval calculation driver core: Move async_synchronize_full call kobject: return error code if writing /sys/.../uevent fails IB/hfi1: Unreserve a reserved request when it is completed usb: dwc3: trace: add missing break statement to make compiler happy gpio: mt7621: report failure of devm_kasprintf() gpio: mt7621: pass mediatek_gpio_bank_probe() failure up the stack pinctrl: sx150x: 
handle failure case of devm_kstrdup iommu/amd: Fix amd_iommu=force_isolation ARM: dts: Fix OMAP4430 SDP Ethernet startup mips: bpf: fix encoding bug for mm_srlv32_op media: coda: fix H.264 deblocking filter controls ARM: dts: Fix up the D-Link DIR-685 MTD partition info watchdog: renesas_wdt: don't set divider while watchdog is running ARM: dts: imx51-zii-rdu1: Do not specify "power-gpio" for hpa1 usb: dwc3: gadget: Disable CSP for stream OUT ep iommu/arm-smmu-v3: Avoid memory corruption from Hisilicon MSI payloads iommu/arm-smmu: Add support for qcom,smmu-v2 variant iommu/arm-smmu-v3: Use explicit mb() when moving cons pointer sata_rcar: fix deferred probing clk: imx6sl: ensure MMDC CH0 handshake is bypassed platform/x86: mlx-platform: Fix tachometer registers cpuidle: big.LITTLE: fix refcount leak OPP: Use opp_table->regulators to verify no regulator case tee: optee: avoid possible double list_del() drm/msm/dsi: fix dsi clock names in DSI 10nm PLL driver drm/msm: dpu: Only check flush register against pending flushes lightnvm: pblk: fix resubmission of overwritten write err lbas lightnvm: pblk: add lock protection to list operations i2c-axxia: check for error conditions first phy: sun4i-usb: add support for missing USB PHY index mlxsw: spectrum_acl: Limit priority value udf: Fix BUG on corrupted inode switchtec: Fix SWITCHTEC_IOCTL_EVENT_IDX_ALL flags overwrite selftests/bpf: use __bpf_constant_htons in test_prog.c ARM: pxa: avoid section mismatch warning ASoC: fsl: Fix SND_SOC_EUKREA_TLV320 build error on i.MX8M KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines mmc: bcm2835: Recover from MMC_SEND_EXT_CSD mmc: bcm2835: reset host on timeout mmc: meson-mx-sdio: check devm_kasprintf for failure memstick: Prevent memstick host from getting runtime suspended during card detection mmc: sdhci-of-esdhc: Fix timeout checks mmc: sdhci-omap: Fix timeout checks mmc: sdhci-xenon: Fix timeout checks mmc: jz4740: Get CD/WP GPIOs from descriptors usb: 
renesas_usbhs: add support for RZ/G2E btrfs: harden agaist duplicate fsid on scanned devices serial: sh-sci: Fix locking in sci_submit_rx() serial: sh-sci: Resume PIO in sci_rx_interrupt() on DMA failure tty: serial: samsung: Properly set flags in autoCTS mode perf test: Fix perf_event_attr test failure perf dso: Fix unchecked usage of strncpy() perf header: Fix unchecked usage of strncpy() btrfs: use tagged writepage to mitigate livelock of snapshot perf probe: Fix unchecked usage of strncpy() i2c: sh_mobile: Add support for r8a774c0 (RZ/G2E) bnxt_en: Disable MSIX before re-reserving NQs/CMPL rings. tools/power/x86/intel_pstate_tracer: Fix non root execution for post processing a trace file livepatch: check kzalloc return values arm64: KVM: Skip MMIO insn after emulation usb: musb: dsps: fix otg state machine usb: musb: dsps: fix runtime pm for peripheral mode perf header: Fix up argument to ctime() perf tools: Cast off_t to s64 to avoid warning on bionic libc percpu: convert spin_lock_irq to spin_lock_irqsave. 
net: hns3: fix incomplete uninitialization of IRQ in the hns3_nic_uninit_vector_data() drm/amd/display: Add retry to read ddc_clock pin Bluetooth: hci_bcm: Handle deferred probing for the clock supply drm/amd/display: fix YCbCr420 blank color powerpc/uaccess: fix warning/error with access_ok() mac80211: fix radiotap vendor presence bitmap handling xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spi mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG scsi: smartpqi: correct host serial num for ssa scsi: smartpqi: correct volume status scsi: smartpqi: increase fw status register read timeout cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan() net: hns3: add max vector number check for pf powerpc/perf: Fix thresholding counter data for unknown type iwlwifi: mvm: fix setting HE ppe FW config powerpc/powernv/ioda: Allocate indirect TCE levels of cached userspace addresses on demand mlx5: update timecounter at least twice per counter overflow drbd: narrow rcu_read_lock in drbd_sync_handshake drbd: disconnect, if the wrong UUIDs are attached on a connected peer drbd: skip spurious timeout (ping-timeo) when failing promote drbd: Avoid Clang warning about pointless switch statment drm/amd/display: validate extended dongle caps video: clps711x-fb: release disp device node in probe() md: fix raid10 hang issue caused by barrier fbdev: fbmem: behave better with small rotated displays and many CPUs i40e: define proper net_device::neigh_priv_len ice: Do not enable NAPI on q_vectors that have no rings igb: Fix an issue that PME is not enabled during runtime suspend ACPI/APEI: Clear GHES block_status before panic() fbdev: fbcon: Fix unregister crash when more than one framebuffer powerpc/mm: Fix reporting of kernel execute faults on the 8xx pinctrl: meson: meson8: fix the GPIO function for the GPIOAO pins pinctrl: meson: meson8b: fix the GPIO function for the GPIOAO pins KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported powerpc/fadump: 
Do not allow hot-remove memory from fadump reserved area. kvm: Change offset in kvm_write_guest_offset_cached to unsigned NFS: nfs_compare_mount_options always compare auth flavors. perf build: Don't unconditionally link the libbfd feature test to -liberty and -lz hwmon: (lm80) fix a missing check of the status of SMBus read hwmon: (lm80) fix a missing check of bus read in lm80 probe seq_buf: Make seq_buf_puts() null-terminate the buffer crypto: ux500 - Use proper enum in cryp_set_dma_transfer crypto: ux500 - Use proper enum in hash_set_dma_transfer MIPS: ralink: Select CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8 cifs: check ntwrk_buf_start for NULL before dereferencing it f2fs: fix use-after-free issue when accessing sbi->stat_info um: Avoid marking pages with "changed protection" niu: fix missing checks of niu_pci_eeprom_read f2fs: fix sbi->extent_list corruption issue cgroup: fix parsing empty mount option string perf python: Do not force closing original perf descriptor in evlist.get_pollfd() scripts/decode_stacktrace: only strip base path when a prefix of the path arch/sh/boards/mach-kfr2r09/setup.c: fix struct mtd_oob_ops build warning ocfs2: don't clear bh uptodate for block read ocfs2: improve ocfs2 Makefile mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init zram: fix lockdep warning of free block handling isdn: hisax: hfc_pci: Fix a possible concurrency use-after-free bug in HFCPCI_l1hw() gdrom: fix a memory leak bug fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address() block/swim3: Fix -EBUSY error when re-opening device after unmount thermal: bcm2835: enable hwmon explicitly kdb: Don't back trace on a cpu that didn't round up PCI: imx: Enable MSI from downstream components thermal: generic-adc: Fix adc to temp interpolation HID: lenovo: Add checks to fix of_led_classdev_register arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition kernel/hung_task.c: break RCU locks based on jiffies proc/sysctl: fix return error for 
proc_doulongvec_minmax() kernel/hung_task.c: force console verbose before panic fs/epoll: drop ovflist branch prediction exec: load_script: don't blindly truncate shebang string kernel/kcov.c: mark write_comp_data() as notrace scripts/gdb: fix lx-version string output xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat xfs: cancel COW blocks before swapext xfs: Fix error code in 'xfs_ioc_getbmap()' xfs: fix overflow in xfs_attr3_leaf_verify xfs: fix shared extent data corruption due to missing cow reservation xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers xfs: delalloc -> unwritten COW fork allocation can go wrong fs/xfs: fix f_ffree value for statfs when project quota is set xfs: fix PAGE_MASK usage in xfs_free_file_space xfs: fix inverted return from xfs_btree_sblock_verify_crc thermal: hwmon: inline helpers when CONFIG_THERMAL_HWMON is not set dccp: fool proof ccid_hc_[rt]x_parse_options() enic: fix checksum validation for IPv6 lib/test_rhashtable: Make test_insert_dup() allocate its hash table dynamically net: dp83640: expire old TX-skb net: dsa: Fix lockdep false positive splat net: dsa: Fix NULL checking in dsa_slave_set_eee() net: dsa: mv88e6xxx: Fix counting of ATU violations net: dsa: slave: Don't propagate flag changes on down slave interfaces net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames net: systemport: Fix WoL with password after deep sleep rds: fix refcount bug in rds_sock_addref Revert "net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x" rxrpc: bad unlock balance in rxrpc_recvmsg sctp: check and update stream->out_curr when allocating stream_out sctp: walk the list of asoc safely skge: potential memory corruption in skge_get_regs() virtio_net: Account for tx bytes and packets on sending xdp_frames net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance xfs: eof trim writeback mapping as soon as it is cached ALSA: compress: Fix stop handling on compressed capture streams ALSA: 
usb-audio: Add support for new T+A USB DAC ALSA: hda - Serialize codec registrations ALSA: hda/realtek - Fix lose hp_pins for disable auto mute ALSA: hda/realtek - Use a common helper for hp pin reference ALSA: hda/realtek - Headset microphone support for System76 darp5 fuse: call pipe_buf_release() under pipe lock fuse: decrement NR_WRITEBACK_TEMP on the right page fuse: handle zero sized retrieve correctly HID: debug: fix the ring buffer implementation dmaengine: bcm2835: Fix interrupt race on RT dmaengine: bcm2835: Fix abort of transactions dmaengine: imx-dma: fix wrong callback invoke futex: Handle early deadlock return correctly irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID usb: phy: am335x: fix race condition in _probe usb: dwc3: gadget: Handle 0 xfer length for OUT EP usb: gadget: udc: net2272: Fix bitwise and boolean operations usb: gadget: musb: fix short isoc packets with inventra dma staging: speakup: fix tty-operation NULL derefs scsi: cxlflash: Prevent deadlock when adapter probe fails scsi: aic94xx: fix module loading KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974) KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM perf/x86/intel/uncore: Add Node ID mask x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out() perf/core: Don't WARN() for impossible ring-buffer sizes perf tests evsel-tp-sched: Fix bitwise operator serial: fix race between flush_to_ldisc and tty_open serial: 8250_pci: Make PCI class test non fatal serial: sh-sci: Do not free irqs that have already been freed cacheinfo: Keep the old value if of_property_read_u32 fails IB/hfi1: Add limit test for RC/UC send via loopback perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu() ath9k: dynack: make ewma estimation faster ath9k: dynack: check da->enabled 
first in sampling routines Linux 4.19.21 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-02-12 20:37:21 +01:00
Mark Rutland	1aeeb17668	perf/core: Don't WARN() for impossible ring-buffer sizes commit `9dff0aa95a` upstream. The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how large its ringbuffer mmap should be. This can be configured to arbitrary values, which can be larger than the maximum possible allocation from kmalloc. When this is configured to a suitably large value (e.g. thanks to the perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in __alloc_pages_nodemask(): WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8 Let's avoid this by checking that the requested allocation is possible before calling kzalloc. Reported-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-12 19:47:26 +01:00
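The pre-allocation check this commit describes, refusing a size that no single page-allocator block can satisfy, can be illustrated with a minimal userspace C sketch. It is not the kernel's code. It assumes 4 KiB pages and `MAX_ORDER` = 11 (largest block order 10, i.e. 4 MiB), and the names `SKETCH_PAGE_SHIFT`, `SKETCH_MAX_ORDER`, `ceil_log2()` and `alloc_size_possible()` are hypothetical stand-ins. `ceil_log2()` mirrors the semantics of the kernel's `order_base_2()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only: 4 KiB pages and MAX_ORDER = 11,
 * so the largest buddy block is order 10 (4 MiB). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  11

/* ceil(log2(n)) for n >= 1: the number of bits needed to hold n - 1. */
static unsigned int ceil_log2(size_t n)
{
    unsigned int order = 0;

    assert(n >= 1);
    for (size_t v = n - 1; v != 0; v >>= 1)
        order++;
    return order;
}

/* Hypothetical helper: reject a request up front if no single buddy block
 * can hold it, instead of letting the allocator warn about an impossible
 * order. */
static bool alloc_size_possible(size_t size)
{
    return ceil_log2(size) < SKETCH_PAGE_SHIFT + SKETCH_MAX_ORDER;
}
```

Under these assumptions a 4 MiB request passes and anything one byte larger is rejected, so a caller would take its error path before ever calling the allocator.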
Josh Poimboeuf	97a7fa90ea	cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM commit `b284909aba` upstream. With the following commit: `73d5e2b472` ("cpu/hotplug: detect SMT disabled by BIOS") ... the hotplug code attempted to detect when SMT was disabled by BIOS, in which case it reported SMT as permanently disabled. However, that code broke a virt hotplug scenario, where the guest is booted with only primary CPU threads, and a sibling is brought online later. The problem is that there doesn't seem to be a way to reliably distinguish between the HW "SMT disabled by BIOS" case and the virt "sibling not yet brought online" case. So the above-mentioned commit was a bit misguided, as it permanently disabled SMT for both cases, preventing future virt sibling hotplugs. Going back and reviewing the original problems which were attempted to be solved by that commit, when SMT was disabled in BIOS: 1) /sys/devices/system/cpu/smt/control showed "on" instead of "notsupported"; and 2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning. I'd propose that we instead consider #1 above to not actually be a problem. Because, at least in the virt case, it's possible that SMT wasn't disabled by BIOS and a sibling thread could be brought online later. So it makes sense to just always default the smt control to "on" to allow for that possibility (assuming cpuid indicates that the CPU supports SMT). The real problem is #2, which has a simple fix: change vmx_vm_init() to query the actual current SMT state -- i.e., whether any siblings are currently online -- instead of looking at the SMT "control" sysfs value. So fix it by: a) reverting the original "fix" and its followup fix: `73d5e2b472` ("cpu/hotplug: detect SMT disabled by BIOS") `bc2d8d262c` ("cpu/hotplug: Fix SMT supported evaluation") and b) changing vmx_vm_init() to query the actual current SMT state -- instead of the sysfs control value -- to determine whether the L1TF warning is needed. 
This also requires the 'sched_smt_present' variable to be exported, instead of 'cpu_smt_control'. Fixes: `73d5e2b472` ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joe Mario <jmario@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-12 19:47:25 +01:00
Thomas Gleixner	ee73954d9a	futex: Handle early deadlock return correctly commit `1a1fb985f2` upstream. commit `56222b212e` ("futex: Drop hb->lock before enqueueing on the rtmutex") changed the locking rules in the futex code so that the hash bucket lock is no longer held while the waiter is enqueued into the rtmutex wait list. This made the lock and the unlock path symmetric, but unfortunately the possible early exit from __rt_mutex_proxy_start() due to a detected deadlock was not updated accordingly. That allows a concurrent unlocker to observe inconsistent state which triggers the warning in the unlock path.
    futex_lock_pi()                   futex_unlock_pi()
      lock(hb->lock)
      queue(hb_waiter)                lock(hb->lock)
      lock(rtmutex->wait_lock)
      unlock(hb->lock)
                                      // acquired hb->lock
                                      hb_waiter = futex_top_waiter()
                                      lock(rtmutex->wait_lock)
      __rt_mutex_proxy_start()
         ---> fail
              remove(rtmutex_waiter);
         ---> returns -EDEADLOCK
      unlock(rtmutex->wait_lock)
                                      // acquired wait_lock
                                      wake_futex_pi()
                                      rt_mutex_next_owner()
                                        --> returns NULL
                                          --> WARN
      lock(hb->lock)
      unqueue(hb_waiter)
The problem is caused by the remove(rtmutex_waiter) in the failure case of __rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the hash bucket but no waiter on the rtmutex, i.e. inconsistent state. The original commit handles this correctly for the other early return cases (timeout, signal) by delaying the removal of the rtmutex waiter until the returning task has reacquired the hash bucket lock. Treat the failure case of __rt_mutex_proxy_start() in the same way and let the existing cleanup code handle the eventual handover of the rtmutex gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter removal for the failure case, so that the other callsites are still operating correctly. Add proper comments to the code so all these details are fully documented. Thanks to Peter for helping with the analysis and writing the really valuable code comments. 
Fixes: `56222b212e` ("futex: Drop hb->lock before enqueueing on the rtmutex") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Stefan Liebler <stli@linux.ibm.com> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-12 19:47:24 +01:00
Anders Roxell	58e57bcbc1	kernel/kcov.c: mark write_comp_data() as notrace [ Upstream commit `6347244316` ] Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the function called from __sanitizer_cov_trace_const_cmp4 shouldn't be traceable either. ftrace_graph_caller() gets called every time func write_comp_data() gets called if it isn't marked 'notrace'. This is the backtrace from gdb:
    #0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
    #1 0xffffff8010201920 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
    #2 0xffffff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, ip=18446743524224276596) at ../kernel/kcov.c:116
    #3 0xffffff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188
    #4 0xffffff8010201874 in prepare_ftrace_return (self_addr=18446743524226602768, parent=0xffffff801014b918, frame_pointer=18446743524223531344) at ./include/generated/atomic-instrumented.h:27
    #5 0xffffff801020194c in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182
Rework so that write_comp_data(), which is called from the __sanitizer_cov_trace_*_cmp() functions, is marked as 'notrace'. Commit `903e8ff867` ("kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace") failed to mark write_comp_data() as 'notrace'. When that patch was created, gcc-7 was used. In lib/Kconfig.debug, config KCOV_ENABLE_COMPARISONS depends on $(cc-option,-fsanitize-coverage=trace-cmp). That code path isn't hit with gcc-7; however, it is with gcc-8.
Link: http://lkml.kernel.org/r/20181206143011.23719-1-anders.roxell@linaro.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Co-developed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:47:20 +01:00
Liu, Chuansheng	f0d32c54ff	kernel/hung_task.c: force console verbose before panic [ Upstream commit `168e06f793` ] Based on commit `401c636a0e` ("kernel/hung_task.c: show all hung tasks before panic"), we can get the call stack of the hung task. However, if the console loglevel is not high, we still cannot see the useful panic information in practice, and in most cases users don't set the console loglevel to a high level. This patch forces console verbose before system panic, so that the really useful information can be seen in the console, instead of output like the following, which doesn't contain any hung task information.
    INFO: task init:1 blocked for more than 120 seconds.
    Tainted: G U W 4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Kernel panic - not syncing: hung_task: blocked tasks
    CPU: 2 PID: 479 Comm: khungtaskd Tainted: G U W 4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
    Call Trace:
     dump_stack+0x4f/0x65
     panic+0xde/0x231
     watchdog+0x290/0x410
     kthread+0x12c/0x150
     ret_from_fork+0x35/0x40
    reboot: panic mode set: p,w
    Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A6015B675@SHSMSX101.ccr.corp.intel.com Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:47:19 +01:00
Cheng Lin	9beb84c027	proc/sysctl: fix return error for proc_doulongvec_minmax() [ Upstream commit `09be178400` ] If the number of input parameters is less than the total number of parameters, an EINVAL error is returned. For example, we use proc_doulongvec_minmax to pass up to two parameters with kern_table:
    {
        .procname     = "monitor_signals",
        .data         = &monitor_sigs,
        .maxlen       = 2*sizeof(unsigned long),
        .mode         = 0644,
        .proc_handler = proc_doulongvec_minmax,
    },
Reproduce: when passing two parameters, it works normally. But when passing only one parameter, an "Invalid argument" (EINVAL) error is returned.
    [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
    [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
    1 2
    [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
    -bash: echo: write error: Invalid argument
    [root@cl150 ~]# echo $?
    1
    [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
    3 2
    [root@cl150 ~]#
The following is the result after applying this patch. No error is returned when the number of input parameters is less than the total number of parameters.
    [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
    [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
    1 2
    [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
    [root@cl150 ~]# echo $?
    0
    [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
    3 2
    [root@cl150 ~]#
There are three processing functions dealing with numeric parameters: __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax. This patch deals with __do_proc_doulongvec_minmax, just as __do_proc_dointvec does, by adding a check on 'left' (the remaining input length). The implementation of __do_proc_douintvec explicitly does not support multiple inputs:
    static int __do_proc_douintvec(...){
        ...
        /*
         * Arrays are not supported, keep this simple. Do not add
         * support for them.
         */
        if (vleft != 1) {
            *lenp = 0;
            return -EINVAL;
        }
        ...
    }
So only __do_proc_doulongvec_minmax has the problem. And most uses of proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax have just one parameter. Link: http://lkml.kernel.org/r/1544081775-15720-1-git-send-email-cheng.lin130@zte.com.cn Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:47:19 +01:00
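The post-patch behavior, where a write that supplies fewer values than the vector holds succeeds and leaves the remaining slots untouched, can be sketched in userspace C. This is a hypothetical helper (`parse_ulong_vec()`), not `__do_proc_doulongvec_minmax()` itself. It omits the min/max range checks, and its choice to ignore input beyond `vlen` values is an assumption of the sketch rather than a statement about the kernel.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse up to vlen unsigned longs from buf into vec.
 * Running out of input before vlen values is NOT an error; slots that were
 * not written keep their previous values. Input beyond vlen values is
 * ignored by this sketch, and range (min/max) checks are omitted.
 * Returns the number of values stored, or -EINVAL on a malformed token.
 */
static int parse_ulong_vec(const char *buf, unsigned long *vec, size_t vlen)
{
    const char *p = buf;
    size_t i = 0;

    assert(buf != NULL && vec != NULL);
    while (i < vlen) {
        char *end;
        unsigned long v;

        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;                      /* fewer values than vlen: accept */
        if (!isdigit((unsigned char)*p))
            return -EINVAL;

        errno = 0;
        v = strtoul(p, &end, 10);
        if (errno == ERANGE)
            return -EINVAL;
        if (*end != '\0' && !isspace((unsigned char)*end))
            return -EINVAL;             /* e.g. "12abc" */

        vec[i++] = v;
        p = end;
    }
    return (int)i;
}
```

Starting from `{1, 2}`, parsing `"3\n"` stores a single value and leaves the vector as `{3, 2}` with a non-error return, which matches the post-patch transcript in the commit message.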
Tetsuo Handa	9c8939b03b	kernel/hung_task.c: break RCU locks based on jiffies [ Upstream commit `304ae42739` ] check_hung_uninterruptible_tasks() is currently calling rcu_lock_break() for every 1024 threads. But check_hung_task() is very slow if printk() was called, and is very fast otherwise. If many of the threads in a batch of 1024 called printk(), the RCU grace period might be extended enough to trigger RCU stall warnings. Therefore, calling rcu_lock_break() whenever a fixed number of jiffies has elapsed will be safer. Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:47:19 +01:00
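The difference between breaking the RCU read lock every N tasks and breaking it after a fixed amount of time can be shown with a small userspace simulation. This is a hypothetical model, not `check_hung_uninterruptible_tasks()`. `cost[i]` is the time, in ticks standing in for jiffies, spent checking task i (large when printk() ran). Each function reports the longest stretch spent without a lock break.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical simulation: cost[i] is the time (in ticks, standing in for
 * jiffies) spent checking task i while the RCU read lock is held. Each
 * function returns the longest stretch of ticks spent without a lock break.
 */

/* Old policy: break the lock after every `batch` tasks. */
static unsigned long max_hold_by_count(const unsigned long *cost, size_t n,
                                       size_t batch)
{
    unsigned long held = 0, worst = 0;

    assert(batch > 0);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && i % batch == 0)
            held = 0;                   /* lock break */
        held += cost[i];
        if (held > worst)
            worst = held;
    }
    return worst;
}

/* New policy: break the lock once `interval` ticks have elapsed. */
static unsigned long max_hold_by_time(const unsigned long *cost, size_t n,
                                      unsigned long interval)
{
    unsigned long held = 0, worst = 0;

    assert(interval > 0);
    for (size_t i = 0; i < n; i++) {
        if (held >= interval)
            held = 0;                   /* lock break */
        held += cost[i];
        if (held > worst)
            worst = held;
    }
    return worst;
}
```

With costs `{100, 100, 100, 100, 1, 1}` (four slow, printk-heavy checks followed by fast ones), a batch of 4 holds the lock for 400 ticks, while an interval of 150 caps the hold at 200. Under the time-based policy the hold never exceeds the interval plus the cost of a single task, however the slow tasks are clustered.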
Douglas Anderson	3818c29a65	kdb: Don't back trace on a cpu that didn't round up [ Upstream commit `162bc7f5af` ] If you have a CPU that fails to round up and then run 'btc' you'll end up crashing in kdb because we dereference NULL. Let's add a check. It's wise to also set the task to NULL when leaving the debugger so that if we fail to round up on a later entry into the debugger we won't backtrace a stale task. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:47:19 +01:00
Ondrej Mosnacek	4b5abffd63	cgroup: fix parsing empty mount option string [ Upstream commit `e250d91d65` ] This fixes the case where all mount options specified are consumed by an LSM and all that's left is an empty string. In this case cgroupfs should accept the string and not fail. How to reproduce (with SELinux enabled):
    # umount /sys/fs/cgroup/unified
    # mount -o context=system_u:object_r:cgroup_t:s0 -t cgroup2 cgroup2 /sys/fs/cgroup/unified
    mount: /sys/fs/cgroup/unified: wrong fs type, bad option, bad superblock on cgroup2, missing codepage or helper program, or other error.
    # dmesg | tail -n 1
    [ 31.575952] cgroup: cgroup2: unknown option ""
Fixes: `67e9c74b8a` ("cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type") [NOTE: should apply on top of commit `5136f6365c` ("cgroup: implement "nsdelegate" mount option"), older versions need manual rebase] Suggested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:47:17 +01:00
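The fix amounts to treating an empty option string, left over after an LSM has consumed every option, as "no options" rather than as a single unknown option `""`. A minimal userspace sketch follows. The function name `parse_root_flags()` is hypothetical, and only the "nsdelegate" option mentioned in the NOTE above is recognized. This is not the actual cgroup parser.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical sketch of comma-separated mount-option parsing that treats
 * an empty string (everything consumed by an LSM) as "no options" rather
 * than as one unknown option "". Only "nsdelegate" is recognized here.
 * Returns 0 on success or -EINVAL on an unknown option; sets *nsdelegate.
 */
static int parse_root_flags(const char *data, int *nsdelegate)
{
    assert(nsdelegate != NULL);
    *nsdelegate = 0;
    if (data == NULL || *data == '\0')
        return 0;                       /* nothing left to parse */

    const char *p = data;
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len == strlen("nsdelegate") && strncmp(p, "nsdelegate", len) == 0)
            *nsdelegate = 1;
        else
            return -EINVAL;             /* unknown option */

        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return 0;
}
```

Without the early return for `*data == '\0'`, the loop would see one zero-length token and report it as unknown. That matches the `unknown option ""` message in the reproducer above.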
Peter Rajnoha	f7debeebcd	kobject: return error code if writing /sys/.../uevent fails [ Upstream commit `df44b47965` ] Propagate error code back to userspace if writing the /sys/.../uevent file fails. Before, the write operation always returned with success, even if we failed to recognize the input string or if we failed to generate the uevent itself. With the error codes properly propagated back to userspace, we are able to react in userspace accordingly by not assuming and awaiting a uevent that is not delivered. Signed-off-by: Peter Rajnoha <prajnoha@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:47:06 +01:00
Bart Van Assche	0105d80dd1	timekeeping: Use proper seqcount initializer [ Upstream commit `ce10a5b395` ] tk_core.seq is initialized open coded, but that fails to initialize the lockdep map when lockdep is enabled. Lockdep splats involving tk_core seq consequently lack a name and are hard to read. Use the proper initializer which takes care of the lockdep map initialization. [ tglx: Massaged changelog ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: tj@kernel.org Cc: johannes.berg@intel.com Link: https://lkml.kernel.org/r/20181128234325.110011-12-bvanassche@acm.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:47:05 +01:00
Long Li	46ed4f4fa1	genirq/affinity: Spread IRQs to all available NUMA nodes [ Upstream commit `b825921990` ] If the number of NUMA nodes exceeds the number of MSI/MSI-X interrupts which are allocated for a device, the interrupt affinity spreading code fails to spread them across all nodes. The reason is that the spreading code starts from node 0 and continues up to the number of interrupts requested for allocation. This leaves the nodes past the last interrupt unused. This results in interrupt concentration on the first nodes, which violates the assumption of the block layer that all nodes are covered evenly. As a consequence the NUMA nodes above the number of interrupts are all assigned to hardware queue 0 and therefore NUMA node 0, which results in bad performance and has CPU hotplug implications, because queue 0 gets shut down when the last CPU of node 0 is offlined. Go over all NUMA nodes and assign them round-robin to all requested interrupts to solve this. [ tglx: Massaged changelog ] Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Michael Kelley <mikelley@microsoft.com> Link: https://lkml.kernel.org/r/20181102180248.13583-1-longli@linuxonhyperv.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-02-12 19:46:57 +01:00
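The round-robin assignment described above can be sketched as a plain node-to-vector mapping. This is a simplified, hypothetical model (`spread_nodes_round_robin()` is not a kernel function). It maps whole nodes to vectors and ignores the per-CPU masks the real genirq affinity code builds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: when there are more NUMA nodes than interrupt
 * vectors, walk every node and assign it to a vector round-robin, so no
 * node is left uncovered. node_to_vec[node] receives the vector index.
 */
static void spread_nodes_round_robin(size_t nr_nodes, size_t nr_vecs,
                                     size_t *node_to_vec)
{
    assert(nr_vecs > 0 && node_to_vec != NULL);
    for (size_t node = 0; node < nr_nodes; node++)
        node_to_vec[node] = node % nr_vecs;
}
```

With four nodes and two vectors, nodes 0 and 2 land on vector 0 and nodes 1 and 3 on vector 1, so every node is covered instead of only the first two.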
Greg Kroah-Hartman	c28f73fe42	Merge 4.19.20 into android-4.19 Changes in 4.19.20 Fix "net: ipv4: do not handle duplicate fragments as overlapping" drm/msm/gpu: fix building without debugfs ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation ipvlan, l3mdev: fix broken l3s mode wrt local routes l2tp: copy 4 more bytes to linear part if necessary l2tp: fix reading optional fields of L2TPv3 net: ip_gre: always reports o_key to userspace net: ip_gre: use erspan key field for tunnel lookup net/mlx4_core: Add masking for a few queries on HCA caps netrom: switch to sock timer API net/rose: fix NULL ax25_cb kernel panic net: set default network namespace in init_dummy_netdev() ravb: expand rx descriptor data to accommodate hw checksum sctp: improve the events for sctp stream reset tun: move the call to tun_set_real_num_queues ucc_geth: Reset BQL queue when stopping device vhost: fix OOB in get_rx_bufs() net: ip6_gre: always reports o_key to userspace sctp: improve the events for sctp stream adding net/mlx5e: Allow MAC invalidation while spoofchk is ON ip6mr: Fix notifiers call on mroute_clean_tables() Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager" sctp: set chunk transport correctly when it's a new asoc sctp: set flow sport from saddr only when it's 0 virtio_net: Don't enable NAPI when interface is down virtio_net: Don't call free_old_xmit_skbs for xdp_frames virtio_net: Fix not restoring real_num_rx_queues virtio_net: Fix out of bounds access of sq virtio_net: Don't process redirected XDP frames when XDP is disabled virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs virtio_net: Differentiate sk_buff and xdp_frame on freeing CIFS: Do not count -ENODATA as failure for query directory CIFS: Fix trace command logging for SMB2 reads and writes CIFS: Do not consider -ENODATA as stat failure for reads fs/dcache: Fix incorrect nr_dentry_unused accounting in 
shrink_dcache_sb() iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions() selftests/seccomp: Enhance per-arch ptrace syscall skip tests NFS: Fix up return value on fatal errors in nfs_page_async_flush() ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment arm64: kaslr: ensure randomized quantities are clean also when kaslr is off arm64: Do not issue IPIs for user executable ptes arm64: hyp-stub: Forbid kprobing of the hyp-stub arm64: hibernate: Clean the __hyp_text to PoC after resume gpio: altera-a10sr: Set proper output level for direction_output gpiolib: fix line event timestamps for nested irqs gpio: pcf857x: Fix interrupts on multiple instances gpio: sprd: Fix the incorrect data register gpio: sprd: Fix incorrect irq type setting for the async EIC gfs2: Revert "Fix loop in gfs2_rbm_find" mmc: bcm2835: Fix DMA channel leak on probe error mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay ALSA: usb-audio: Add Opus #3 to quirks for native DSD support ALSA: hda/realtek - Fixed hp_pin no value IB/hfi1: Remove overly conservative VM_EXEC flag check platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes mmc: sdhci-iproc: handle mmc_of_parse() errors during probe Btrfs: fix deadlock when allocating tree block during leaf/node split btrfs: On error always free subvol_name in btrfs_mount kernel/exit.c: release ptraced tasks before zap_pid_ns_processes mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT oom, oom_reaper: do not enqueue same task twice mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages mm, oom: fix use-after-free in oom_kill_process mm: hwpoison: use do_send_sig_info() instead of force_sig() mm: migrate: don't rely on __PageMovable() of newpage after unlocking it of: Convert to using %pOFn instead of device_node.name of: overlay: add tests to validate kfrees from overlay removal of: overlay: add missing of_node_get() in 
__of_attach_node_sysfs of: overlay: use prop add changeset entry for property in new nodes of: overlay: do not duplicate properties from overlay for new nodes md/raid5: fix 'out of memory' during raid cache recovery cifs: Always resolve hostname before reconnecting Linux 4.19.20 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-02-07 08:40:17 +01:00
Andrei Vagin	c7122344f9	kernel/exit.c: release ptraced tasks before zap_pid_ns_processes commit `8fb335e078` upstream. Currently, exit_ptrace() adds all ptraced tasks to a dead list, then zap_pid_ns_processes() waits on all tasks in the current pidns, and only then are tasks from the dead list released. zap_pid_ns_processes() can get stuck waiting for tasks from the dead list. In this case, we will have one unkillable process with one or more dead children. Thanks to Oleg for the advice to release tasks in find_child_reaper(). Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com Fixes: `7c8bd2322c` ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()") Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:14 +01:00
Greg Kroah-Hartman	18ba00a34e	Merge 4.19.19 into android-4.19 Changes in 4.19.19 amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs net: bridge: Fix ethernet header pointer before check skb forwardable net: Fix usage of pskb_trim_rcsum net: phy: marvell: Errata for mv88e6390 internal PHYs net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling net/sched: act_tunnel_key: fix memory leak in case of action replace net_sched: refetch skb protocol for each filter openvswitch: Avoid OOB read when parsing flow nlattrs vhost: log dirty page correctly mlxsw: pci: Increase PCI SW reset timeout net: ipv4: Fix memory leak in network namespace dismantle mlxsw: spectrum_fid: Update dummy FID index mlxsw: pci: Ring CQ's doorbell before RDQ's net/sched: cls_flower: allocate mask dynamically in fl_change() udp: with udp_segment release on error path ip6_gre: fix tunnel list corruption for x-netns erspan: build the header with the right proto according to erspan_ver net: phy: marvell: Fix deadlock from wrong locking ip6_gre: update version related info when changing link tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state mei: me: mark LBG devices as having dma support mei: me: add denverton innovation engine device IDs USB: leds: fix regression in usbport led trigger USB: serial: simple: add Motorola Tetra TPG2200 device id USB: serial: pl2303: add new PID to support PL2303TB ceph: clear inode pointer when snap realm gets dropped by its inode ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages ASoC: rt5514-spi: Fix potential NULL pointer dereference ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode clk: socfpga: stratix10: fix rate calculation for pll clocks clk: socfpga: stratix10: fix naming convention for the fixed-clocks inotify: Fix fd refcount leak in inotify_add_watch(). 
ALSA: hda/realtek - Fix typo for ALC225 model ALSA: hda - Add mute LED support for HP ProBook 470 G5 ARCv2: lib: memeset: fix doing prefetchw outside of buffer ARC: adjust memblock_reserve of kernel memory ARC: perf: map generic branches to correct hardware condition s390/mm: always force a load of the primary ASCE on context switch s390/early: improve machine detection s390/smp: fix CPU hotplug deadlock with CPU rescan misc: ibmvsm: Fix potential NULL pointer dereference char/mwave: fix potential Spectre v1 vulnerability mmc: dw_mmc-bluefield: : Fix the license information mmc: meson-gx: Free irq in release() callback staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1 tty: Handle problem if line discipline does not have receive_buf uart: Fix crash in uart_write and uart_put_char tty/n_hdlc: fix __might_sleep warning hv_balloon: avoid touching uninitialized struct page during tail onlining Drivers: hv: vmbus: Check for ring when getting debug info vgacon: unconfuse vc_origin when using soft scrollback CIFS: Fix possible hang during async MTU reads and writes CIFS: Fix credits calculations for reads with errors CIFS: Fix credit calculation for encrypted reads with errors CIFS: Do not reconnect TCP session in add_credits() smb3: add credits we receive from oplock/break PDUs Input: xpad - add support for SteelSeries Stratus Duo Input: input_event - provide override for sparc64 Input: uinput - fix undefined behavior in uinput_validate_absinfo() acpi/nfit: Block function zero DSMs acpi/nfit: Fix command-supported detection scsi: ufs: Use explicit access size in ufshcd_dump_regs dm thin: fix passdown_double_checking_shared_status() dm crypt: fix parsing of extended IV arguments drm/amdgpu: Add APTX quirk for Lenovo laptop KVM: x86: Fix single-step debugging KVM: x86: Fix PV IPIs for 32-bit KVM host KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error kvm: x86/vmx: Use kzalloc for cached_vmcs12 KVM/nVMX: Do not validate that posted_intr_desc_addr is 
page aligned x86/pkeys: Properly copy pkey state at fork() x86/selftests/pkeys: Fork() to check for state being preserved x86/kaslr: Fix incorrect i8254 outb() parameters x86/entry/64/compat: Fix stack switching for XEN PV posix-cpu-timers: Unbreak timer rearming net: sun: cassini: Cleanup license conflict irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it can: bcm: check timer values before ktime conversion can: flexcan: fix NULL pointer exception during bringup vt: make vt_console_print() compatible with the unicode screen buffer vt: always call notifier with the console lock held vt: invoke notifier on screen size change drm/meson: Fix atomic mode switching regression bpf: improve verifier branch analysis bpf: add per-insn complexity limit bpf: move {prev_,}insn_idx into verifier env bpf: move tmp variable into ax register in interpreter bpf: enable access to ax register also from verifier rewrite bpf: restrict map value pointer arithmetic for unprivileged bpf: restrict stack pointer arithmetic for unprivileged bpf: restrict unknown scalars of mixed signed bounds for unprivileged bpf: fix check_map_access smin_value test when pointer contains offset bpf: prevent out of bounds speculation on pointer arithmetic bpf: fix sanitation of alu op with pointer / scalar type from different paths bpf: fix inner map masking to prevent oob under speculation s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU nvmet-rdma: Add unlikely for response allocated check nvmet-rdma: fix null dereference under heavy load Revert "mm, memory_hotplug: initialize struct pages for the full memory section" usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup ide: fix a typo in the settings proc file name Input: input_event - fix the CONFIG_SPARC64 mixup Linux 4.19.19 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-01-31 08:29:40 +01:00
Daniel Borkmann	37c9e3ee42	bpf: fix inner map masking to prevent oob under speculation [ commit `9d5564ddcf` upstream ] During review I noticed that inner meta map setup for map in map is buggy in that it does not propagate all needed data from the reference map which the verifier is later accessing. In particular one such case is index masking to prevent out of bounds access under speculative execution due to missing the map's unpriv_array/index_mask field propagation. Fix this such that the verifier is generating the correct code for inlined lookups in case of unprivileged use. Before patch (test_verifier's 'map in map access' dump):

    # bpftool prog dump xla id 3
     0: (62) *(u32 *)(r10 -4) = 0
     1: (bf) r2 = r10
     2: (07) r2 += -4
     3: (18) r1 = map[id:4]
     5: (07) r1 += 272                |
     6: (61) r0 = *(u32 *)(r2 +0)     |
     7: (35) if r0 >= 0x1 goto pc+6   | Inlined map in map lookup
     8: (54) (u32) r0 &= (u32) 0      | with index masking for
     9: (67) r0 <<= 3                 | map->unpriv_array.
    10: (0f) r0 += r1                 |
    11: (79) r0 = *(u64 *)(r0 +0)     |
    12: (15) if r0 == 0x0 goto pc+1   |
    13: (05) goto pc+1                |
    14: (b7) r0 = 0                   |
    15: (15) if r0 == 0x0 goto pc+11
    16: (62) *(u32 *)(r10 -4) = 0
    17: (bf) r2 = r10
    18: (07) r2 += -4
    19: (bf) r1 = r0
    20: (07) r1 += 272                |
    21: (61) r0 = *(u32 *)(r2 +0)     | Index masking missing (!)
    22: (35) if r0 >= 0x1 goto pc+3   | for inner map despite
    23: (67) r0 <<= 3                 | map->unpriv_array set.
    24: (0f) r0 += r1                 |
    25: (05) goto pc+1                |
    26: (b7) r0 = 0                   |
    27: (b7) r0 = 0
    28: (95) exit

After patch:

    # bpftool prog dump xla id 1
     0: (62) *(u32 *)(r10 -4) = 0
     1: (bf) r2 = r10
     2: (07) r2 += -4
     3: (18) r1 = map[id:2]
     5: (07) r1 += 272                |
     6: (61) r0 = *(u32 *)(r2 +0)     |
     7: (35) if r0 >= 0x1 goto pc+6   | Same inlined map in map lookup
     8: (54) (u32) r0 &= (u32) 0      | with index masking due to
     9: (67) r0 <<= 3                 | map->unpriv_array.
    10: (0f) r0 += r1                 |
    11: (79) r0 = *(u64 *)(r0 +0)     |
    12: (15) if r0 == 0x0 goto pc+1   |
    13: (05) goto pc+1                |
    14: (b7) r0 = 0                   |
    15: (15) if r0 == 0x0 goto pc+12
    16: (62) *(u32 *)(r10 -4) = 0
    17: (bf) r2 = r10
    18: (07) r2 += -4
    19: (bf) r1 = r0
    20: (07) r1 += 272                |
    21: (61) r0 = *(u32 *)(r2 +0)     |
    22: (35) if r0 >= 0x1 goto pc+4   | Now fixed inlined inner map
    23: (54) (u32) r0 &= (u32) 0      | lookup with proper index masking
    24: (67) r0 <<= 3                 | for map->unpriv_array.
    25: (0f) r0 += r1                 |
    26: (05) goto pc+1                |
    27: (b7) r0 = 0                   |
    28: (b7) r0 = 0
    29: (95) exit

Fixes: `b2157399cc` ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:41 +01:00
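The dumps above show a masked lookup: a bounds check, then `r0 &= index_mask`, then the element address is computed. That pattern can be sketched in plain C. This is a simplified illustration, not the kernel's array map code. The struct, field and function names are hypothetical. The sketch also assumes two things for unprivileged use: the mask is the next power of two minus one, and the storage is sized to `index_mask + 1` elements. That is why the one-entry outer map in the dump is masked with 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an array map. */
struct toy_array {
	uint32_t max_entries;  /* logical size used by the bounds check */
	uint32_t index_mask;   /* next power of two minus one */
	uint32_t elem_size;
	uint8_t *values;       /* assumed to hold (index_mask + 1) elements */
};

/* Smallest 2^k - 1 that covers indices 0 .. max_entries - 1. */
static uint32_t toy_index_mask(uint32_t max_entries)
{
	uint64_t pow2 = 1;

	while (pow2 < max_entries)
		pow2 <<= 1;
	return (uint32_t)(pow2 - 1);
}

/*
 * Mirrors the bounds check and masking (insns 7-10 in the dump): even if
 * the bounds check is mispredicted, the AND keeps the index inside the
 * power-of-two sized storage.
 */
static void *toy_array_lookup(const struct toy_array *a, uint32_t index)
{
	if (index >= a->max_entries)
		return NULL;
	index &= a->index_mask;
	return a->values + (size_t)index * a->elem_size;
}
```

The bug fixed by this commit was that unpriv_array/index_mask were not propagated to the inner meta map. As a result, the masking step was missing from the inner lookup.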
Daniel Borkmann	eed84f94ff	bpf: fix sanitation of alu op with pointer / scalar type from different paths [ commit `d3bd7413e0` upstream ] While `979d63d50c` ("bpf: prevent out of bounds speculation on pointer arithmetic") took care of rejecting alu op on pointer when e.g. pointer came from two different map values with different map properties such as value size, Jann reported that a case was not covered yet when a given alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from different branches where we would incorrectly try to sanitize based on the pointer's limit. Catch this corner case and reject the program instead. Fixes: `979d63d50c` ("bpf: prevent out of bounds speculation on pointer arithmetic") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:41 +01:00
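The rejection described above can be modelled as per-instruction bookkeeping. The first verifier path that reaches an ALU instruction records how that instruction has to be sanitized. Only one rewrite can be emitted per instruction, so a later path that would need a different rewrite makes the program invalid. The sketch below is hypothetical: the enum, struct and function names are illustrative and only loosely follow the verifier's per-instruction aux data.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical sanitation states for one ALU instruction. */
enum toy_alu_state {
	TOY_ALU_UNSEEN = 0,   /* no verifier path has reached the insn yet */
	TOY_ALU_NON_POINTER,  /* plain scalar arithmetic: nothing to mask */
	TOY_ALU_PTR_ADD,      /* pointer += scalar, masked against a limit */
	TOY_ALU_PTR_SUB,      /* pointer -= scalar, masked against a limit */
};

struct toy_insn_aux {
	enum toy_alu_state alu_state;
	uint32_t alu_limit;
};

/*
 * Called each time a verifier path reaches the instruction. All paths
 * share the single rewrite emitted later, so they must agree on it.
 */
static int toy_update_alu_state(struct toy_insn_aux *aux,
				enum toy_alu_state state, uint32_t limit)
{
	if (aux->alu_state != TOY_ALU_UNSEEN &&
	    (aux->alu_state != state || aux->alu_limit != limit))
		return -EACCES;  /* paths disagree: reject the program */
	aux->alu_state = state;
	aux->alu_limit = limit;
	return 0;
}
```

Jann's case maps onto this model as follows:
- One path uses the instruction as `ptr_reg += reg`, which is recorded with a pointer state and limit.
- Another path uses it as `numeric_reg += reg`, which is recorded as non-pointer.

The second call fails instead of sanitizing the scalar path against the pointer's limit.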
Daniel Borkmann	f92a819b4c	bpf: prevent out of bounds speculation on pointer arithmetic [ commit `979d63d50c` upstream ] Jann reported that the original commit back in `b2157399cc` ("bpf: prevent out-of-bounds speculation") was not sufficient to stop the CPU from speculating out of bounds memory access: While `b2157399cc` only focused on masking array map access for unprivileged users for tail calls and data access such that the user provided index gets sanitized from BPF program and syscall side, there is still a more generic form of the problem affecting BPF programs that applies to most maps that hold user data in relation to dynamic map access when dealing with unknown scalars or "slow" known scalars as access offset, for example:

- Load a map value pointer into R6
- Load an index into R7
- Do a slow computation (e.g. with a memory dependency) that loads a limit into R8 (e.g. load the limit from a map for high latency, then mask it to make the verifier happy)
- Exit if R7 >= R8 (mispredicted branch)
- Load R0 = R6[R7]
- Load R0 = R6[R0]

For unknown scalars there are two options in the BPF verifier where we could derive knowledge from in order to guarantee safe access to the memory: i) While </>/<=/>= variants won't allow deriving any lower or upper bounds from the unknown scalar where it would be safe to add it to the map value pointer, it is possible through ==/!= test however. ii) Another option is to transform the unknown scalar into a known scalar, for example, through ALU ops combination such as R &= <imm> followed by R |= <imm> or any similar combination where the original information from the unknown scalar would be destroyed entirely leaving R with a constant. The initial slow load still precedes the latter ALU ops on that register, so the CPU executes speculatively from that point. Once we have the known scalar, any compare operation would work then.
A third option only involving registers with known scalars could be crafted as described in [0] where a CPU port (e.g. Slow Int unit) would be filled with many dependent computations such that the subsequent condition depending on its outcome has to wait for evaluation on its execution port, thereby causing speculative execution if the speculated code can be scheduled on a different execution port, or any other form of mistraining as described in [1], for example. Given this is not limited to only unknown scalars, not only map but also stack access is affected, since both are accessible for unprivileged users and could potentially be used for out of bounds access under speculation. In order to prevent any of these cases, the verifier is now sanitizing pointer arithmetic on the offset such that any out of bounds speculation would be masked in a way where the pointer arithmetic result in the destination register will stay unchanged, meaning the offset is masked to zero, similar to the array_index_nospec() case. With regard to implementation, there are three options that were considered: i) new insn for sanitation, ii) push/pop insn and sanitation as inlined BPF, iii) reuse of ax register and sanitation as inlined BPF. Option i) has the downside that we would use up reserved bits in the opcode space, but also that each JIT would have to emit the masking as native arch opcodes, meaning the mitigation would see slow adoption until every JIT implements it, which is counter-productive. Options ii) and iii) have in common that a temporary register is needed in order to implement the sanitation as inlined BPF, since we are not allowed to modify the source register. While a push / pop insn in ii) would be useful to have in any case, it once again requires every JIT to implement it first. While possible, the amount of changes needed would also be unsuitable for a -stable patch.
Therefore, the path which has fewer changes, fewer BPF instructions for the mitigation and does not require anything to be changed in the JITs is option iii), which this work is pursuing. The ax register is already mapped to a register in all JITs (modulo arm32 where it's mapped to stack as various other BPF registers there) and has so far been used only by JITs for constant blinding. It can be reused for verifier rewrites under certain constraints. The interpreter's tmp "register" has therefore been remapped by extending the register set with the hidden ax register and reusing that for a number of instructions that needed the prior temporary variable internally (e.g. div, mod). This allows for zero increase in stack space usage in the interpreter, and enables (restricted) generic use in rewrites otherwise, as long as such a patchlet does not make use of these instructions. The sanitation mask is dynamic and relative to the offset the map value or stack pointer currently holds. There are various cases that need to be taken into consideration for the masking, e.g. such an operation could look as follows: ptr += val or val += ptr or ptr -= val. Thus, the value to be sanitized could reside either in the source or in the destination register, and the limit is different depending on whether the ALU op is addition or subtraction and depending on the current known and bounded offset. For addition, the limit is derived as follows: limit := max_value_size - (smin_value + off). For subtraction: limit := umax_value + off. This holds because we do not allow any pointer arithmetic that would temporarily go out of bounds or would have an unknown value with mixed signed bounds where it is unclear at verification time whether the actual runtime value would be either negative or positive.
For example, given a derived map pointer value with a constant and a bounded offset, a limit based on smin_value works because the verifier requires statically analyzed arithmetic on the pointer to stay in bounds; it therefore checks whether the resulting smin_value + off and umax_value + off are still within map value bounds at the time of the arithmetic, in addition to the time of access. Similarly, for the case of stack access we derive the limit as follows: MAX_BPF_STACK + off for subtraction and -off for the case of addition, where off := ptr_reg->off + ptr_reg->var_off.value. Subtraction is a special case for the masking, which can be in the form of ptr += -val, ptr -= -val, or ptr -= val. In the first two cases, where we know that the value is negative, we need to temporarily negate the value in order to do the sanitation on a positive value, after which we swap the ALU op and restore the original source register if the value was in source. The sanitation of pointer arithmetic alone is still not fully sufficient as is, since a scenario like the following could happen ...

    PTR += 0x1000 (e.g. K-based imm)
    PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
    PTR += 0x1000
    PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
    [...]

... which under speculation could end up as ...

    PTR += 0x1000
    PTR -= 0 [ truncated by mitigation ]
    PTR += 0x1000
    PTR -= 0 [ truncated by mitigation ]
    [...]

... and therefore still access out of bounds. To prevent such a case, the verifier is also analyzing safety for potential out of bounds access under speculative execution. That is, it also simulates pointer access under truncation. We therefore "branch off" and push the current verification state after the ALU operation with known 0 to the verification stack for later analysis. Given that the analysis of the current path succeeded, it is likely that the one under speculation can be pruned. In any case, it is also subject to existing complexity limits and therefore anything beyond this point will be rejected.
In terms of pruning, it must be ensured that a verification state from speculative execution simulation never prunes a non-speculative execution path; therefore, we mark the verifier state accordingly at the time of push_stack(). If the verifier detects out of bounds access under speculative execution from one of the possible paths that includes a truncation, it will reject the program. Given that we mask every reg-based pointer arithmetic for unprivileged programs, we've been looking into how it could affect real-world programs in terms of size increase. As the majority of programs are targeted at privileged-only use cases, we've unconditionally enabled masking (with its alu restrictions on top of it) for privileged programs for the sake of testing in order to check i) whether they get rejected in their current form, and ii) by how much the number of instructions and size will increase. We've tested this by using Katran, Cilium and test_l4lb from the kernel selftests. For Katran we evaluated balancer_kern.o; for Cilium, bpf_lxc.o and an older test object bpf_lxc_opt_-DUNKNOWN.o; and for l4lb, test_l4lb.o as well as test_l4lb_noinline.o. We found that none of the programs got rejected by the verifier with this change, and that the impact is minimal to none. balancer_kern.o had 13,904 bytes (1,738 insns) xlated and 7,797 bytes JITed before and after the change. The most complex program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated and 18,538 bytes JITed before and after, and none of the other tail call programs in bpf_lxc.o had any changes either. For the older bpf_lxc_opt_-DUNKNOWN.o object we found a small increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed after the change. Other programs from that object file had a similarly small increase.
Both l4lb objects were unchanged: test_l4lb.o remained at 6,544 bytes (817 insns) xlated and 3,401 bytes JITed, and test_l4lb_noinline.o at 5,080 bytes (634 insns) xlated and 3,313 bytes JITed. This can be explained by LLVM typically optimizing stack-based pointer arithmetic into K-based operations, and by dynamic map access not being overly frequent. However, in future we may decide to optimize the algorithm further under known guarantees from branch and value speculation. The latter also seems unclear in terms of the prediction heuristics that today's CPUs apply, as well as whether there could be collisions in e.g. the predictor's Value History/Pattern Table for triggering out of bounds access; thus masking is performed unconditionally at this point but could be subject to relaxation later on. We also brainstormed various other approaches for mitigation, but the blocker was always the lack of available registers at runtime and/or the overhead of runtime tracking of limits belonging to a specific pointer. Thus, we found this approach to be minimally intrusive under the given constraints. With that in place, a simple example with sanitized access on unprivileged load at post-verification time looks as follows:

    # bpftool prog dump xlated id 282
    [...]
    28: (79) r1 = *(u64 *)(r7 +0)
    29: (79) r2 = *(u64 *)(r7 +8)
    30: (57) r1 &= 15
    31: (79) r3 = *(u64 *)(r0 +4608)
    32: (57) r3 &= 1
    33: (47) r3 |= 1
    34: (2d) if r2 > r3 goto pc+19
    35: (b4) (u32) r11 = (u32) 20479  |
    36: (1f) r11 -= r2                | Dynamic sanitation for pointer
    37: (4f) r11 |= r2                | arithmetic with registers
    38: (87) r11 = -r11               | containing bounded or known
    39: (c7) r11 s>>= 63              | scalars in order to prevent
    40: (5f) r11 &= r2                | out of bounds speculation.
    41: (0f) r4 += r11                |
    42: (71) r4 = *(u8 *)(r4 +0)
    43: (6f) r4 <<= r1
    [...]

For the case where the scalar sits in the destination register as opposed to the source register, the following code is emitted for the above example:

    [...]
    16: (b4) (u32) r11 = (u32) 20479
    17: (1f) r11 -= r2
    18: (4f) r11 |= r2
    19: (87) r11 = -r11
    20: (c7) r11 s>>= 63
    21: (5f) r2 &= r11
    22: (0f) r2 += r0
    23: (61) r0 = *(u32 *)(r2 +0)
    [...]

JIT blinding example with non-conflicting use of r10:

    [...]
     d5: je     0x0000000000000106    _
     d7: mov    0x0(%rax),%edi       |
     da: mov    $0xf153246,%r10d     | Index load from map value and
     e0: xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
     e7: and    %r10,%rdi            |_
     ea: mov    $0x2f,%r10d          |
     f0: sub    %rdi,%r10            | Sanitized addition. Both use r10
     f3: or     %rdi,%r10            | but do not interfere with each
     f6: neg    %r10                 | other. (Neither do these instructions
     f9: sar    $0x3f,%r10           | interfere with the use of ax as temp
     fd: and    %r10,%rdi            | in interpreter.)
    100: add    %rax,%rdi            |_
    103: mov    0x0(%rdi),%eax
    [...]

Tested that it fixes Jann's reproducer, and also checked that the test_verifier and test_progs suites run successfully with the interpreter, the JIT, and the JIT with hardening enabled, on x86-64 and arm64.

[0] Speculose: Analyzing the Security Implications of Speculative Execution in CPUs, Giorgi Maisuradze and Christian Rossow, https://arxiv.org/pdf/1801.04084.pdf

[1] A Systematic Evaluation of Transient Execution Attacks and Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, Daniel Gruss, https://arxiv.org/pdf/1811.05441.pdf

Fixes: `b2157399cc` ("bpf: prevent out-of-bounds speculation") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:41 +01:00
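The limit derivation and the six-instruction masking sequence from the xlated dumps can be modelled in C. This is an illustrative sketch under stated assumptions, not kernel code:
- `struct toy_ptr` and the function names are hypothetical.
- `TOY_MAX_BPF_STACK` assumes a 512-byte BPF stack.
- The arithmetic shift `s>>= 63` is expressed with unsigned operations, so the C code avoids implementation-defined behaviour.

In this model, `(u32) r11 = (u32) 20479` in the dump is `limit - 1` for a limit of 20480, and `mov $0x2f,%r10d` in the JIT output is `limit - 1` for a limit of 48.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_BPF_STACK 512  /* assumption: BPF stack size in bytes */

/* Hypothetical summary of what the verifier knows about a pointer. */
struct toy_ptr {
	bool is_stack;        /* stack pointer vs. map value pointer */
	int32_t off;          /* fixed offset */
	int64_t smin_value;   /* signed lower bound of the variable part */
	uint64_t umax_value;  /* unsigned upper bound of the variable part */
	int64_t var_off;      /* known variable offset (stack case) */
	uint32_t value_size;  /* map value size (map case) */
};

/* "subtraction": the masked scalar moves the pointer to lower addresses. */
static uint32_t toy_ptr_limit(const struct toy_ptr *p, bool subtraction)
{
	if (p->is_stack) {
		int64_t off = p->off + p->var_off;  /* off <= 0 on the stack */

		return (uint32_t)(subtraction ? TOY_MAX_BPF_STACK + off : -off);
	}
	if (subtraction)
		return (uint32_t)(p->umax_value + p->off);
	return (uint32_t)(p->value_size - (p->smin_value + p->off));
}

/*
 * Same steps as the rewritten insns (ax is r11 in the xlated dump):
 *   ax = limit - 1; ax -= val; ax |= val; ax = -ax; ax s>>= 63; val &= ax
 * Assumes limit >= 1. Values below the limit pass through unchanged;
 * anything else, including values with the sign bit set, becomes 0.
 */
static uint64_t toy_sanitize(uint64_t val, uint32_t limit)
{
	uint64_t ax = (uint64_t)(limit - 1);

	ax -= val;
	ax |= val;
	ax = 0 - ax;
	ax = 0 - (ax >> 63);  /* all ones if the sign bit was set, else 0 */
	return val & ax;
}
```

For an in-bounds offset the mask is all ones and the pointer arithmetic is unchanged. Otherwise the offset collapses to 0. That collapse is the truncation the speculative-path simulation described in the commit has to account for.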
Daniel Borkmann	4f7f708d0e	bpf: fix check_map_access smin_value test when pointer contains offset [ commit `b7137c4eab` upstream ] In check_map_access() we probe actual bounds through __check_map_access() with offset of reg->smin_value + off for lower bound and offset of reg->umax_value + off for the upper bound. However, even though the reg->smin_value could have a negative value, the final result of the sum with off could be positive when pointer arithmetic with known and unknown scalars is combined. In this case we reject the program with an error such as "R<x> min value is negative, either use unsigned index or do a if (index >=0) check." even though the access itself would be fine. Therefore extend the check to probe whether the actual resulting reg->smin_value + off is less than zero. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:41 +01:00
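The change to the lower-bound test can be illustrated with two hypothetical helpers, one for the old check and one for the extended check. This is a simplified sketch based only on the description above, not the code of check_map_access(). A real implementation also has to guard against overflow of smin_value + off; the sketch ignores that by assuming small bounds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Before: any negative smin_value was rejected outright. */
static bool toy_lower_ok_before(int64_t smin_value, int32_t off)
{
	(void)off;  /* the fixed offset was not taken into account */
	return smin_value >= 0;
}

/* After: only the resulting lower bound smin_value + off must be >= 0. */
static bool toy_lower_ok_after(int64_t smin_value, int32_t off)
{
	return smin_value + off >= 0;
}
```

A pointer with a fixed offset of 16 and a variable part bounded below by -8 can never address below offset 8. It was nevertheless rejected before this fix.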
Daniel Borkmann	44f8fc6499	bpf: restrict unknown scalars of mixed signed bounds for unprivileged [ commit `9d7eceede7` upstream ] For unknown scalars of mixed signed bounds, meaning their smin_value is negative and their smax_value is positive, we need to reject arithmetic with pointer to map value. For unprivileged the goal is to mask every map pointer arithmetic and this cannot reliably be done when it is unknown at verification time whether the scalar value is negative or positive. Given this is a corner case, the likelihood of breaking should be very small. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:41 +01:00
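The condition can be written as a one-line predicate. The helper names below are hypothetical. The sketch assumes that privileged loads are not subject to this restriction, as the description implies. It treats a range such as [-4, 0] as mixed, because its bounds fall on different sides of the sign boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mixed signed bounds: smin_value < 0 while smax_value >= 0. For such a
 * scalar the verifier cannot tell whether ptr += scalar moves the pointer
 * up or down, so it cannot pick a single masking direction and limit.
 */
static bool toy_mixed_signed_bounds(int64_t smin_value, int64_t smax_value)
{
	return (smin_value < 0) != (smax_value < 0);
}

/* Hypothetical gate for map value pointer arithmetic with a scalar. */
static bool toy_ptr_arith_allowed(bool privileged, int64_t smin_value,
				  int64_t smax_value)
{
	return privileged || !toy_mixed_signed_bounds(smin_value, smax_value);
}
```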
Daniel Borkmann	5332dda94f	bpf: restrict stack pointer arithmetic for unprivileged [ commit `e4298d2583` upstream ] Restrict stack pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a stack pointer as a destination we simulate a check_stack_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. This is analogous to map value pointer arithmetic and needed for masking later on. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:41 +01:00
Daniel Borkmann	9e57b2969d	bpf: restrict map value pointer arithmetic for unprivileged [ commit `0d6303db79` upstream ] Restrict map value pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a map value pointer as a destination it will simulate a check_map_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. We use this later on for masking any pointer arithmetic with the remainder of the map value space. The likelihood of breaking any existing real-world unprivileged eBPF program is very small for this corner case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:40 +01:00
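The two restrictions above apply to stack pointer and map value pointer arithmetic for unprivileged users. Both amount to probing a 1-byte access right after the arithmetic. Below is a minimal sketch with hypothetical helper names. It assumes a 512-byte stack addressed at negative offsets from the frame pointer and a map value of `value_size` bytes. It is not the verifier's check_stack_access() or check_map_access().

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_BPF_STACK 512  /* assumption: BPF stack size in bytes */

/* Stack: a 1-byte access at fp + off needs -512 <= off < 0. */
static bool toy_stack_ptr_in_bounds(int64_t off)
{
	return off >= -TOY_MAX_BPF_STACK && off < 0;
}

/*
 * Map value: a 1-byte access must fit at both the lowest (smin_value + off)
 * and the highest (umax_value + off) offset the pointer may now hold.
 */
static bool toy_map_ptr_in_bounds(int64_t smin_value, uint64_t umax_value,
				  int32_t off, uint32_t value_size)
{
	int64_t lo, hi;

	/* Conservative simplification: keeps the sums below in range. */
	if (umax_value > INT32_MAX || smin_value < INT32_MIN)
		return false;
	lo = smin_value + off;
	hi = (int64_t)umax_value + off;
	return lo >= 0 && lo + 1 <= value_size &&
	       hi >= 0 && hi + 1 <= value_size;
}
```

Because such out-of-bounds intermediate pointers are rejected, the later masking only ever has to handle offsets relative to a pointer that is itself in bounds.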
Daniel Borkmann	232ac70dd3	bpf: enable access to ax register also from verifier rewrite [ commit `9b73bfdd08` upstream ] Right now we are using the BPF ax register in JITs for constant blinding as well as in the interpreter as a temporary variable. The verifier cannot use it yet, simply because any such use would be overridden by the former in bpf_jit_blind_insn(). However, it can be made to work in that blinding will be skipped if ax is already used as either the source or the destination register of the instruction. Taking these constraints of ax into account, the verifier is then free to use it in rewrites. Note, ax register already has mappings in every eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:40 +01:00
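The ax constraint can be sketched as a predicate that the blinding pass consults per instruction, together with the XOR split that blinding applies to an immediate. The names are hypothetical. The sketch assumes ax is register number 11, which appears as r11 in the xlated dumps in this log, and it models only ALU-with-immediate blinding. In the JIT blinding example earlier in this log, the pair 0xf153246 / 0xf153259 XORs back to the 0x1f mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ax follows r0-r10 as register 11 (r11 in xlated dumps). */
#define TOY_REG_AX 11

struct toy_insn {
	uint8_t dst_reg;
	uint8_t src_reg;
	int32_t imm;
};

/*
 * Blinding rewrites "dst op imm" roughly as
 *   ax = imm ^ rnd; ax ^= rnd; dst op ax
 * which clobbers ax. An instruction that already uses ax (because the
 * verifier placed it there) is left unblinded.
 */
static bool toy_should_blind(const struct toy_insn *insn)
{
	return insn->dst_reg != TOY_REG_AX && insn->src_reg != TOY_REG_AX;
}

/* The two immediates that replace imm; XOR-ing them restores imm. */
static void toy_blind_imm(int32_t imm, uint32_t rnd,
			  uint32_t *first, uint32_t *second)
{
	*first = (uint32_t)imm ^ rnd;
	*second = rnd;
}
```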
Daniel Borkmann	b855e31037	bpf: move tmp variable into ax register in interpreter [ commit `144cd91c4c` upstream ] This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run() into the hidden ax register. The latter is currently only used in JITs for constant blinding as a temporary scratch register, meaning the BPF interpreter will never see the use of ax. Therefore it is safe to use it for the cases where tmp has been used earlier. This is needed to later on allow restricted hidden use of ax in both interpreter and JITs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-31 08:14:40 +01:00
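The idea of using the hidden ax slot as the interpreter's scratch register can be sketched with a toy register file. Everything here is hypothetical and simplified:
- The register layout assumes ax sits at index 11, right after r0-r10.
- The helper mimics the semantics of the kernel's do_div() macro: the quotient is written back and the remainder is returned.
- Division by zero is not handled.

```c
#include <stdint.h>

/* Hypothetical register file: r0-r10 at 0..10, hidden ax at 11. */
enum { TOY_AX = 11, TOY_NREGS = 12 };

/* Like do_div(): *n becomes the quotient, the remainder is returned. */
static uint32_t toy_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* 32-bit dst %= src, using ax instead of an on-stack tmp variable. */
static void toy_alu32_mod_x(uint64_t regs[TOY_NREGS], int dst, int src)
{
	regs[TOY_AX] = (uint32_t)regs[dst];
	regs[dst] = toy_do_div(&regs[TOY_AX], (uint32_t)regs[src]);
}

/* 32-bit dst /= src; the quotient is left in ax, then copied to dst. */
static void toy_alu32_div_x(uint64_t regs[TOY_NREGS], int dst, int src)
{
	regs[TOY_AX] = (uint32_t)regs[dst];
	toy_do_div(&regs[TOY_AX], (uint32_t)regs[src]);
	regs[dst] = (uint32_t)regs[TOY_AX];
}
```

In this model, ax holds nothing that must survive past the instruction that uses it. That matches the commit's point that BPF programs themselves never observe ax.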


28629 Commits