Commit Graph

841151 Commits

Author SHA1 Message Date
Mika Westerberg
53b22f900c PCI / ACPI: Add _PR0 dependent devices
If otherwise unrelated PCI devices share ACPI power resources turning
them on causes the devices to enter D0uninitialized power state which may
cause problems.

For example in Intel Ice Lake two root ports (RP0 and RP1), Thunderbolt
controller (NHI) and xHCI controller all share power resources as can be
ween in the topology below where power resources are marked with []:

  Host bridge
    |
    +- RP0 ---\
    +- RP1 ---|--+--> [TBT]
    +- NHI --/   |
    |            |
    |            v
    +- xHCI --> [D3C]

In a situation where all devices sharing the power resources are in
D3cold (the power resources are turned off) and for example the
Thunderbolt controller is runtime resumed resulting that the power
resources are turned on. This means that the other devices sharing them
(RP0, RP1 and xHCI) are transitioned into D0uninitialized state. If they
were configured to trigger wake (PME) on a certain event that
configuration gets lost after reset so we would need to re-initialize
them to get the wakeup working as expected again. To do so we would need
to runtime resume all of them to make sure their registers get restored
properly before we can runtime suspend them again.

Since we just added concept of "_PR0 dependent device" we can solve this
by calling the relevant add/remove functions when the PCI device is bind
to its ACPI representation. If it has power resources the PCI device
will be added as dependent device to them and runtime resumed whenever
they are physically turned on. This should make sure PCI core can
reconfigure wakes after the device is transitioned into D0uninitialized.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-27 12:31:57 +02:00
Mika Westerberg
4533771c1e ACPI / PM: Introduce concept of a _PR0 dependent device
If there are shared power resources between otherwise unrelated devices
turning them on causes the other devices sharing them to be powered up
as well. In case of PCI devices go into D0uninitialized state meaning
that if they were configured to trigger wake that configuration is lost
at this point.

For this reason introduce a concept of "_PR0 dependent device" that can
be added to any ACPI device that has power resources. The dependent
device will be included in a list of dependent devices for all power
resources returned by the ACPI device's _PR0 (assuming it has one).
Whenever a power resource having dependent devices is turned physically
on (its _ON method is called) we runtime resume all of them to allow
their driver or in case of PCI the PCI core to re-initialize the device
and its wake configuration.

This adds two functions that can be used to add and remove these
dependent devices. Note the dependent device does not necessary need
share power resources so this functionality can be used to add "software
dependencies" as well if needed.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-27 12:31:57 +02:00
Mika Westerberg
83a16e3f6d PCI / ACPI: Use cached ACPI device state to get PCI device power state
The ACPI power state returned by acpi_device_get_power() may depend on
the configuration of ACPI power resources in the system which may change
any time after acpi_device_get_power() has returned, unless the
reference counters of the ACPI power resources in question are set to
prevent that from happening. Thus it is invalid to use acpi_device_get_power()
in acpi_pci_get_power_state() the way it is done now and the value of
the ->power.state field in the corresponding struct acpi_device objects
(which reflects the ACPI power resources reference counting, among other
things) should be used instead.

As an example where this becomes an issue is Intel Ice Lake where the
Thunderbolt controller (NHI), two PCIe root ports (RP0 and RP1) and xHCI
all share the same power resources. The following picture with power
resources marked with [] shows the topology:

  Host bridge
    |
    +- RP0 ---\
    +- RP1 ---|--+--> [TBT]
    +- NHI --/   |
    |            |
    |            v
    +- xHCI --> [D3C]

Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method
of the devices in question returns either TBT or D3C or both.

Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now
since the TBT power resource is still on when the root ports are runtime
suspended their dev->current_state is set to D3hot. When NHI is runtime
suspended TBT is finally turned off but state of the root ports remain
to be D3hot. Now when the xHCI is runtime suspended D3C gets also turned
off. PCI core thus has power states of these devices cached in their
dev->current_state as follows:

  RP0 -> D3hot
  RP1 -> D3hot
  NHI -> D3cold
  xHCI -> D3cold

If the user now runs lspci for instance, the result is all 1's like in
the below output (00:07.0 is the first root port, RP0):

00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: pcieport

In short the hardware state is not in sync with the software state
anymore. The exact same thing happens with the PME polling thread which
ends up bringing the root ports back into D0 after they are runtime
suspended.

For this reason, modify acpi_pci_get_power_state() so that it uses the
ACPI device power state that was cached by the ACPI core. This makes the
PCI device power state match the ACPI device power state regardless of
state of the shared power resources which may still be on at this point.

Link: https://lore.kernel.org/r/20190618161858.77834-2-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-27 12:31:57 +02:00
Rafael J. Wysocki
25bc694a8a Merge back PCI power management material for v5.3. 2019-06-24 10:11:27 +02:00
Mika Westerberg
000dd5316e PCI: Do not poll for PME if the device is in D3cold
PME polling does not take into account that a device that is directly
connected to the host bridge may go into D3cold as well. This leads to a
situation where the PME poll thread reads from a config space of a
device that is in D3cold and gets incorrect information because the
config space is not accessible.

Here is an example from Intel Ice Lake system where two PCIe root ports
are in D3cold (I've instrumented the kernel to log the PMCSR register
contents):

  [   62.971442] pcieport 0000:00:07.1: Check PME status, PMCSR=0xffff
  [   62.971504] pcieport 0000:00:07.0: Check PME status, PMCSR=0xffff

Since 0xffff is interpreted so that PME is pending, the root ports will
be runtime resumed. This repeats over and over again essentially
blocking all runtime power management.

Prevent this from happening by checking whether the device is in D3cold
before its PME status is read.

Fixes: 71a83bd727 ("PCI/PM: add runtime PM support to PCIe port")
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: 3.6+ <stable@vger.kernel.org> # v3.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-18 01:40:41 +02:00
Mika Westerberg
c2bf1fc212 PCI: Add missing link delays required by the PCIe spec
Currently Linux does not follow PCIe spec regarding the required delays
after reset. A concrete example is a Thunderbolt add-in-card that
consists of a PCIe switch and two PCIe endpoints:

  +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
                                  +-01.0-[04-36]-- DS hotplug port
                                  +-02.0-[37]----00.0 xHCI controller
                                  \-04.0-[38-6b]-- DS hotplug port

The root port (1b.0) and the PCIe switch downstream ports are all PCIe
gen3 so they support 8GT/s link speeds.

We wait for the PCIe hierarchy to enter D3cold (runtime):

  pcieport 0000:00:1b.0: power state changed by ACPI to D3cold

When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
PCIe switch is put to reset and its power is re-applied. This means that
we must follow the rules in PCIe 4.0 section 6.6.1.

For the PCIe gen3 ports we are dealing with here, the following applies:

  With a Downstream Port that supports Link speeds greater than 5.0
  GT/s, software must wait a minimum of 100 ms after Link training
  completes before sending a Configuration Request to the device
  immediately below that Port. Software can determine when Link training
  completes by polling the Data Link Layer Link Active bit or by setting
  up an associated interrupt (see Section 6.7.3.3).

Translating this into the above topology we would need to do this (DLLLA
stands for Data Link Layer Link Active):

  pcieport 0000:00:1b.0: wait for 100ms after DLLLA is set before access to 0000:01:00.0
  pcieport 0000:02:00.0: wait for 100ms after DLLLA is set before access to 0000:03:00.0
  pcieport 0000:02:02.0: wait for 100ms after DLLLA is set before access to 0000:37:00.0

I've instrumented the kernel with additional logging so we can see the
actual delays the kernel performs:

  pcieport 0000:00:1b.0: power state changed by ACPI to D0
  pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
  pcieport 0000:00:1b.0: waking up bus
  pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
  pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60)
  ...
  pcieport 0000:00:1b.0: PME# disabled
  pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  ...
  pcieport 0000:01:00.0: PME# disabled
  pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  ...
  pcieport 0000:02:00.0: PME# disabled
  pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  ...
  pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
  pcieport 0000:02:01.0: PME# disabled
  pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  ...
  pcieport 0000:02:02.0: PME# disabled
  pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  ...
  pcieport 0000:02:04.0: PME# disabled
  pcieport 0000:02:01.0: PME# enabled
  pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
  pcieport 0000:02:04.0: PME# enabled
  pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
  thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000)
  ...
  thunderbolt 0000:03:00.0: PME# disabled
  xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000)
  ...
  xhci_hcd 0000:37:00.0: PME# disabled

For the switch upstream port (01:00.0) we wait for 100ms but not taking
into account the DLLLA requirement. We then wait 10ms for D3hot -> D0
transition of the root port and the two downstream hotplug ports. This
means that we deviate from what the spec requires.

Performing the same check for system sleep (s2idle) transitions we can
see following when resuming from s2idle:

  pcieport 0000:00:1b.0: power state changed by ACPI to D0
  pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60)
  ...
  pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  ...
  pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  pcieport 0000:02:02.0: restoring config space at offset 0x2c (was 0x0, writing 0x0)
  pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  pcieport 0000:02:02.0: restoring config space at offset 0x28 (was 0x0, writing 0x0)
  pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff)
  pcieport 0000:02:02.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1)
  pcieport 0000:02:01.0: restoring config space at offset 0x2c (was 0x0, writing 0x60)
  pcieport 0000:02:02.0: restoring config space at offset 0x20 (was 0x0, writing 0x73f073f0)
  pcieport 0000:02:04.0: restoring config space at offset 0x2c (was 0x0, writing 0x60)
  pcieport 0000:02:01.0: restoring config space at offset 0x28 (was 0x0, writing 0x60)
  pcieport 0000:02:00.0: restoring config space at offset 0x2c (was 0x0, writing 0x0)
  pcieport 0000:02:02.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1)
  pcieport 0000:02:04.0: restoring config space at offset 0x28 (was 0x0, writing 0x60)
  pcieport 0000:02:01.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1ff10001)
  pcieport 0000:02:00.0: restoring config space at offset 0x28 (was 0x0, writing 0x0)
  pcieport 0000:02:02.0: restoring config space at offset 0x18 (was 0x0, writing 0x373702)
  pcieport 0000:02:04.0: restoring config space at offset 0x24 (was 0x10001, writing 0x49f12001)
  pcieport 0000:02:01.0: restoring config space at offset 0x20 (was 0x0, writing 0x73e05c00)
  pcieport 0000:02:00.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1)
  pcieport 0000:02:04.0: restoring config space at offset 0x20 (was 0x0, writing 0x89f07400)
  pcieport 0000:02:01.0: restoring config space at offset 0x1c (was 0x101, writing 0x5151)
  pcieport 0000:02:00.0: restoring config space at offset 0x20 (was 0x0, writing 0x8a008a00)
  pcieport 0000:02:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
  pcieport 0000:02:04.0: restoring config space at offset 0x1c (was 0x101, writing 0x6161)
  pcieport 0000:02:01.0: restoring config space at offset 0x18 (was 0x0, writing 0x360402)
  pcieport 0000:02:00.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1)
  pcieport 0000:02:04.0: restoring config space at offset 0x18 (was 0x0, writing 0x6b3802)
  pcieport 0000:02:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
  pcieport 0000:02:00.0: restoring config space at offset 0x18 (was 0x0, writing 0x30302)
  pcieport 0000:02:01.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
  pcieport 0000:02:04.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
  pcieport 0000:02:00.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
  pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
  pcieport 0000:02:04.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
  pcieport 0000:02:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
  xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000)
  ...
  thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000)

This is even worse. None of the mandatory delays are performed. If this
would be S3 instead of s2idle then according to PCI FW spec 3.2 section
4.6.8.  there is a specific _DSM that allows the OS to skip the delays
but this platform does not provide the _DSM and does not go to S3 anyway
so no firmware is involved that could already handle these delays.

In this particular Intel Coffee Lake platform these delays are not
actually needed because there is an additional delay as part of the ACPI
power resource that is used to turn on power to the hierarchy but since
that additional delay is not required by any of standards (PCIe, ACPI)
it is not present in the Intel Ice Lake, for example where missing the
mandatory delays causes pciehp to start tearing down the stack too early
(links are not yet trained).

For this reason, change the PCIe portdrv PM resume hooks so that they
perform the mandatory delays before the downstream component gets
resumed. We perform the delays before port services are resumed because
otherwise pciehp might find that the link is not up (even if it is just
training) and tears-down the hierarchy.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-18 01:40:41 +02:00
Rafael J. Wysocki
0c7376ada9 PCI: PM: Replace pci_dev_keep_suspended() with two functions
The code in pci_dev_keep_suspended() is relatively hard to follow due
to the negative checks in it and in its callers and the function has
a possible side-effect (disabling the PME) which doesn't really match
its role.

For this reason, move the PME disabling from pci_dev_keep_suspended()
to a separate function and change the semantics (and name) of the
rest of it, so that 'true' is returned when the device needs to be
resumed (and not the other way around).  Change the callers of
pci_dev_keep_suspended() accordingly.

While at it, make the code flow in pci_pm_poweroff() reflect the
pci_pm_suspend() more closely to avoid arbitrary differences between
them.

This is a cosmetic change with no intention to alter behavior.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-06-17 12:30:24 +02:00
Rafael J. Wysocki
234f223d63 PCI: PM: Avoid resuming devices in D3hot during system suspend
The current code resumes devices in D3hot during system suspend if
the target power state for them is D3cold, but that is not necessary
in general.  It only is necessary to do that if the platform firmware
requires the device to be resumed, but that should be covered by
the platform_pci_need_resume() check anyway, so rework
pci_dev_keep_suspended() to avoid returning 'false' for devices
in D3hot which need not be resumed due to platform firmware
requirements.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-06-17 12:30:24 +02:00
Linus Torvalds
9e0babf2c0 Linux 5.2-rc5 2019-06-16 08:49:45 -10:00
Linus Torvalds
963172d9c7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "The accumulated fixes from this and last week:

   - Fix vmalloc TLB flush and map range calculations which lead to
     stale TLBs, spurious faults and other hard to diagnose issues.

   - Use fault_in_pages_writable() for prefaulting the user stack in the
     FPU code as it's less fragile than the current solution

   - Use the PF_KTHREAD flag when checking for a kernel thread instead
     of current->mm as the latter can give the wrong answer due to
     use_mm()

   - Compute the vmemmap size correctly for KASLR and 5-Level paging.
     Otherwise this can end up with a way too small vmemmap area.

   - Make KASAN and 5-level paging work again by making sure that all
     invalid bits are masked out when computing the P4D offset. This
     worked before but got broken recently when the LDT remap area was
     moved.

   - Prevent a NULL pointer dereference in the resource control code
     which can be triggered with certain mount options when the
     requested resource is not available.

   - Enforce ordering of microcode loading vs. perf initialization on
     secondary CPUs. Otherwise perf tries to access a non-existing MSR
     as the boot CPU marked it as available.

   - Don't stop the resource control group walk early otherwise the
     control bitmaps are not updated correctly and become inconsistent.

   - Unbreak kgdb by returning 0 on success from
     kgdb_arch_set_breakpoint() instead of an error code.

   - Add more Icelake CPU model defines so depending changes can be
     queued in other trees"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback
  x86/kasan: Fix boot with 5-level paging and KASAN
  x86/fpu: Don't use current->mm to check for a kthread
  x86/kgdb: Return 0 from kgdb_arch_set_breakpoint()
  x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled
  x86/resctrl: Don't stop walking closids when a locksetup group is found
  x86/fpu: Update kernel's FPU state before using for the fsave header
  x86/mm/KASLR: Compute the size of the vmemmap section properly
  x86/fpu: Use fault_in_pages_writeable() for pre-faulting
  x86/CPU: Add more Icelake model numbers
  mm/vmalloc: Avoid rare case of flushing TLB with weird arguments
  mm/vmalloc: Fix calculation of direct map addr range
2019-06-16 07:28:14 -10:00
Linus Torvalds
efba92d58f Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "A set of small fixes:

   - Repair the ktime_get_coarse() functions so they actually deliver
     what they are supposed to: tick granular time stamps. The current
     code missed to add the accumulated nanoseconds part of the
     timekeeper so the resulting granularity was 1 second.

   - Prevent the tracer from infinitely recursing into time getter
     functions in the arm architectured timer by marking these functions
     notrace

   - Fix a trivial compiler warning caused by wrong qualifier ordering"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Repair ktime_get_coarse*() granularity
  clocksource/drivers/arm_arch_timer: Don't trace count reader functions
  clocksource/drivers/timer-ti-dm: Change to new style declaration
2019-06-16 07:22:56 -10:00
Linus Torvalds
f763cf8e47 Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
 "Two small fixes for RAS:

   - Use a proper search algorithm to find the correct element in the
     CEC array. The replacement was a better choice than fixing the
     crash causes by the original search function with horrible duct
     tape.

   - Move the timer based decay function into thread context so it can
     actually acquire the mutex which protects the CEC array to prevent
     corruption"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  RAS/CEC: Convert the timer callback to a workqueue
  RAS/CEC: Fix binary search function
2019-06-16 07:19:15 -10:00
Linus Torvalds
e01e060fe0 Merge tag 'platform-drivers-x86-v5.2-3' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:

 - fix a couple of Mellanox driver enumeration issues

 - fix ASUS laptop regression with backlight

 - fix Dell computers that got a wrong mode (tablet versus laptop) after
   resume

* tag 'platform-drivers-x86-v5.2-3' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow
  platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration
  platform/x86: intel-vbtn: Report switch events when event wakes device
  platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
2019-06-15 07:38:54 -10:00
Linus Torvalds
ff39074b1d Merge tag 'usb-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes for 5.2-rc5

  Nothing major, just some small gadget fixes, usb-serial new device
  ids, a few new quirks, and some small fixes for some regressions that
  have been found after the big 5.2-rc1 merge.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: Make sure an alt mode exist before getting its partner
  usb: gadget: udc: lpc32xx: fix return value check in lpc32xx_udc_probe()
  usb: gadget: dwc2: fix zlp handling
  usb: dwc2: Set actual frame number for completed ISOC transfer for none DDMA
  usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC
  usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i]
  usb: phy: mxs: Disable external charger detect in mxs_phy_hw_init()
  usb: dwc2: Fix DMA cache alignment issues
  usb: dwc2: host: Fix wMaxPacketSize handling (fix webcam regression)
  USB: Fix chipmunk-like voice when using Logitech C270 for recording audio.
  USB: usb-storage: Add new ID to ums-realtek
  usb: typec: ucsi: ccg: fix memory leak in do_flash
  USB: serial: option: add Telit 0x1260 and 0x1261 compositions
  USB: serial: pl2303: add Allied Telesis VT-Kit3
  USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode
2019-06-15 07:34:23 -10:00
Linus Torvalds
fa1827d773 Merge tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "One fix for a regression introduced by our 32-bit KASAN support, which
  broke booting on machines with "bootx" early debugging enabled.

  A fix for a bug which broke kexec on 32-bit, introduced by changes to
  the 32-bit STRICT_KERNEL_RWX support in v5.1.

  Finally two fixes going to stable for our THP split/collapse handling,
  discovered by Nick. The first fixes random crashes and/or corruption
  in guests under sufficient load.

  Thanks to: Nicholas Piggin, Christophe Leroy, Aaro Koskinen, Mathieu
  Malaterre"

* tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
  powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate()
  powerpc/64s: Fix THP PMD collapse serialisation
  powerpc: Fix kexec failure on book3s/32
2019-06-15 07:29:32 -10:00
Linus Torvalds
6a71398c6a Merge tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Out of range read of stack trace output

 - Fix for NULL pointer dereference in trace_uprobe_create()

 - Fix to a livepatching / ftrace permission race in the module code

 - Fix for NULL pointer dereference in free_ftrace_func_mapper()

 - A couple of build warning clean ups

* tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
  module: Fix livepatch/ftrace module text permissions race
  tracing/uprobe: Fix obsolete comment on trace_uprobe_create()
  tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create()
  tracing: Make two symbols static
  tracing: avoid build warning with HAVE_NOP_MCOUNT
  tracing: Fix out-of-range read in trace_stack_print()
2019-06-15 07:24:11 -10:00
Borislav Petkov
78f4e932f7 x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback
Adric Blake reported the following warning during suspend-resume:

  Enabling non-boot CPUs ...
  x86: Booting SMP configuration:
  smpboot: Booting Node 0 Processor 1 APIC 0x2
  unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
   at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
  Call Trace:
   intel_set_tfa
   intel_pmu_cpu_starting
   ? x86_pmu_dead_cpu
   x86_pmu_starting_cpu
   cpuhp_invoke_callback
   ? _raw_spin_lock_irqsave
   notify_cpu_starting
   start_secondary
   secondary_startup_64
  microcode: sig=0x806ea, pf=0x80, revision=0x96
  microcode: updated to revision 0xb4, date = 2019-04-01
  CPU1 is up

The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
by microcode. The log above shows that the microcode loader callback
happens after the PMU restoration, leading to the conjecture that
because the microcode hasn't been updated yet, that MSR is not present
yet, leading to the #GP.

Add a microcode loader-specific hotplug vector which comes before
the PERF vectors and thus executes earlier and makes sure the MSR is
present.

Fixes: 400816f60c ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Adric Blake <promarbler14@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: x86@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
2019-06-15 10:00:29 +02:00
Linus Torvalds
0011572c88 Merge branch 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "This has an unusually high density of tricky fixes:

   - task_get_css() could deadlock when it races against a dying cgroup.

   - cgroup.procs didn't list thread group leaders with live threads.

     This could mislead readers to think that a cgroup is empty when
     it's not. Fixed by making PROCS iterator include dead tasks. I made
     a couple mistakes making this change and this pull request contains
     a couple follow-up patches.

   - When cpusets run out of online cpus, it updates cpusmasks of member
     tasks in bizarre ways. Joel improved the behavior significantly"

* 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  cgroup: Fix css_task_iter_advance_css_set() cset skip condition
  cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
  cgroup: Include dying leaders with live threads in PROCS iterations
  cgroup: Implement css_task_iter_skip()
  cgroup: Call cgroup_release() before __exit_signal()
  docs cgroups: add another example size for hugetlb
  cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
2019-06-14 17:46:14 -10:00
Linus Torvalds
6aa7a22b97 Merge tag 'drm-fixes-2019-06-14' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
 "Nothing unsettling here, also not aware of anything serious still
  pending.

  The edid override regression fix took a bit longer since this seems to
  be an area with an overabundance of bad options. But the fix we have
  now seems like a good path forward.

  Next week it should be back to Dave.

  Summary:

   - fix regression on amdgpu on SI

   - fix edid override regression

   - driver fixes: amdgpu, i915, mediatek, meson, panfrost

   - fix writecombine for vmap in gem-shmem helper (used by panfrost)

   - add more panel quirks"

* tag 'drm-fixes-2019-06-14' of git://anongit.freedesktop.org/drm/drm: (25 commits)
  drm/amdgpu: return 0 by default in amdgpu_pm_load_smu_firmware
  drm/amdgpu: Fix bounds checking in amdgpu_ras_is_supported()
  drm: add fallback override/firmware EDID modes workaround
  drm/edid: abstract override/firmware EDID retrieval
  drm/i915/perf: fix whitelist on Gen10+
  drm/i915/sdvo: Implement proper HDMI audio support for SDVO
  drm/i915: Fix per-pixel alpha with CCS
  drm/i915/dmc: protect against reading random memory
  drm/i915/dsi: Use a fuzzy check for burst mode clock check
  drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc
  drm/panfrost: Require the simple_ondemand governor
  drm/panfrost: make devfreq optional again
  drm/gem_shmem: Use a writecombine mapping for ->vaddr
  drm: panel-orientation-quirks: Add quirk for GPD MicroPC
  drm: panel-orientation-quirks: Add quirk for GPD pocket2
  drm/meson: fix G12A primary plane disabling
  drm/meson: fix primary plane disabling
  drm/meson: fix G12A HDMI PLL settings for 4K60 1000/1001 variations
  drm/mediatek: call mtk_dsi_stop() after mtk_drm_crtc_atomic_disable()
  drm/mediatek: clear num_pipes when unbind driver
  ...
2019-06-14 17:34:45 -10:00
Linus Torvalds
4066524401 Merge tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
 "Fix rounding error in gfs2_iomap_page_prepare"

* tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix rounding error in gfs2_iomap_page_prepare
2019-06-14 17:27:12 -10:00
Linus Torvalds
1ed1fa5f9c Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
 "A single bug fix for hpsa.

  The user visible consequences aren't clear, but the ioaccel2 raid
  acceleration may misfire on the malformed request assuming the payload
  is big enough to require chaining (more than 31 sg entries)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: hpsa: correct ioaccel2 chaining
2019-06-14 15:52:51 -10:00
Linus Torvalds
7b10315128 Merge tag 'for-linus-20190614' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - Remove references to old schedulers for the scheduler switching and
   blkio controller documentation (Andreas)

 - Kill duplicate check for report zone for null_blk (Chaitanya)

 - Two bcache fixes (Coly)

 - Ensure that mq-deadline is selected if zoned block device is enabled,
   as we need that to support them (Damien)

 - Fix io_uring memory leak (Eric)

 - ps3vram fallout from LBDAF removal (Geert)

 - Redundant blk-mq debugfs debugfs_create return check cleanup (Greg)

 - Extend NOPLM quirk for ST1000LM024 drives (Hans)

 - Remove error path warning that can now trigger after the queue
   removal/addition fixes (Ming)

* tag 'for-linus-20190614' of git://git.kernel.dk/linux-block:
  block/ps3vram: Use %llu to format sector_t after LBDAF removal
  libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk
  bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
  bcache: fix stack corruption by PRECEDING_KEY()
  blk-mq: remove WARN_ON(!q->elevator) from blk_mq_sched_free_requests
  blkio-controller.txt: Remove references to CFQ
  block/switching-sched.txt: Update to blk-mq schedulers
  null_blk: remove duplicate check for report zone
  blk-mq: no need to check return value of debugfs_create functions
  io_uring: fix memory leak of UNIX domain socket inode
  block: force select mq-deadline for zoned block devices
2019-06-14 15:41:18 -10:00
Linus Torvalds
5dcedf4600 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "I2C has two simple but wanted driver fixes for you"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: pca-platform: Fix GPIO lookup code
  i2c: acorn: fix i2c warning
2019-06-14 15:25:27 -10:00
Casey Schaufler
6e7739fc93 Smack: Restore the smackfsdef mount option and add missing prefixes
The 5.1 mount system rework changed the smackfsdef mount option to
smackfsdefault.  This fixes the regression by making smackfsdef treated
the same way as smackfsdefault.

Also fix the smack_param_specs[] to have "smack" prefixes on all the
names.  This isn't visible to a user unless they either:

 (a) Try to mount a filesystem that's converted to the internal mount API
     and that implements the ->parse_monolithic() context operation - and
     only then if they call security_fs_context_parse_param() rather than
     security_sb_eat_lsm_opts().

     There are no examples of this upstream yet, but nfs will probably want
     to do this for nfs2 or nfs3.

 (b) Use fsconfig() to configure the filesystem - in which case
     security_fs_context_parse_param() will be called.

This issue is that smack_sb_eat_lsm_opts() checks for the "smack" prefix
on the options, but smack_fs_context_parse_param() does not.

Fixes: c3300aaf95 ("smack: get rid of match_token()")
Fixes: 2febd254ad ("smack: Implement filesystem context security hooks")
Cc: stable@vger.kernel.org
Reported-by: Jose Bollo <jose.bollo@iot.bzh>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-14 14:25:04 -10:00
Wei Li
04e03d9a61 ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
The mapper may be NULL when called from register_ftrace_function_probe()
with probe->data == NULL.

This issue can be reproduced as follow (it may be covered by compiler
optimization sometime):

/ # cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####
/ # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
[  206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  206.952402] Mem abort info:
[  206.952819]   ESR = 0x96000006
[  206.955326]   Exception class = DABT (current EL), IL = 32 bits
[  206.955844]   SET = 0, FnV = 0
[  206.956272]   EA = 0, S1PTW = 0
[  206.956652] Data abort info:
[  206.957320]   ISV = 0, ISS = 0x00000006
[  206.959271]   CM = 0, WnR = 0
[  206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
[  206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
[  206.964953] Internal error: Oops: 96000006 [#1] SMP
[  206.971122] Dumping ftrace buffer:
[  206.973677]    (ftrace buffer empty)
[  206.975258] Modules linked in:
[  206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
[  206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
[  206.978955] Hardware name: linux,dummy-virt (DT)
[  206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
[  206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
[  206.980874] lr : ftrace_count_free+0x68/0x80
[  206.982539] sp : ffff0000182f3ab0
[  206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
[  206.983632] x27: ffff000013054b40 x26: 0000000000000001
[  206.984000] x25: ffff00001385f000 x24: 0000000000000000
[  206.984394] x23: ffff000013453000 x22: ffff000013054000
[  206.984775] x21: 0000000000000000 x20: ffff00001385fe28
[  206.986575] x19: ffff000013872c30 x18: 0000000000000000
[  206.987111] x17: 0000000000000000 x16: 0000000000000000
[  206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
[  206.987850] x13: 000000000017430e x12: 0000000000000580
[  206.988251] x11: 0000000000000000 x10: cccccccccccccccc
[  206.988740] x9 : 0000000000000000 x8 : ffff000013917550
[  206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
[  206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
[  206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
[  206.991771] x1 : 0000000000000000 x0 : 0000000000000000
[  206.992557] Call trace:
[  206.993101]  free_ftrace_func_mapper+0x2c/0x118
[  206.994827]  ftrace_count_free+0x68/0x80
[  206.995238]  release_probe+0xfc/0x1d0
[  206.995555]  register_ftrace_function_probe+0x4a8/0x868
[  206.995923]  ftrace_trace_probe_callback.isra.4+0xb8/0x180
[  206.996330]  ftrace_dump_callback+0x50/0x70
[  206.996663]  ftrace_regex_write.isra.29+0x290/0x3a8
[  206.997157]  ftrace_filter_write+0x44/0x60
[  206.998971]  __vfs_write+0x64/0xf0
[  206.999285]  vfs_write+0x14c/0x2f0
[  206.999591]  ksys_write+0xbc/0x1b0
[  206.999888]  __arm64_sys_write+0x3c/0x58
[  207.000246]  el0_svc_common.constprop.0+0x408/0x5f0
[  207.000607]  el0_svc_handler+0x144/0x1c8
[  207.000916]  el0_svc+0x8/0xc
[  207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
[  207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
[  207.010126] Kernel panic - not syncing: Fatal exception
[  207.011322] SMP: stopping secondary CPUs
[  207.013956] Dumping ftrace buffer:
[  207.014595]    (ftrace buffer empty)
[  207.015632] Kernel Offset: disabled
[  207.017187] CPU features: 0x002,20006008
[  207.017985] Memory Limit: none
[  207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.com

Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 17:40:21 -04:00
Josh Poimboeuf
9f255b632b module: Fix livepatch/ftrace module text permissions race
It's possible for livepatch and ftrace to be toggling a module's text
permissions at the same time, resulting in the following panic:

  BUG: unable to handle page fault for address: ffffffffc005b1d9
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
  Oops: 0003 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 453 Comm: insmod Tainted: G           O  K   5.2.0-rc1-a188339ca5 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
  RIP: 0010:apply_relocate_add+0xbe/0x14c
  Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
  RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
  RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
  RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
  RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
  R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
  R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
  FS:  00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   klp_init_object_loaded+0x10f/0x219
   ? preempt_latency_start+0x21/0x57
   klp_enable_patch+0x662/0x809
   ? virt_to_head_page+0x3a/0x3c
   ? kfree+0x8c/0x126
   patch_init+0x2ed/0x1000 [livepatch_test02]
   ? 0xffffffffc0060000
   do_one_initcall+0x9f/0x1c5
   ? kmem_cache_alloc_trace+0xc4/0xd4
   ? do_init_module+0x27/0x210
   do_init_module+0x5f/0x210
   load_module+0x1c41/0x2290
   ? fsnotify_path+0x3b/0x42
   ? strstarts+0x2b/0x2b
   ? kernel_read+0x58/0x65
   __do_sys_finit_module+0x9f/0xc3
   ? __do_sys_finit_module+0x9f/0xc3
   __x64_sys_finit_module+0x1a/0x1c
   do_syscall_64+0x52/0x61
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The above panic occurs when loading two modules at the same time with
ftrace enabled, where at least one of the modules is a livepatch module:

CPU0					CPU1
klp_enable_patch()
  klp_init_object_loaded()
    module_disable_ro()
    					ftrace_module_enable()
					  ftrace_arch_code_modify_post_process()
				    	    set_all_modules_text_ro()
      klp_write_object_relocations()
        apply_relocate_add()
	  *patches read-only code* - BOOM

A similar race exists when toggling ftrace while loading a livepatch
module.

Fix it by ensuring that the livepatch and ftrace code patching
operations -- and their respective permissions changes -- are protected
by the text_mutex.

Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com

Reported-by: Johannes Erdfelt <johannes@erdfelt.com>
Fixes: 444d13ff10 ("modules: add ro_after_init support")
Acked-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 17:01:50 -04:00
Eiichi Tsukata
a4158345ec tracing/uprobe: Fix obsolete comment on trace_uprobe_create()
Commit 0597c49c69 ("tracing/uprobes: Use dyn_event framework for
uprobe events") cleaned up the usage of trace_uprobe_create(), and the
function has been no longer used for removing uprobe/uretprobe.

Link: http://lkml.kernel.org/r/20190614074026.8045-2-devel@etsukata.com

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 17:00:36 -04:00
Eiichi Tsukata
f01098c74b tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create()
Just like the case of commit 8b05a3a750 ("tracing/kprobes: Fix NULL
pointer dereference in trace_kprobe_create()"), writing an incorrectly
formatted string to uprobe_events can trigger NULL pointer dereference.

Reporeducer:

  # echo r > /sys/kernel/debug/tracing/uprobe_events

dmesg:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8000000079d12067 P4D 8000000079d12067 PUD 7b7ab067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 1903 Comm: bash Not tainted 5.2.0-rc3+ #15
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
  RIP: 0010:strchr+0x0/0x30
  Code: c0 eb 0d 84 c9 74 18 48 83 c0 01 48 39 d0 74 0f 0f b6 0c 07 3a 0c 06 74 ea 19 c0 83 c8 01 c3 31 c0 c3 0f 1f 84 00 00 00 00 00 <0f> b6 07 89 f2 40 38 f0 75 0e eb 13 0f b6 47 01 48 83 c
  RSP: 0018:ffffb55fc0403d10 EFLAGS: 00010293

  RAX: ffff993ffb793400 RBX: 0000000000000000 RCX: ffffffffa4852625
  RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000000
  RBP: ffffb55fc0403dd0 R08: ffff993ffb793400 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff993ff9cc1668 R14: 0000000000000001 R15: 0000000000000000
  FS:  00007f30c5147700(0000) GS:ffff993ffda00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000007b628000 CR4: 00000000000006f0
  Call Trace:
   trace_uprobe_create+0xe6/0xb10
   ? __kmalloc_track_caller+0xe6/0x1c0
   ? __kmalloc+0xf0/0x1d0
   ? trace_uprobe_create+0xb10/0xb10
   create_or_delete_trace_uprobe+0x35/0x90
   ? trace_uprobe_create+0xb10/0xb10
   trace_run_command+0x9c/0xb0
   trace_parse_run_command+0xf9/0x1eb
   ? probes_open+0x80/0x80
   __vfs_write+0x43/0x90
   vfs_write+0x14a/0x2a0
   ksys_write+0xa2/0x170
   do_syscall_64+0x7f/0x200
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Link: http://lkml.kernel.org/r/20190614074026.8045-1-devel@etsukata.com

Cc: stable@vger.kernel.org
Fixes: 0597c49c69 ("tracing/uprobes: Use dyn_event framework for uprobe events")
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 16:51:14 -04:00
YueHaibing
ff585c5b9a tracing: Make two symbols static
Fix sparse warnings:

kernel/trace/trace.c:6927:24: warning:
 symbol 'get_tracing_log_err' was not declared. Should it be static?
kernel/trace/trace.c:8196:15: warning:
 symbol 'trace_instance_dir' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20190614153210.24424-1-yuehaibing@huawei.com

Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 16:49:26 -04:00
Vasily Gorbik
cbdaeaf050 tracing: avoid build warning with HAVE_NOP_MCOUNT
Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it)
and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for
testing CC_USING_* defines) to avoid conditional compilation and fix
the following gcc 9 warning on s390:

kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined
but not used [-Wunused-function]

Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours

Fixes: 2f4df0017b ("tracing: Add -mcount-nop option support")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 16:34:57 -04:00
Eiichi Tsukata
becf33f694 tracing: Fix out-of-range read in trace_stack_print()
Puts range check before dereferencing the pointer.

Reproducer:

  # echo stacktrace > trace_options
  # echo 1 > events/enable
  # cat trace > /dev/null

KASAN report:

  ==================================================================
  BUG: KASAN: use-after-free in trace_stack_print+0x26b/0x2c0
  Read of size 8 at addr ffff888069d20000 by task cat/1953

  CPU: 0 PID: 1953 Comm: cat Not tainted 5.2.0-rc3+ #5
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
  Call Trace:
   dump_stack+0x8a/0xce
   print_address_description+0x60/0x224
   ? trace_stack_print+0x26b/0x2c0
   ? trace_stack_print+0x26b/0x2c0
   __kasan_report.cold+0x1a/0x3e
   ? trace_stack_print+0x26b/0x2c0
   kasan_report+0xe/0x20
   trace_stack_print+0x26b/0x2c0
   print_trace_line+0x6ea/0x14d0
   ? tracing_buffers_read+0x700/0x700
   ? trace_find_next_entry_inc+0x158/0x1d0
   s_show+0xea/0x310
   seq_read+0xaa7/0x10e0
   ? seq_escape+0x230/0x230
   __vfs_read+0x7c/0x100
   vfs_read+0x16c/0x3a0
   ksys_read+0x121/0x240
   ? kernel_write+0x110/0x110
   ? perf_trace_sys_enter+0x8a0/0x8a0
   ? syscall_slow_exit_work+0xa9/0x410
   do_syscall_64+0xb7/0x390
   ? prepare_exit_to_usermode+0x165/0x200
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f867681f910
  Code: b6 fe ff ff 48 8d 3d 0f be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d f9 2d 2c 00 00 75 10 b8 00 00 00 00 04
  RSP: 002b:00007ffdabf23488 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
  RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f867681f910
  RDX: 0000000000020000 RSI: 00007f8676cde000 RDI: 0000000000000003
  RBP: 00007f8676cde000 R08: ffffffffffffffff R09: 0000000000000000
  R10: 0000000000000871 R11: 0000000000000246 R12: 00007f8676cde000
  R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000ec0

  Allocated by task 1214:
   save_stack+0x1b/0x80
   __kasan_kmalloc.constprop.0+0xc2/0xd0
   kmem_cache_alloc+0xaf/0x1a0
   getname_flags+0xd2/0x5b0
   do_sys_open+0x277/0x5a0
   do_syscall_64+0xb7/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 1214:
   save_stack+0x1b/0x80
   __kasan_slab_free+0x12c/0x170
   kmem_cache_free+0x8a/0x1c0
   putname+0xe1/0x120
   do_sys_open+0x2c5/0x5a0
   do_syscall_64+0xb7/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  The buggy address belongs to the object at ffff888069d20000
   which belongs to the cache names_cache of size 4096
  The buggy address is located 0 bytes inside of
   4096-byte region [ffff888069d20000, ffff888069d21000)
  The buggy address belongs to the page:
  page:ffffea0001a74800 refcount:1 mapcount:0 mapping:ffff88806ccd1380 index:0x0 compound_mapcount: 0
  flags: 0x100000000010200(slab|head)
  raw: 0100000000010200 dead000000000100 dead000000000200 ffff88806ccd1380
  raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff888069d1ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffff888069d1ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >ffff888069d20000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                     ^
   ffff888069d20080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff888069d20100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ==================================================================

Link: http://lkml.kernel.org/r/20190610040016.5598-1-devel@etsukata.com

Fixes: 4285f2fcef ("tracing: Remove the ULONG_MAX stack trace hackery")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 16:28:42 -04:00
Andreas Gruenbacher
2741b6723b gfs2: Fix rounding error in gfs2_iomap_page_prepare
The pos and len arguments to the iomap page_prepare callback are not
block aligned, so we need to take that into account when computing the
number of blocks.

Fixes: d0a22a4b03 ("gfs2: Fix iomap write page reclaim deadlock")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-14 18:49:07 +02:00
Linus Torvalds
72a20cee5d Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "Here are some arm64 fixes for -rc5.

  The only non-trivial change (in terms of the diffstat) is fixing our
  SVE ptrace API for big-endian machines, but the majority of this is
  actually the addition of much-needed comments and updates to the
  documentation to try to avoid this mess biting us again in future.

  There are still a couple of small things on the horizon, but nothing
  major at this point.

  Summary:

   - Fix broken SVE ptrace API when running in a big-endian configuration

   - Fix performance regression due to off-by-one in TLBI range checking

   - Fix build regression when using Clang"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/sve: Fix missing SVE/FPSIMD endianness conversions
  arm64: tlbflush: Ensure start/end of address range are aligned to stride
  arm64: Don't unconditionally add -Wno-psabi to KBUILD_CFLAGS
2019-06-14 06:16:47 -10:00
Linus Torvalds
fd6b99fa41 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "16 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/devm_memremap_pages: fix final page put race
  PCI/P2PDMA: track pgmap references per resource, not globally
  lib/genalloc: introduce chunk owners
  PCI/P2PDMA: fix the gen_pool_add_virt() failure path
  mm/devm_memremap_pages: introduce devm_memunmap_pages
  drivers/base/devres: introduce devm_release_action()
  mm/vmscan.c: fix trying to reclaim unevictable LRU page
  coredump: fix race condition between collapse_huge_page() and core dumping
  mm/mlock.c: change count_mm_mlocked_page_nr return type
  mm: mmu_gather: remove __tlb_reset_range() for force flush
  fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
  mm/vmscan.c: fix recent_rotated history
  mm/mlock.c: mlockall error for flag MCL_ONFAULT
  scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE
  mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
  mm: memcontrol: don't batch updates of local VM stats and events
2019-06-14 06:08:46 -10:00
Daniel Vetter
e14c5873d2 Merge branch 'drm-fixes-5.2' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Fixes for 5.2:
- Extend previous vce fix for resume to uvd and vcn
- Fix bounds checking in ras debugfs interface
- Fix a regression on SI using amdgpu

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613021856.3307-1-alexander.deucher@amd.com
2019-06-14 17:55:22 +02:00
Linus Torvalds
c78ad1be4b Merge tag 'iommu-fixes-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:

 - three fixes for Intel VT-d to fix a potential dead-lock, a formatting
   fix and a bit setting fix

 - one fix for the ARM-SMMU to make it work on some platforms with
   sub-optimal SMMU emulation

* tag 'iommu-fixes-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/arm-smmu: Avoid constant zero in TLBI writes
  iommu/vt-d: Set the right field for Page Walk Snoop
  iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock
  iommu: Add missing new line for dma type
2019-06-14 05:49:35 -10:00
Linus Torvalds
7617c9a087 Merge tag 'gpio-v5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
 "A single fix for the PCA953x driver affecting some fringe variants of
  the chip"

* tag 'gpio-v5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: pca953x: hack to fix 24 bit gpio expanders
2019-06-14 05:48:29 -10:00
Linus Torvalds
bcb46a0e0e Merge tag 'sound-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "It might feel like deja vu to receive a bulk of changes at rc5, and it
  happens again; we've got a collection of fixes for ASoC. Most of fixes
  are targeted for the newly merged SOF (Sound Open Firmware) stuff and
  the relevant fixes for Intel platforms.

  Other than that, there are a few regression fixes for the recent ASoC
  core changes and HD-audio quirk, as well as a couple of FireWire fixes
  and for other ASoC codecs"

* tag 'sound-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (54 commits)
  Revert "ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops"
  ALSA: ice1712: Check correct return value to snd_i2c_sendbytes (EWS/DMX 6Fire)
  ALSA: oxfw: allow PCM capture for Stanton SCS.1m
  ALSA: firewire-motu: fix destruction of data for isochronous resources
  ASoC: Intel: sst: fix kmalloc call with wrong flags
  ASoC: core: Fix deadlock in snd_soc_instantiate_card()
  SoC: rt274: Fix internal jack assignment in set_jack callback
  ALSA: hdac: fix memory release for SST and SOF drivers
  ASoC: SOF: Intel: hda: use the defined ppcap functions
  ASoC: core: move DAI pre-links initiation to snd_soc_instantiate_card
  ASoC: Intel: cht_bsw_rt5672: fix kernel oops with platform_name override
  ASoC: Intel: cht_bsw_nau8824: fix kernel oops with platform_name override
  ASoC: Intel: bytcht_es8316: fix kernel oops with platform_name override
  ASoC: Intel: cht_bsw_max98090: fix kernel oops with platform_name override
  ASoC: sun4i-i2s: Add offset to RX channel select
  ASoC: sun4i-i2s: Fix sun8i tx channel offset mask
  ASoC: max98090: remove 24-bit format support if RJ is 0
  ASoC: da7219: Fix build error without CONFIG_I2C
  ASoC: SOF: Intel: hda: Fix COMPILE_TEST build error
  ASoC: SOF: fix DSP oops definitions in FW ABI
  ...
2019-06-14 05:37:06 -10:00
Andrey Ryabinin
f3176ec942 x86/kasan: Fix boot with 5-level paging and KASAN
Since commit d52888aa27 ("x86/mm: Move LDT remap out of KASLR region on
5-level paging") kernel doesn't boot with KASAN on 5-level paging machines.
The bug is actually in early_p4d_offset() and introduced by commit
12a8cc7fcf ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")

early_p4d_offset() tries to convert pgd_val(*pgd) value to a physical
address. This doesn't make sense because pgd_val() already contains the
physical address.

It did work prior to commit d52888aa27 because the result of
"__pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK" was the same as "pgd_val(*pgd)
& PTE_PFN_MASK". __pa_nodebug() just set some high bits which were masked
out by applying PTE_PFN_MASK.

After the change of the PAGE_OFFSET offset in commit d52888aa27
__pa_nodebug(pgd_val(*pgd)) started to return a value with more high bits
set and PTE_PFN_MASK wasn't enough to mask out all of them. So it returns a
wrong not even canonical address and crashes on the attempt to dereference
it.

Switch back to pgd_val() & PTE_PFN_MASK to cure the issue.

Fixes: 12a8cc7fcf ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: stable@vger.kernel.org
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190614143149.2227-1-aryabinin@virtuozzo.com
2019-06-14 16:37:30 +02:00
Thomas Gleixner
e3ff9c3678 timekeeping: Repair ktime_get_coarse*() granularity
Jason reported that the coarse ktime based time getters advance only once
per second and not once per tick as advertised.

The code reads only the monotonic base time, which advances once per
second. The nanoseconds are accumulated on every tick in xtime_nsec up to
a second and the regular time getters take this nanoseconds offset into
account, but the ktime_get_coarse*() implementation fails to do so.

Add the accumulated xtime_nsec value to the monotonic base time to get the
proper per tick advancing coarse tinme.

Fixes: b9ff604cff ("timekeeping: Add ktime_get_coarse_with_offset")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906132136280.1791@nanos.tec.linutronix.de
2019-06-14 11:51:44 +02:00
Daniel Vetter
744ed8cb8a Merge tag 'drm-misc-fixes-2019-06-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Sean writes:

meson: A few G12A fixes across the driver (Neil)
quirks: A couple quirks for GPD devices (Hans)
gem_shmem: Use writecombine when vmapping non-dmabuf BOs (Boris)
panfrost: A couple tweaks to requiring devfreq (Neil & Ezequiel)
edid: Ensure we return the override mode when ddc probe fails (Jani)

Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613143946.GA24233@art_vandelay
2019-06-14 11:36:30 +02:00
Hui Wang
17d304604a Revert "ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops"
This reverts commit 9cb40eb184.

This patch introduces noise and headphone playback issue after
rebooting or suspending/resuming. Let us revert it.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203831
Fixes: 9cb40eb184 ("ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops")
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-06-14 10:52:03 +02:00
Dan Williams
50f44ee724 mm/devm_memremap_pages: fix final page put race
Logan noticed that devm_memremap_pages_release() kills the percpu_ref
drops all the page references that were acquired at init and then
immediately proceeds to unplug, arch_remove_memory(), the backing pages
for the pagemap.  If for some reason device shutdown actually collides
with a busy / elevated-ref-count page then arch_remove_memory() should
be deferred until after that reference is dropped.

As it stands the "wait for last page ref drop" happens *after*
devm_memremap_pages_release() returns, which is obviously too late and
can lead to crashes.

Fix this situation by assigning the responsibility to wait for the
percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
callback.  Implement the new cleanup callback for all
devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.

Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 41e94a8513 ("add devm_memremap_pages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Dan Williams
1570175abd PCI/P2PDMA: track pgmap references per resource, not globally
In preparation for fixing a race between devm_memremap_pages_release()
and the final put of a page from the device-page-map, allocate a
percpu-ref per p2pdma resource mapping.

Link: http://lkml.kernel.org/r/155727338646.292046.9922678317501435597.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Dan Williams
795ee30648 lib/genalloc: introduce chunk owners
The p2pdma facility enables a provider to publish a pool of dma
addresses for a consumer to allocate.  A genpool is used internally by
p2pdma to collect dma resources, 'chunks', to be handed out to
consumers.  Whenever a consumer allocates a resource it needs to pin the
'struct dev_pagemap' instance that backs the chunk selected by
pci_alloc_p2pmem().

Currently that reference is taken globally on the entire provider
device.  That sets up a lifetime mismatch whereby the p2pdma core needs
to maintain hacks to make sure the percpu_ref is not released twice.

This lifetime mismatch also stands in the way of a fix to
devm_memremap_pages() whereby devm_memremap_pages_release() must wait for
the percpu_ref ->release() callback to complete before it can proceed to
teardown pages.

So, towards fixing this situation, introduce the ability to store a 'chunk
owner' at gen_pool_add() time, and a facility to retrieve the owner at
gen_pool_{alloc,free}() time.  For p2pdma this will be used to store and
recall individual dev_pagemap reference counter instances per-chunk.

Link: http://lkml.kernel.org/r/155727338118.292046.13407378933221579644.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Dan Williams
e615a19121 PCI/P2PDMA: fix the gen_pool_add_virt() failure path
The pci_p2pdma_add_resource() implementation immediately frees the pgmap
if gen_pool_add_virt() fails.  However, that means that when @dev
triggers a devres release devm_memremap_pages_release() will crash
trying to access the freed @pgmap.

Use the new devm_memunmap_pages() to manually free the mapping in the
error path.

Link: http://lkml.kernel.org/r/155727337603.292046.13101332703665246702.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 52916982af ("PCI/P2PDMA: Support peer-to-peer memory")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Dan Williams
2e3f139e8e mm/devm_memremap_pages: introduce devm_memunmap_pages
Use the new devm_release_action() facility to allow
devm_memremap_pages_release() to be manually triggered.

Link: http://lkml.kernel.org/r/155727337088.292046.5774214552136776763.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Dan Williams
2374b68225 drivers/base/devres: introduce devm_release_action()
Patch series "mm/devm_memremap_pages: Fix page release race", v2.

Logan audited the devm_memremap_pages() shutdown path and noticed that
it was possible to proceed to arch_remove_memory() before all potential
page references have been reaped.

Introduce a new ->cleanup() callback to do the work of waiting for any
straggling page references and then perform the percpu_ref_exit() in
devm_memremap_pages_release() context.

For p2pdma this involves some deeper reworks to reference count
resources on a per-instance basis rather than a per pci-device basis.  A
modified genalloc api is introduced to convey a driver-private pointer
through gen_pool_{alloc,free}() interfaces.  Also, a
devm_memunmap_pages() api is introduced since p2pdma does not
auto-release resources on a setup failure.

The dax and pmem changes pass the nvdimm unit tests, and the p2pdma
changes should now pass testing with the pci_p2pdma_release() fix.
Jrme, how does this look for HMM?

This patch (of 6):

The devm_add_action() facility allows a resource allocation routine to
add custom devm semantics.  One such user is devm_memremap_pages().

There is now a need to manually trigger
devm_memremap_pages_release().  Introduce devm_release_action() so the
release action can be triggered via a new devm_memunmap_pages() api in a
follow-on change.

Link: http://lkml.kernel.org/r/155727336530.292046.2926860263201336366.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Minchan Kim
a58f2cef26 mm/vmscan.c: fix trying to reclaim unevictable LRU page
There was the below bug report from Wu Fangsuo.

On the CMA allocation path, isolate_migratepages_range() could isolate
unevictable LRU pages and reclaim_clean_page_from_list() can try to
reclaim them if they are clean file-backed pages.

  page:ffffffbf02f33b40 count:86 mapcount:84 mapping:ffffffc08fa7a810 index:0x24
  flags: 0x19040c(referenced|uptodate|arch_1|mappedtodisk|unevictable|mlocked)
  raw: 000000000019040c ffffffc08fa7a810 0000000000000024 0000005600000053
  raw: ffffffc009b05b20 ffffffc009b05b20 0000000000000000 ffffffc09bf3ee80
  page dumped because: VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page))
  page->mem_cgroup:ffffffc09bf3ee80
  ------------[ cut here ]------------
  kernel BUG at /home/build/farmland/adroid9.0/kernel/linux/mm/vmscan.c:1350!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 7125 Comm: syz-executor Tainted: G S              4.14.81 #3
  Hardware name: ASR AQUILAC EVB (DT)
  task: ffffffc00a54cd00 task.stack: ffffffc009b00000
  PC is at shrink_page_list+0x1998/0x3240
  LR is at shrink_page_list+0x1998/0x3240
  pc : [<ffffff90083a2158>] lr : [<ffffff90083a2158>] pstate: 60400045
  sp : ffffffc009b05940
  ..
     shrink_page_list+0x1998/0x3240
     reclaim_clean_pages_from_list+0x3c0/0x4f0
     alloc_contig_range+0x3bc/0x650
     cma_alloc+0x214/0x668
     ion_cma_allocate+0x98/0x1d8
     ion_alloc+0x200/0x7e0
     ion_ioctl+0x18c/0x378
     do_vfs_ioctl+0x17c/0x1780
     SyS_ioctl+0xac/0xc0

Wu found it's due to commit ad6b67041a ("mm: remove SWAP_MLOCK in
ttu").  Before that, unevictable pages go to cull_mlocked so that we
can't reach the VM_BUG_ON_PAGE line.

To fix the issue, this patch filters out unevictable LRU pages from the
reclaim_clean_pages_from_list in CMA.

Link: http://lkml.kernel.org/r/20190524071114.74202-1-minchan@kernel.org
Fixes: ad6b67041a ("mm: remove SWAP_MLOCK in ttu")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Debugged-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Tested-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Suryawanshi <pankaj.suryawanshi@einfochips.com>
Cc: <stable@vger.kernel.org>	[4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
Andrea Arcangeli
59ea6d06cf coredump: fix race condition between collapse_huge_page() and core dumping
When fixing the race conditions between the coredump and the mmap_sem
holders outside the context of the process, we focused on
mmget_not_zero()/get_task_mm() callers in 04f5866e41 ("coredump: fix
race condition between mmget_not_zero()/get_task_mm() and core
dumping"), but those aren't the only cases where the mmap_sem can be
taken outside of the context of the process as Michal Hocko noticed
while backporting that commit to older -stable kernels.

If mmgrab() is called in the context of the process, but then the
mm_count reference is transferred outside the context of the process,
that can also be a problem if the mmap_sem has to be taken for writing
through that mm_count reference.

khugepaged registration calls mmgrab() in the context of the process,
but the mmap_sem for writing is taken later in the context of the
khugepaged kernel thread.

collapse_huge_page() after taking the mmap_sem for writing doesn't
modify any vma, so it's not obvious that it could cause a problem to the
coredump, but it happens to modify the pmd in a way that breaks an
invariant that pmd_trans_huge_lock() relies upon.  collapse_huge_page()
needs the mmap_sem for writing just to block concurrent page faults that
call pmd_trans_huge_lock().

Specifically the invariant that "!pmd_trans_huge()" cannot become a
"pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.

The coredump will call __get_user_pages() without mmap_sem for reading,
which eventually can invoke a lockless page fault which will need a
functional pmd_trans_huge_lock().

So collapse_huge_page() needs to use mmget_still_valid() to check it's
not running concurrently with the coredump...  as long as the coredump
can invoke page faults without holding the mmap_sem for reading.

This has "Fixes: khugepaged" to facilitate backporting, but in my view
it's more a bug in the coredump code that will eventually have to be
rewritten to stop invoking page faults without the mmap_sem for reading.
So the long term plan is still to drop all mmget_still_valid().

Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
Fixes: ba76149f47 ("thp: khugepaged")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00