commit adf6895754 upstream.
Integration testing with a BIOS that generates injected health event
notifications fails to communicate those events to userspace. The nfit
driver neglects to link the ACPI DIMM device with the necessary driver
data so acpi_nvdimm_notify() fails this lookup:
nfit_mem = dev_get_drvdata(dev);
if (nfit_mem && nfit_mem->flags_attr)
sysfs_notify_dirent(nfit_mem->flags_attr);
Add the necessary linkage when installing the notification handler and
clean it up when the nfit driver instance is torn down.
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Fixes: ba9c8dd3c2 ("acpi, nfit: add dimm device notification support")
Reported-by: Daniel Osawa <daniel.k.osawa@intel.com>
Tested-by: Daniel Osawa <daniel.k.osawa@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb82e0b4a7 upstream.
The commit f6f8285132 ("pstore: pass allocated memory region back to
caller") changed the check of the return value from erst_read() in
erst_reader() in the following way:
if (len == -ENOENT)
goto skip;
- else if (len < 0) {
- rc = -1;
+ else if (len < sizeof(*rcd)) {
+ rc = -EIO;
goto out;
This introduced another bug: since the comparison with sizeof() is
cast to unsigned, a negative len value doesn't hit any longer.
As a result, when an error is returned from erst_read(), the code
falls through, and it may eventually lead to some weird thing like
memory corruption.
This patch adds the negative error value check more explicitly for
addressing the issue.
Fixes: f6f8285132 (pstore: pass allocated memory region back to caller)
Tested-by: Jerry Tang <jtang@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53c5eaabae upstream.
Originally the Samsung quirks removed by commit 4c237371 can be covered
by commit e923e8e7 and ec_freeze_events=Y mode. But commit 9c40f956
changed ec_freeze_events=Y back to N, making this problem re-surface.
Actually, if commit e923e8e7 is robust enough, we can freely change
ec_freeze_events mode, so this patch fixes the issue by improving
commit e923e8e7.
Related commits listed in the merged order:
Commit: e923e8e79e
Subject: ACPI / EC: Fix an issue that SCI_EVT cannot be detected
after event is enabled
Commit: 4c237371f2
Subject: ACPI / EC: Remove old CLEAR_ON_RESUME quirk
Commit: 9c40f956ce
Subject: Revert "ACPI / EC: Enable event freeze mode..." to fix
a regression
This patch not only fixes the reported post-resume EC event triggering
source issue, but also fixes an unreported similar issue related to the
driver bind by adding EC event triggering source in ec_install_handlers().
Fixes: e923e8e79e (ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled)
Fixes: 4c237371f2 (ACPI / EC: Remove old CLEAR_ON_RESUME quirk)
Fixes: 9c40f956ce (Revert "ACPI / EC: Enable event freeze mode..." to fix a regression)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196833
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reported-by: Alistair Hamilton <ahpatent@gmail.com>
Tested-by: Alistair Hamilton <ahpatent@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98529b9272 upstream.
Commit 2a5708409e (ACPI / EC: Fix a gap that ECDT EC cannot handle
EC events) introduced acpi_ec_ecdt_start(), but that function is
invoked before acpi_ec_query_init(), which is too early. This causes
the kernel to crash if an EC event occurs after boot, when ec_query_wq
is not valid:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
...
Workqueue: events acpi_ec_event_handler
task: ffff9f539790dac0 task.stack: ffffb437c0e10000
RIP: 0010:__queue_work+0x32/0x430
Normally, the DSDT EC should always be valid, so acpi_ec_ecdt_start()
is actually a no-op in the majority of cases. However, commit
c712bb58d8 (ACPI / EC: Add support to skip boot stage DSDT probe)
caused the probing of the DSDT EC as the "boot EC" to be skipped when
the ECDT EC is valid and uncovered the bug.
Fix this issue by invoking acpi_ec_ecdt_start() after acpi_ec_query_init()
in acpi_ec_init().
Link: https://jira01.devtools.intel.com/browse/LCK-4348
Fixes: 2a5708409e (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events)
Fixes: c712bb58d8 (ACPI / EC: Add support to skip boot stage DSDT probe)
Reported-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: Feng Chenzhou <chenzhoux.feng@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d64f82cce upstream.
When removing a GHES device notified by SCI, list_del_rcu() is used,
ghes_remove() should call synchronize_rcu() before it goes on to call
kfree(ghes), otherwise concurrent RCU readers may still hold this list
entry after it has been freed.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 81e88fdc43 (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3d5092b67 upstream.
The on-stack resource-window 'win' in setup_res() is not
properly initialized. This causes the pointers in the
embedded 'struct resource' to contain stale addresses.
These pointers (in my case the ->child pointer) later get
propagated to the global iomem_resources list, causing a #GP
exception when the list is traversed in
iomem_map_sanity_check().
Fixes: c183619b63 (x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug)
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c2a6bbaf0c ]
The way acpi_find_child_device() works currently is that, if there
are two (or more) devices with the same _ADR value in the same
namespace scope (which is not specifically allowed by the spec and
the OS behavior in that case is not defined), the first one of them
found to be present (with the help of _STA) will be returned.
This covers the majority of cases, but is not sufficient if some of
the devices in question have a _HID (or _CID) returning some valid
ACPI/PNP device IDs (which is disallowed by the spec) and the
ASL writers' expectation appears to be that the OS will match
devices without a valid ACPI/PNP device ID against a given bus
address first.
To cover this special case as well, modify find_child_checks()
to prefer devices without ACPI/PNP device IDs over devices that
have them.
Suggested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e700d2c59 upstream.
nfit_init() calls nfit_mce_register() on module load. When the module
load fails the nfit mce decoder is not unregistered. The module's
memory is freed leaving the decoder chain referencing junk. This will
cause panics as future registrations will reference the free'd memory.
Unregister the nfit mce decoder on module init failure.
[v2]: register and then unregister mce handler to avoid losing mce events
[v3]: also cleanup nfit workqueue
Fixes: 6839a6d96f ("nfit: do an ARS scrub on hitting a latent media error")
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: lszubowi@redhat.com
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c40f956ce upstream.
On Lenovo ThinkPad X1 Carbon - the 5th Generation, enabling an earlier
EC event freezing timing causes acpitz-virtual-0 to report a stuck
48C temparature. And with EC firmware revisioned as 1.14, without
reverting back to old EC event freezing timing, the fan still blows
up after a system resume.
This reverts the culprit change so that the regression can be fixed
without upgrading the EC firmware.
Fixes: d30283057e (ACPI / EC: Enable event freeze mode to improve event handling)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191181#c168
Tested-by: Damjan Georgievski <gdamjan@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 662591461c upstream.
According to bug reports, although the busy polling mode can make
noirq stages execute faster, it causes abnormal fan blowing up after
system resume (see the first link below for a video demonstration)
on Lenovo ThinkPad X1 Carbon - the 5th Generation. The problem can
be fixed by upgrading the EC firmware on that machine.
However, many reporters confirm that the problem can be fixed by
stopping busy polling during suspend/resume and for some of them
upgrading the EC firmware is not an option.
For this reason, drop the noirq stage hooks from the EC driver
to fix the regression.
Fixes: c3a696b6e8 (ACPI / EC: Use busy polling mode when GPE is not enabled)
Link: https://youtu.be/9NQ9x-Jm99Q
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196129
Reported-by: Andreas Lindhe <andreas@lindhe.io>
Tested-by: Gjorgji Jankovski <j.gjorgji@gmail.com>
Tested-by: Damjan Georgievski <gdamjan@gmail.com>
Tested-by: Fernando Chaves <nanochaves@gmail.com>
Tested-by: Tomislav Ivek <tomislav.ivek@gmail.com>
Tested-by: Denis P. <theoriginal.skullburner@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 878d8db039 upstream.
Revert commit 77e9a4aa9d (ACPI / button: Change default behavior to
lid_init_state=open) which changed the kernel's behavior on laptops
that boot with closed lids and expect the lid switch state to be
reported accurately by the kernel.
If you boot or resume your laptop with the lid closed on a docking
station while using an external monitor connected to it, both internal
and external displays will light on, while only the external should.
There is a design choice in gdm to only provide the greeter on the
internal display when lit on, so users only see a gray area on the
external monitor. Also, the cursor will not show up as it's by
default on the internal display too.
To "fix" that, users have to open the laptop once and close it once
again to sync the state of the switch with the hardware state.
Even if the "method" operation mode implementation can be buggy on
some platforms, the "open" choice is worse. It breaks docking
stations basically and there is no way to have a user-space hwdb to
fix that.
On the contrary, it's rather easy in user-space to have a hwdb
with the problematic platforms. Then, libinput (1.7.0+) can fix
the state of the lid switch for us: you need to set the udev
property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 'write_open'.
When libinput detects internal keyboard events, it will overwrite the
state of the switch to open, making it reliable again. Given that
logind only checks the lid switch value after a timeout, we can
assume the user will use the internal keyboard before this timeout
expires.
For example, such a hwdb entry is:
libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
Link: https://bugzilla.gnome.org/show_bug.cgi?id=782380
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe8c470ab8 upstream.
gcc -O2 cannot always prove that the loop in acpi_power_get_inferred_state()
is enterered at least once, so it assumes that cur_state might not get
initialized:
drivers/acpi/power.c: In function 'acpi_power_get_inferred_state':
drivers/acpi/power.c:222:9: error: 'cur_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This sets the variable to zero at the start of the loop, to ensure that
there is well-defined behavior even for an empty list. This gets rid of
the warning.
The warning first showed up when the -Os flag got removed in a bug fix
patch in linux-4.11-rc5.
I would suggest merging this addon patch on top of that bug fix to avoid
introducing a new warning in the stable kernels.
Fixes: 61b79e16c6 (ACPI: Fix incompatibility with mcount-based function graph tracing)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b03b99a329 upstream.
While reviewing the -stable patch for commit 86ef58a4e3 "nfit,
libnvdimm: fix interleave set cookie calculation" Ben noted:
"This is returning an int, thus it's effectively doing a 32-bit
comparison and not the 64-bit comparison you say is needed."
Update the compare operation to be immune to this integer demotion problem.
Cc: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Fixes: 86ef58a4e3 ("nfit, libnvdimm: fix interleave set cookie calculation")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cbc00c1310 ]
In commit 821d6f0359 (ACPI / sleep: Do not save NVS for new machines to
accelerate S3), to optimize S3 suspend/resume speed, code is introduced
to ignore NVS memory saving during S3 for all the platforms later than
2012.
But, Lenovo G50-45, a platform released in 2015, still needs NVS memory
saving during S3. A quirk is introduced for this platform.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=189431
Tested-by: Przemek <soprwa@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Drop unnecessary code ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 77e9a4aa9d ]
More and more platforms need the button.lid_init_state=open quirk. This
patch sets it the default behavior.
If a platform doesn't send lid open event or lid open event is lost due to
the underlying system problems, then we can compare various combinations:
1. systemd/acpid is used to suspend system or not, systemd has a special
logic forcing open event after resuming;
2. _LID returns a cached value or not.
The result is as follows:
1. lid_init_state=method
1. cached
1. resumed by lid:
(x) event=close
(x) systemd=suspends again
(x) acpid=suspends again
(x) state=close
2. resumed by other:
(o) event=close
(x) systemd=suspends again
(x) acpid=suspends again
(o) state=close
2. non-cached
1. resumed by lid:
(o) event=open
(o) systemd=resumes
(o) acpid=resumes
(o) state=open
2. resumed by other:
(o) event=close
(x) systemd=suspends again
(x) acpid=suspends again
(o) state=close
2. lid_init_state=open
1. cached
1. resumed by lid:
(o) event=open
(o) systemd=resumes
(o) acpid=resumes
(x) state=close
2. resumed by other:
(x) event=open
(o) systemd=resumes
(o) acpid=resumes
(o) state=close
2. non-cached
1. resumed by lid:
(o) event=open
(o) systemd=resumes
(o) acpid=resumes
(o) state=open
2. resumed by other:
(x) event=open
(o) systemd=resumes
(o) acpid=resumes
(o) state=close
3. lid_init_state=ignore
1. cached
1. resumed by lid:
(o) event=none
(x) systemd=suspends again
(o) acpid=resumes
(x) state=close
2. resumed by other:
(o) event=none
(x) systemd=suspends again
(o) acpid=resumes
(o) state=close
2. non-cached
1. resumed by lid:
(o) event=none
(x) systemd=suspends again
(o) acpid=resumes
(o) state=open
2. resumed by other:
(o) event=none
(x) systemd=suspends again
(o) acpid=resumes
(o) state=close
As a conclusion:
1. With systemd changed, lid_init_state=ignore has only one problem and the
problem comes from an underlying issue, not userspace and kernel lid
handling.
2. Without systemd changed, lid_init_state=open can be the default
behavior as the pass ratio is not much worse than lid_init_state=ignore.
3. lid_init_state=method is buggy, we can have a separate patch to make it
deprectated.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=187271
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c4aa1eecb ]
Sometimes, the users may require a quirk to be provided from ACPI subsystem
core to prevent a GPE from flooding.
Normally, if a GPE cannot be dispatched, ACPICA core automatically prevents
the GPE from firing. But there are cases the GPE is dispatched by _Lxx/_Exx
provided via AML table, and OSPM is lacking of the knowledge to get
_Lxx/_Exx correctly executed to handle the GPE, thus the GPE flooding may
still occur.
The existing quirk mechanism can be enabled/disabled using the following
commands to prevent such kind of GPE flooding during runtime:
# echo mask > /sys/firmware/acpi/interrupts/gpe00
# echo unmask > /sys/firmware/acpi/interrupts/gpe00
To avoid GPE flooding during boot, we need a boot stage mechanism.
This patch provides such a boot stage quirk mechanism to stop this kind of
GPE flooding. This patch doesn't fix any feature gap but since the new
feature gaps could be found in the future endlessly, and can disappear if
the feature gaps are filled, providing a boot parameter rather than a DMI
table should suffice.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=53071
Link: https://bugzilla.kernel.org/show_bug.cgi?id=117481
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/887793
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08f63d9774 upstream.
No platform-device is required for IO(x)APICs, so don't even
create them.
[ rjw: This fixes a problem with leaking platform device objects
after IOAPIC/IOxAPIC hot-removal events.]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61b79e16c6 upstream.
Paul Menzel reported a warning:
WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
Bad frame pointer: expected f6919d98, received f6919db0
from func acpi_pm_device_sleep_wake return to c43b6f9d
The warning means that function graph tracing is broken for the
acpi_pm_device_sleep_wake() function. That's because the ACPI Makefile
unconditionally sets the '-Os' gcc flag to optimize for size. That's an
issue because mcount-based function graph tracing is incompatible with
'-Os' on x86, thanks to the following gcc bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
I have another patch pending which will ensure that mcount-based
function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
x86.
But this patch is needed in addition to that one because the ACPI
Makefile overrides that config option for no apparent reason. It has
had this flag since the beginning of git history, and there's no related
comment, so I don't know why it's there. As far as I can tell, there's
no reason for it to be there. The appropriate behavior is for it to
honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
kernel.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 708f5dcc21 ]
The Dell Latitude 3350's ethernet card attempts to use a reserved
IRQ (18), resulting in ACPI being unable to enable the ethernet.
Adding it to acpi_rev_dmi_table[] helps to work around this problem.
Signed-off-by: Michael Pobega <mpobega@neverware.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9523b9bf6d ]
Precision 5520 and 3520 either hang at login and during suspend or reboot.
It turns out that that adding them to acpi_rev_dmi_table[] helps to work
around those issues.
Signed-off-by: Alex Hung <alex.hung@canonical.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86ef58a4e3 upstream.
The interleave-set cookie is a sum that sanity checks the composition of
an interleave set has not changed from when the namespace was initially
created. The checksum is calculated by sorting the DIMMs by their
location in the interleave-set. The comparison for the sort must be
64-bit wide, not byte-by-byte as performed by memcmp() in the broken
case.
Fix the implementation to accept correct cookie values in addition to
the Linux "memcmp" order cookies, but only allow correct cookies to be
generated going forward. It does mean that namespaces created by
third-party-tooling, or created by newer kernels with this fix, will not
validate on older kernels. However, there are a couple mitigating
conditions:
1/ platforms with namespace-label capable NVDIMMs are not widely
available.
2/ interleave-sets with a single-dimm are by definition not affected
(nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.
The cookie stored in the namespace label will be fixed by any write the
namespace label, the most straightforward way to achieve this is to
write to the "alt_name" attribute of a namespace in sysfs.
Fixes: eaf961536e ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e471486c13 upstream.
We queue an on-stack work item to 'nfit_wq' and wait for it to complete
as part of a 'flush_probe' request. However, if the user cancels the
wait we need to make sure the item is flushed from the queue otherwise
we are leaving an out-of-scope stack address on the work list.
BUG: unable to handle kernel paging request at ffffbcb3c72f7cd0
IP: [<ffffffffa9413a7b>] __list_add+0x1b/0xb0
[..]
RIP: 0010:[<ffffffffa9413a7b>] [<ffffffffa9413a7b>] __list_add+0x1b/0xb0
RSP: 0018:ffffbcb3c7ba7c00 EFLAGS: 00010046
[..]
Call Trace:
[<ffffffffa90bb11a>] insert_work+0x3a/0xc0
[<ffffffffa927fdda>] ? seq_open+0x5a/0xa0
[<ffffffffa90bb30a>] __queue_work+0x16a/0x460
[<ffffffffa90bbb08>] queue_work_on+0x38/0x40
[<ffffffffc0cf2685>] acpi_nfit_flush_probe+0x95/0xc0 [nfit]
[<ffffffffc0cf25d0>] ? nfit_visible+0x40/0x40 [nfit]
[<ffffffffa9571495>] wait_probe_show+0x25/0x60
[<ffffffffa9546b30>] dev_attr_show+0x20/0x50
Fixes: 7ae0fa439f ("nfit, libnvdimm: async region scrub workqueue")
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a545715d2d upstream.
When removing and adding cpu 0 on a system with GHES NMI the following stack
trace is seen when re-adding the cpu:
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
Call Trace:
dump_stack+0x63/0x8e
__warn+0xd1/0xf0
warn_slowpath_null+0x1d/0x20
setup_local_APIC+0x275/0x370
apic_ap_setup+0xe/0x20
start_secondary+0x48/0x180
set_init_arg+0x55/0x55
early_idt_handler_array+0x120/0x120
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x13d/0x14c
During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the
ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
(0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
that something has set the IRQ_WORK_VECTOR line prior to the APIC being
initialized.
Commit 2383844d48 ("GHES: Elliminate double-loop in the NMI handler")
incorrectly modified the behavior such that the handler returns
NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
work queue for every NMI.
This patch modifies the ghes_proc_irq_work() to run as it did prior to
2383844d48 ("GHES: Elliminate double-loop in the NMI handler") by
properly returning NMI_HANDLED and only calling the work queue if
NMI_HANDLED has been set.
Fixes: 2383844d48 (GHES: Elliminate double-loop in the NMI handler)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6276e53fa8 upstream.
The HP Pavilion dv6 has a non-working acpi_video0 backlight interface
and an intel_backlight interface which works fine. Add a force_native
quirk for it so that the non-working acpi_video0 interface does not get
registered.
Note that there are quite a few HP Pavilion dv6 variants, some
woth ATI and some with NVIDIA hybrid gfx, both seem to need this
quirk to have working backlight control. There are also some versions
with only Intel integrated gfx, these may not need this quirk, but it
should not hurt there.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1204476
Link: https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1416940
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 350fa038c3 upstream.
The Dell XPS 17 L702X has a non-working acpi_video0 backlight interface
and an intel_backlight interface which works fine. Add a force_native
quirk for it so that the non-working acpi_video0 interface does not get
registered.
Note that there also is an issue with the brightnesskeys on this laptop,
they do not generate key-press events in anyway. That is not solved by
this patch.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1123661
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent flurry of bug discoveries in the nfit driver's DSM marshalling
routine has highlighted the fact that we do not have unit test coverage
for this routine. Add a self-test of acpi_nfit_ctl() routine before
probing the "nfit_test.0" device. This mocks stimulus to acpi_nfit_ctl()
and if any of the tests fail "nfit_test.0" will be unavailable causing
the rest of the tests to not run / fail.
This unit test will also be a place to land reproductions of quirky BIOS
behavior discovered in the field and ensure the kernel does not regress
against implementations it has seen in practice.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Given dimms and bus commands share the same command number space we need
to be careful that we are translating status in the correct context.
Otherwise we can, for example, fail an ND_CMD_GET_CONFIG_SIZE command
because max_xfer is zero. It fails because that condition erroneously
correlates with the 'cleared == 0' failure of ND_CMD_CLEAR_ERROR.
Cc: <stable@vger.kernel.org>
Fixes: aef2533822 ("libnvdimm, nfit: centralize command status translation")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If an ARS Status command returns truncated output, do not process
partial records or otherwise consume non-status fields.
Cc: <stable@vger.kernel.org>
Fixes: 0caeef63e6 ("libnvdimm: Add a poison list and export badblocks")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
field of the ARS (Address Range Scrub) Status command, a firmware
implementation may in practice return 0, 4, or 8 to indicate that there
is no output payload to process.
The specification states "Size of Output Buffer in bytes, including this
field.". However, 'Output Buffer' is also the name of the entire
payload, and earlier in the specification it states "Max Query ARS
Status Output Buffer Size: Maximum size of buffer (including the Status
and Extended Status fields)".
Without this fix if the BIOS happens to return 0 it causes memory
corruption as evidenced by this result from the acpi_nfit_ctl() unit
test.
ars_status00000000: 00020000 00000000 ........
BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
task: ffff8803332d2ec0 task.stack: ffffc9000174c000
RIP: 0010:[<ffffffff814cfe72>] [<ffffffff814cfe72>] __memcpy+0x12/0x20
RSP: 0018:ffffc9000174f9a8 EFLAGS: 00010246
RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
FS: 00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
Stack:
ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
Call Trace:
[<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
[<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test]
Cc: <stable@vger.kernel.org>
Fixes: 747ffe11b4 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ACPI DSMs can have an 'extended' status which can be non-zero to convey
additional information about the command. In the xlat_status routine,
where we translate the command statuses, we were returning an error for
a non-zero extended status, even if the primary status indicated success.
Return from each command's 'case' once we have verified both its status
and extend status are good.
Cc: <stable@vger.kernel.org>
Fixes: 11294d63ac ("nfit: fail DSMs that return non-zero status by default")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We have a couple of drivers, acpi_apd.c and acpi_lpss.c,
that need to pass extra build-in properties to the devices
they create. Previously the drivers added those properties
to the struct device which is member of the struct
acpi_device, but that does not work. Those properties need
to be assigned to the struct device of the platform device
instead in order for them to become available to the
drivers.
To fix this, this patch changes acpi_create_platform_device
function to take struct property_entry pointer as parameter.
Fixes: 20a875e2e8 (serial: 8250_dw: Add quirk for APM X-Gene SoC)
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Jérôme de Bretagne <jerome.debretagne@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* acpica-fixes:
ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
ACPICA: Dispatcher: Fix order issue of method termination
* acpi-pci-fixes:
ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
ACPI/PCI: pci_link: penalize SCI correctly
ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages
* acpi-apei-fixes:
ACPI / APEI: Fix incorrect return value of ghes_proc()
In the code path of acpi_ev_initialize_region(), there is namespace
modification code unlocked. This patch tunes the code to make sure
such modification are always locked.
Fixes: 74f51b80a0 (ACPICA: Namespace: Fix dynamic table loading issues)
Tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is a lock unbalanced exit path in acpi_ds_initialize_method(),
this patch corrects it.
Fixes: 441ad11d07 (ACPICA: Dispatcher: Fix a mutex issue for method auto serialization)
Tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The last step of the method termination should be the end of the method
serialization. Otherwise, the steps happening after it will face the race
issues that cannot be protected by the method serialization mechanism.
This patch fixes this issue by moving the per-method-object deletion code
prior than the end of the method serialization. Otherwise, the possible
race issues may result in AE_ALREADY_EXISTS error in a parallel
environment.
Fixes: 74f51b80a0 (ACPICA: Namespace: Fix dynamic table loading issues)
Reported-and-tested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 103544d869 ("ACPI,PCI,IRQ: reduce resource requirements")
replaced the addition of PIRQ_PENALTY_PCI_USING in acpi_pci_link_allocate()
with an addition in acpi_irq_pci_sharing_penalty(), but f7eca374f0
("ACPI,PCI,IRQ: separate ISA penalty calculation") removed the use
of acpi_irq_pci_sharing_penalty() for ISA IRQs.
Therefore, PIRQ_PENALTY_PCI_USING is missing from ISA IRQs used by
interrupt links. Include that penalty by adding it in the
acpi_pci_link_allocate() path.
Fixes: f7eca374f0 (ACPI,PCI,IRQ: separate ISA penalty calculation)
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ondrej reported that IRQs stopped working in v4.7 on several
platforms. A typical scenario, from Ondrej's VT82C694X/694X, is:
ACPI: Using PIC for interrupt routing
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
ACPI: No IRQ available for PCI Interrupt Link [LNKA]
8139too 0000:00:0f.0: PCI INT A: no GSI
We're using PIC routing, so acpi_irq_balance == 0, and LNKA is already
active at IRQ 11. In that case, acpi_pci_link_allocate() only tries
to use the active IRQ (IRQ 11) which also happens to be the SCI.
We should penalize the SCI by PIRQ_PENALTY_PCI_USING, but
irq_get_trigger_type(11) returns something other than
IRQ_TYPE_LEVEL_LOW, so we penalize it by PIRQ_PENALTY_ISA_ALWAYS
instead, which makes acpi_pci_link_allocate() assume the IRQ isn't
available and give up.
Add acpi_penalize_sci_irq() so platforms can tell us the SCI IRQ,
trigger, and polarity directly and we don't have to depend on
irq_get_trigger_type().
Fixes: 103544d869 (ACPI,PCI,IRQ: reduce resource requirements)
Link: http://lkml.kernel.org/r/201609251512.05657.linux@rainbow-software.org
Reported-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We do not want to store the SCI penalty in the acpi_isa_irq_penalty[]
table because acpi_isa_irq_penalty[] only holds ISA IRQ penalties and
there's no guarantee that the SCI is an ISA IRQ. We add in the SCI
penalty as a special case in acpi_irq_get_penalty().
But if we called acpi_penalize_isa_irq() or acpi_irq_penalty_update()
for an SCI that happened to be an ISA IRQ, they stored the SCI
penalty (part of the acpi_irq_get_penalty() return value) in
acpi_isa_irq_penalty[]. Subsequent calls to acpi_irq_get_penalty()
returned a penalty that included *two* SCI penalties.
Fixes: 103544d869 (ACPI,PCI,IRQ: reduce resource requirements)
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull more ACPI updates from Rafael Wysocki:
"This includes a couple of fixes needed after recent changes, two ACPI
driver fixes (fan and "Processor Aggregator"), an update of the ACPI
device properties handling code and a new MAINTAINERS entry for ACPI
on ARM64.
Specifics:
- Fix an unused function warning that started to appear after recent
changes in the ACPI EC driver (Eric Biggers).
- Fix the KERN_CONT usage in acpi_os_vprintf() that has become
(particularly) annoying recently (Joe Perches).
- Fix the fan status checking in the ACPI fan driver to avoid
returning incorrect error codes sometimes (Srinivas Pandruvada).
- Fix the ACPI Processor Aggregator driver (PAD) to always let the
special processor_aggregator driver from Xen take over when running
as Xen dom0 (Juergen Gross).
- Update the handling of reference device properties in ACPI by
allowing empty rows ("holes") to appear in reference property lists
(Mika Westerberg).
- Add a new MAINTAINERS entry for ACPI on ARM64 (Lorenzo Pieralisi)"
* tag 'acpi-extra-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
acpi_os_vprintf: Use printk_get_level() to avoid unnecessary KERN_CONT
ACPI / PAD: don't register acpi_pad driver if running as Xen dom0
ACPI / property: Allow holes in reference properties
MAINTAINERS: Add ARM64-specific ACPI maintainers entry
ACPI / EC: Fix unused function warning when CONFIG_PM_SLEEP=n
ACPI / fan: Fix error reading cur_state