commit 8e43a905dd upstream.
Bootup with lockdep enabled has been broken on v7 since b46c0f7465
("ARM: 7321/1: cache-v7: Disable preemption when reading CCSIDR").
This is because v7_setup (which is called very early during boot) calls
v7_flush_dcache_all, and the save_and_disable_irqs added by that patch
ends up attempting to call into lockdep C code (trace_hardirqs_off())
when we are in no position to execute it (no stack, MMU off).
Fix this by using a notrace variant of save_and_disable_irqs. The code
already uses the notrace variant of restore_irqs.
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b46c0f7465 upstream.
armv7's flush_cache_all() flushes caches via set/way. To
determine the cache attributes (line size, number of sets,
etc.) the assembly first writes the CSSELR register to select a
cache level and then reads the CCSIDR register. The CSSELR register
is banked per-cpu and is used to determine which cache level CCSIDR
reads. If the task is migrated between when the CSSELR is written and
the CCSIDR is read the CCSIDR value may be for an unexpected cache
level (for example L1 instead of L2) and incorrect cache flushing
could occur.
Disable interrupts across the write and read so that the correct
cache attributes are read and used for the cache flushing
routine. We disable interrupts instead of disabling preemption
because the critical section is only 3 instructions and we want
to call v7_dcache_flush_all from __v7_setup which doesn't have a
full kernel stack with a struct thread_info.
This fixes a problem we see in scm_call() when flush_cache_all()
is called from preemptible context and sometimes the L2 cache is
not properly flushed out.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9f9a03150 upstream.
To ensure that we don't just reuse the bad delegation when we attempt to
recover the nfs4_state that received the bad stateid error.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d6144de8b upstream.
If the read or write buffer size associated with the command sent
through the mmc_blk_ioctl is zero, do not prepare data buffer.
This enables a ioctl(2) call to for instance send a MMC_SWITCH to set
a byte in the ext_csd.
Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Note that since the patch isn't applicable (and unnecessary) to
3.3-rc, there is no corresponding upstream fix.]
The cx5051 parser calls snd_hda_input_jack_add() in the init callback
to create and initialize the jack detection instances. Since the init
callback is called at each time when the device gets woken up after
suspend or power-saving mode, the duplicated instances are accumulated
at each call. This ends up with the kernel warnings with the too
large array size.
The fix is simply to move the calls of snd_hda_input_jack_add() into
the parser section instead of the init callback.
The fix is needed only up to 3.2 kernel, since the HD-audio jack layer
was redesigned in the 3.3 kernel.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 545d680938 upstream.
After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.
One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.
https://launchpad.net/bugs/926292
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sebastien Bacher <seb128@ubuntu.com>
Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b57e6b560f upstream.
read_lock(&tpt_trig->trig.leddev_list_lock) is accessed via the path
ieee80211_open (->) ieee80211_do_open (->) ieee80211_mod_tpt_led_trig
(->) ieee80211_start_tpt_led_trig (->) tpt_trig_timer before initializing
it.
the intilization of this read/write lock happens via the path
ieee80211_led_init (->) led_trigger_register, but we are doing
'ieee80211_led_init' after 'ieeee80211_if_add' where we
register netdev_ops.
so we access leddev_list_lock before initializing it and causes the
following bug in chrome laptops with AR928X cards with the following
script
while true
do
sudo modprobe -v ath9k
sleep 3
sudo modprobe -r ath9k
sleep 3
done
BUG: rwlock bad magic on CPU#1, wpa_supplicant/358, f5b9eccc
Pid: 358, comm: wpa_supplicant Not tainted 3.0.13 #1
Call Trace:
[<8137b9df>] rwlock_bug+0x3d/0x47
[<81179830>] do_raw_read_lock+0x19/0x29
[<8137f063>] _raw_read_lock+0xd/0xf
[<f9081957>] tpt_trig_timer+0xc3/0x145 [mac80211]
[<f9081f3a>] ieee80211_mod_tpt_led_trig+0x152/0x174 [mac80211]
[<f9076a3f>] ieee80211_do_open+0x11e/0x42e [mac80211]
[<f9075390>] ? ieee80211_check_concurrent_iface+0x26/0x13c [mac80211]
[<f9076d97>] ieee80211_open+0x48/0x4c [mac80211]
[<812dbed8>] __dev_open+0x82/0xab
[<812dc0c9>] __dev_change_flags+0x9c/0x113
[<812dc1ae>] dev_change_flags+0x18/0x44
[<8132144f>] devinet_ioctl+0x243/0x51a
[<81321ba9>] inet_ioctl+0x93/0xac
[<812cc951>] sock_ioctl+0x1c6/0x1ea
[<812cc78b>] ? might_fault+0x20/0x20
[<810b1ebb>] do_vfs_ioctl+0x46e/0x4a2
[<810a6ebb>] ? fget_light+0x2f/0x70
[<812ce549>] ? sys_recvmsg+0x3e/0x48
[<810b1f35>] sys_ioctl+0x46/0x69
[<8137fa77>] sysenter_do_call+0x12/0x2
Cc: Gary Morain <gmorain@google.com>
Cc: Paul Stewart <pstew@google.com>
Cc: Abhijit Pradhan <abhijit@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 71f6bd4a23 upstream.
Fixes PCI device detection on IBM xSeries IBM 3850 M2 / x3950 M2
when using ACPI resources (_CRS).
This is default, a manual workaround (without this patch)
would be pci=nocrs boot param.
V2: Add dev_warn if the workaround is hit. This should reveal
how common such setups are (via google) and point to possible
problems if things are still not working as expected.
-> Suggested by Jan Beulich.
Tested-by: garyhade@us.ibm.com
Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a45a9407c upstream.
perf on POWER stopped working after commit e050e3f0a7 (perf: Fix
broken interrupt rate throttling). That patch exposed a bug in
the POWER perf_events code.
Since the PMCs count upwards and take an exception when the top bit
is set, we want to write 0x80000000 - left in power_pmu_start. We were
instead programming in left which effectively disables the counter
until we eventually hit 0x80000000. This could take seconds or longer.
With the patch applied I get the expected number of samples:
SAMPLE events: 9948
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 363434b5dc upstream.
An error while creating sysfs attribute files in the driver's probe function
results in an error abort, but already created files are not removed. This patch
fixes the problem.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Dirk Eibach <eibach@gdsys.de>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f2da1ac0b upstream.
Initialize PPR register for both channels, and set correct PPR register bits.
Also remove unnecessary variable initializations.
Signed-off-by: Chris D Schimp <silverchris@gmail.com>
[guenter.roeck@ericsson.com: Merged two patches into one]
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63d97a36e upstream.
RPM calculation from tachometer value does not depend on PPR.
Also, do not report negative RPM values.
Signed-off-by: Chris D Schimp <silverchris@gmail.com>
[guenter.roeck@ericsson.com: do not report negative RPM values]
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2ea0f5f04 upstream.
Use standard ror64() instead of hand-written.
There is no standard ror64, so create it.
The difference is shift value being "unsigned int" instead of uint64_t
(for which there is no reason). gcc starts to emit native ROR instructions
which it doesn't do for some reason currently. This should make the code
faster.
Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73736e0387 upstream.
Zhihua Che reported a possible memleak in slub allocator on
CONFIG_PREEMPT=y builds.
It is possible current thread migrates right before disabling irqs in
__slab_alloc(). We must check again c->freelist, and perform a normal
allocation instead of scratching c->freelist.
Many thanks to Zhihua Che for spotting this bug, introduced in 2.6.39
V2: Its also possible an IRQ freed one (or several) object(s) and
populated c->freelist, so its not a CONFIG_PREEMPT only problem.
Reported-by: Zhihua Che <zhihua.che@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a92d687c8 upstream.
Unfortunately in reducing W from 80 to 16 we ended up unrolling
the loop twice. As gcc has issues dealing with 64-bit ops on
i386 this means that we end up using even more stack space (>1K).
This patch solves the W reduction by moving LOAD_OP/BLEND_OP
into the loop itself, thus avoiding the need to duplicate it.
While the stack space still isn't great (>0.5K) it is at least
in the same ball park as the amount of stack used for our C sha1
implementation.
Note that this patch basically reverts to the original code so
the diff looks bigger than it really is.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58d7d18b52 upstream.
The previous patch used the modulus operator over a power of 2
unnecessarily which may produce suboptimal binary code. This
patch changes changes them to binary ands instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09e87e5c4f upstream.
In order to enable temperature mode aka automatic mode for the F75373 and
F75375 chips, the two FANx_MODE bits in the fan configuration register
need be set to 01, not 10.
Signed-off-by: Nikolaus Schulz <mail@microschulz.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 977b7e3a52 upstream.
When a SD card is hot removed without umount, del_gendisk() will call
bdi_unregister() without destroying/freeing it. This leaves the bdi in
the bdi->dev = NULL, bdi->wb.task = NULL, bdi->bdi_list removed state.
When sync(2) gets the bdi before bdi_unregister() and calls
bdi_queue_work() after the unregister, trace_writeback_queue will be
dereferencing the NULL bdi->dev. Fix it with a simple test for NULL.
LKML-reference: http://lkml.org/lkml/2012/1/18/346
Reported-by: Rabin Vincent <rabin@rab.in>
Tested-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07ae2dfcf4 upstream.
The current code checks for stored_mpdu_num > 1, causing
the reorder_timer to be triggered indefinitely, but the
frame is never timed-out (until the next packet is received)
Signed-off-by: Eliad Peller <eliad@wizery.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3310225dfc upstream.
PROP_MAX_SHIFT should be set to <=32 on 64-bit box. This fixes two bugs
in the below lines of bdi_dirty_limit():
bdi_dirty *= numerator;
do_div(bdi_dirty, denominator);
1) divide error: do_div() only uses the lower 32 bit of the denominator,
which may trimmed to be 0 when PROP_MAX_SHIFT > 32.
2) overflow: (bdi_dirty * numerator) could easily overflow if numerator
used up to 48 bits, leaving only 16 bits to bdi_dirty
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Ilya Tumaykin <librarian_rus@yahoo.com>
Tested-by: Ilya Tumaykin <librarian_rus@yahoo.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4a03fc7ef upstream.
This patch fixes an issue where perf report shows nan% for certain
perf.data files. The below is from a report for a do_fork probe:
-nan% sshd [kernel.kallsyms] [k] do_fork
-nan% packagekitd [kernel.kallsyms] [k] do_fork
-nan% dbus-daemon [kernel.kallsyms] [k] do_fork
-nan% bash [kernel.kallsyms] [k] do_fork
A git bisect shows commit f3bda2c as the cause. However, looking back
through the git history, I saw commit 640c03c which seems to have
removed the required initialization for perf_sample->period. The problem
only started showing after commit f3bda2c. The below patch re-introduces
the initialization and it fixes the problem for me.
With the below patch, for the same perf.data:
73.08% bash [kernel.kallsyms] [k] do_fork
8.97% 11-dhclient [kernel.kallsyms] [k] do_fork
6.41% sshd [kernel.kallsyms] [k] do_fork
3.85% 20-chrony [kernel.kallsyms] [k] do_fork
2.56% sendmail [kernel.kallsyms] [k] do_fork
This patch applies over current linux-tip commit 9949284.
Problem introduced in:
$ git describe 640c03c
v2.6.37-rc3-83-g640c03c
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120203170113.5190.25558.stgit@localhost6.localdomain6
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d3aaeb38c4, along
with dependent backports of commits:
69cce1d1409de79c127c218fa90f07580da35a31f7e57044eee049f28883 ]
Gergely Kalman reported crashes in check_peer_redir().
It appears commit f39925dbde (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.
Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.
Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.
As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)
Many thanks to Gergely for providing a pretty report pointing to the
bug.
Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8eb28480e upstream.
The driver uses the pstate number from the status register as index in
its table of ACPI pstates (powernow_table). This is wrong as this is
not a 1-to-1 mapping.
For example we can have _PSS information to just utilize Pstate 0 and
Pstate 4, ie.
powernow-k8: Core Performance Boosting: on.
powernow-k8: 0 : pstate 0 (2200 MHz)
powernow-k8: 1 : pstate 4 (1400 MHz)
In this example the driver's powernow_table has just 2 entries. Using
the pstate number (4) as index into this table is just plain wrong.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 201bf0f129 upstream.
Due to CPB we can't directly map SW Pstates to Pstate MSRs. Get rid of
the paranoia check. (assuming that the ACPI Pstate information is
correct.)
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1608ea5f4b upstream.
As ZTE have and will use more pid for new products this year,
so we need to add some new zte 3g-dongle's pid on option.c ,
and delete one pid 0x0154 because it use for mass-storage port.
Signed-off-by: Rui li <li.rui27@zte.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4436a7c17 upstream.
The Netlogic XLP SoC's on-chip USB controller appears as a PCI
USB device, but does not need the EHCI/OHCI handoff done in
usb/host/pci-quirks.c.
The pci-quirks.c is enabled for all vendors and devices, and is
enabled if USB and PCI are configured.
If we do not skip the qurik handling on XLP, the readb() call in
ehci_bios_handoff() will cause a crash since byte access is not
supported for EHCI registers in XLP.
Signed-off-by: Jayachandran C <jayachandranc@netlogicmicro.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 683da59d7b upstream.
ab943a2e12 (USB: gadget: gadget zero uses new suspend/resume hooks)
introduced a copy-paste error where f_loopback.c writes to a variable
declared in f_sourcesink.c. This prevents one from creating gadgets
that only have a loopback function.
Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 635032cb39 upstream.
Programming an image was broken, because odev->buf_offs was not advanced
for val == 0 in append_values(). This regression was introduced in:
commit 1ff12a4aa3
Author: Kevin A. Granade <kevin.granade@gmail.com>
Date: Sat Sep 5 01:03:39 2009 -0500
Staging: asus_oled: Cleaned up checkpatch issues.
Fix the image processing by special-casing val == 0.
I have tested this change on an Asus G50V laptop only.
Cc: Jakub Schmidtke <sjakub@gmail.com>
Cc: Kevin A. Granade <kevin.granade@gmail.com>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fbc890987 upstream.
According to SPC-4, the sense key for commands that are failed with
INVALID FIELD IN PARAMETER LIST and INVALID FIELD IN CDB should be
ILLEGAL REQUEST (5h) rather than ABORTED COMMAND (Bh). Without this
patch, a tcm_loop LUN incorrectly gives:
# sg_raw -r 1 -v /dev/sda 3 1 0 0 ff 0
Sense Information:
Fixed format, current; Sense key: Aborted Command
Additional sense: Invalid field in cdb
Raw sense data (in hex):
70 00 0b 00 00 00 00 0a 00 00 00 00 24 00 00 00
00 00
While a real SCSI disk gives:
Sense Information:
Fixed format, current; Sense key: Illegal Request
Additional sense: Invalid field in cdb
Raw sense data (in hex):
70 00 05 00 00 00 00 18 00 00 00 00 24 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
with the main point being that the real disk gives a sense key of
ILLEGAL REQUEST (5h).
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6816966a84 upstream.
Initiators that aren't the active reservation holder should be able to
do a PERSISTENT RESERVE IN command in all cases, so add it to the list
of allowed CDBs in core_scsi3_pr_seq_non_holder().
Signed-off-by: Marco Sanvido <marco@purestorage.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e08e34e37 upstream.
The comments quote the right parts of the spec:
* d) Establish a unit attention condition for the
* initiator port associated with every I_T nexus
* that lost its registration other than the I_T
* nexus on which the PERSISTENT RESERVE OUT command
* was received, with the additional sense code set
* to REGISTRATIONS PREEMPTED.
and
* e) Establish a unit attention condition for the initiator
* port associated with every I_T nexus that lost its
* persistent reservation and/or registration, with the
* additional sense code set to REGISTRATIONS PREEMPTED;
but the actual code accidentally uses ASCQ_2AH_RESERVATIONS_PREEMPTED
instead of ASCQ_2AH_REGISTRATIONS_PREEMPTED. Fix this.
Signed-off-by: Marco Sanvido <marco@purestorage.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9980cdcf2 upstream.
Fix CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_SMP=n CONFIG_DEBUG_VM=y
CONFIG_DEBUG_SPINLOCK=n kernel: spin_is_locked() is then always false,
and so triggers some BUGs in Transparent HugePage codepaths.
asm-generic/bug.h mentions this problem, and provides a WARN_ON_SMP(x);
but being too lazy to add VM_BUG_ON_SMP, BUG_ON_SMP, WARN_ON_SMP_ONCE,
VM_WARN_ON_SMP_ONCE, just test NR_CPUS != 1 in the existing VM_BUG_ONs.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc9086004b upstream.
When isolating pages for migration, migration starts at the start of a
zone while the free scanner starts at the end of the zone. Migration
avoids entering a new zone by never going beyond the free scanned.
Unfortunately, in very rare cases nodes can overlap. When this happens,
migration isolates pages without the LRU lock held, corrupting lists
which will trigger errors in reclaim or during page free such as in the
following oops
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810f795c>] free_pcppages_bulk+0xcc/0x450
PGD 1dda554067 PUD 1e1cb58067 PMD 0
Oops: 0000 [#1] SMP
CPU 37
Pid: 17088, comm: memcg_process_s Tainted: G X
RIP: free_pcppages_bulk+0xcc/0x450
Process memcg_process_s (pid: 17088, threadinfo ffff881c2926e000, task ffff881c2926c0c0)
Call Trace:
free_hot_cold_page+0x17e/0x1f0
__pagevec_free+0x90/0xb0
release_pages+0x22a/0x260
pagevec_lru_move_fn+0xf3/0x110
putback_lru_page+0x66/0xe0
unmap_and_move+0x156/0x180
migrate_pages+0x9e/0x1b0
compact_zone+0x1f3/0x2f0
compact_zone_order+0xa2/0xe0
try_to_compact_pages+0xdf/0x110
__alloc_pages_direct_compact+0xee/0x1c0
__alloc_pages_slowpath+0x370/0x830
__alloc_pages_nodemask+0x1b1/0x1c0
alloc_pages_vma+0x9b/0x160
do_huge_pmd_anonymous_page+0x160/0x270
do_page_fault+0x207/0x4c0
page_fault+0x25/0x30
The "X" in the taint flag means that external modules were loaded but but
is unrelated to the bug triggering. The real problem was because the PFN
layout looks like this
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x01e80000
Movable zone start PFN for each node
early_node_map[14] active PFN ranges
0: 0x00000010 -> 0x0000009b
0: 0x00000100 -> 0x0007a1ec
0: 0x0007a354 -> 0x0007a379
0: 0x0007f7ff -> 0x0007f800
0: 0x00100000 -> 0x00680000
1: 0x00680000 -> 0x00e80000
0: 0x00e80000 -> 0x01080000
1: 0x01080000 -> 0x01280000
0: 0x01280000 -> 0x01480000
1: 0x01480000 -> 0x01680000
0: 0x01680000 -> 0x01880000
1: 0x01880000 -> 0x01a80000
0: 0x01a80000 -> 0x01c80000
1: 0x01c80000 -> 0x01e80000
The fix is straight-forward. isolate_migratepages() has to make a
similar check to isolate_freepage to ensure that it never isolates pages
from a zone it does not hold the LRU lock for.
This was discovered in a 3.0-based kernel but it affects 3.1.x, 3.2.x
and current mainline.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>