Commit Graph

442837 Commits

Author SHA1 Message Date
Tom Gundersen
c03bb540d7 net: tunnels - enable module autoloading
[ Upstream commit f98f89a010 ]

Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.

This is in line with how most other netdev kinds work, and will allow userspace
to create tunnels without having CAP_SYS_MODULE.

Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:17:33 -04:00
Sven Wegener
132345b29a ipv6: Fix regression caused by efe4208 in udp_v6_mcast_next()
[ Upstream commit 3bfdc59a6c ]

Commit efe4208 ("ipv6: make lookups simpler and faster") introduced a
regression in udp_v6_mcast_next(), resulting in multicast packets not
reaching the destination sockets under certain conditions.

The packet's IPv6 addresses are wrongly compared to the IPv6 addresses
from the function's socket argument, which indicates the starting point
for looping, instead of the loop variable. If the addresses from the
first socket do not match the packet's addresses, no socket in the list
will match.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:17:33 -04:00
Mimi Zohar
05d659186c evm: prohibit userspace writing 'security.evm' HMAC value
commit 2fb1c9a4f2 upstream.

Calculating the 'security.evm' HMAC value requires access to the
EVM encrypted key.  Only the kernel should have access to it.  This
patch prevents userspace tools(eg. setfattr, cp --preserve=xattr)
from setting/modifying the 'security.evm' HMAC value directly.

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:17:33 -04:00
Dmitry Kasatkin
51678ceba9 ima: introduce ima_kernel_read()
commit 0430e49b6e upstream.

Commit 8aac62706 "move exit_task_namespaces() outside of exit_notify"
introduced the kernel opps since the kernel v3.10, which happens when
Apparmor and IMA-appraisal are enabled at the same time.

----------------------------------------------------------------------
[  106.750167] BUG: unable to handle kernel NULL pointer dereference at
0000000000000018
[  106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30
[  106.750241] PGD 0
[  106.750254] Oops: 0000 [#1] SMP
[  106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
pps_core
[  106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15
[  106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
09/19/2012
[  106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
ffff880400fca000
[  106.750704] RIP: 0010:[<ffffffff811ec7da>]  [<ffffffff811ec7da>]
our_mnt+0x1a/0x30
[  106.750725] RSP: 0018:ffff880400fcba60  EFLAGS: 00010286
[  106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
ffff8800d51523e7
[  106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
ffff880402d20020
[  106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
0000000000000001
[  106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff8800d5152300
[  106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
ffff8800d51523e7
[  106.750871] FS:  0000000000000000(0000) GS:ffff88040d200000(0000)
knlGS:0000000000000000
[  106.750910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
00000000001407e0
[  106.750962] Stack:
[  106.750981]  ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
0000000000000000
[  106.751037]  ffff8800de804920 ffffffff8101b9b9 0001800000000000
0000000000000100
[  106.751093]  0000010000000000 0000000000000002 000000000000000e
ffff8803eb8df500
[  106.751149] Call Trace:
[  106.751172]  [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430
[  106.751199]  [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10
[  106.751225]  [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170
[  106.751250]  [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80
[  106.751276]  [<ffffffff8134aa73>] aa_file_perm+0x33/0x40
[  106.751301]  [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0
[  106.751327]  [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20
[  106.751355]  [<ffffffff8130c853>] security_file_permission+0x23/0xa0
[  106.751382]  [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0
[  106.751407]  [<ffffffff811c789d>] vfs_read+0x6d/0x170
[  106.751432]  [<ffffffff811cda31>] kernel_read+0x41/0x60
[  106.751457]  [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280
[  106.751483]  [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280
[  106.751509]  [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160
[  106.751536]  [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10
[  106.751562]  [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0
[  106.751587]  [<ffffffff81352824>] ima_update_xattr+0x34/0x60
[  106.751612]  [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0
[  106.751637]  [<ffffffff811c9635>] __fput+0xd5/0x300
[  106.751662]  [<ffffffff811c98ae>] ____fput+0xe/0x10
[  106.751687]  [<ffffffff81086774>] task_work_run+0xc4/0xe0
[  106.751712]  [<ffffffff81066fad>] do_exit+0x2bd/0xa90
[  106.751738]  [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b
[  106.751763]  [<ffffffff8106780c>] do_group_exit+0x4c/0xc0
[  106.751788]  [<ffffffff81067894>] SyS_exit_group+0x14/0x20
[  106.751814]  [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f
[  106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
[  106.752185] RIP  [<ffffffff811ec7da>] our_mnt+0x1a/0x30
[  106.752214]  RSP <ffff880400fcba60>
[  106.752236] CR2: 0000000000000018
[  106.752258] ---[ end trace 3c520748b4732721 ]---
----------------------------------------------------------------------

The reason for the oops is that IMA-appraisal uses "kernel_read()" when
file is closed. kernel_read() honors LSM security hook which calls
Apparmor handler, which uses current->nsproxy->mnt_ns. The 'guilty'
commit changed the order of cleanup code so that nsproxy->mnt_ns was
not already available for Apparmor.

Discussion about the issue with Al Viro and Eric W. Biederman suggested
that kernel_read() is too high-level for IMA. Another issue, except
security checking, that was identified is mandatory locking. kernel_read
honors it as well and it might prevent IMA from calculating necessary hash.
It was suggested to use simplified version of the function without security
and locking checks.

This patch introduces special version ima_kernel_read(), which skips security
and mandatory locking checking. It prevents the kernel oops to happen.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:17:33 -04:00
Mimi Zohar
47873220e7 ima: audit log files opened with O_DIRECT flag
commit f9b2a735bd upstream.

Files are measured or appraised based on the IMA policy.  When a
file, in policy, is opened with the O_DIRECT flag, a deadlock
occurs.

The first attempt at resolving this lockdep temporarily removed the
O_DIRECT flag and restored it, after calculating the hash.  The
second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this
flag, do_blockdev_direct_IO() would skip taking the i_mutex a second
time.  The third attempt, by Dmitry Kasatkin, resolves the i_mutex
locking issue, by re-introducing the IMA mutex, but uncovered
another problem.  Reading a file with O_DIRECT flag set, writes
directly to userspace pages.  A second patch allocates a user-space
like memory.  This works for all IMA hooks, except ima_file_free(),
which is called on __fput() to recalculate the file hash.

Until this last issue is addressed, do not 'collect' the
measurement for measuring, appraising, or auditing files opened
with the O_DIRECT flag set.  Based on policy, permit or deny file
access.  This patch defines a new IMA policy rule option named
'permit_directio'.  Policy rules could be defined, based on LSM
or other criteria, to permit specific applications to open files
with the O_DIRECT flag set.

Changelog v1:
- permit or deny file access based IMA policy rules

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:17:33 -04:00
Nicholas Bellinger
50db1fe232 iscsi-target: Reject mutual authentication with reflected CHAP_C
commit 1d2b60a554 upstream.

This patch adds an explicit check in chap_server_compute_md5() to ensure
the CHAP_C value received from the initiator during mutual authentication
does not match the original CHAP_C provided by the target.

This is in line with RFC-3720, section 8.2.1:

   Originators MUST NOT reuse the CHAP challenge sent by the Responder
   for the other direction of a bidirectional authentication.
   Responders MUST check for this condition and close the iSCSI TCP
   connection if it occurs.

Reported-by: Tejas Vaykole <tejas.vaykole@calsoftinc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:17:32 -04:00
Nicholas Bellinger
af653349f7 target: Fix NULL pointer dereference for XCOPY in target_put_sess_cmd
commit 0ed6e189e3 upstream.

This patch fixes a NULL pointer dereference regression bug that was
introduced with:

commit 1e1110c43b
Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Sat May 17 06:49:22 2014 -0400

    target: fix memory leak on XCOPY

Now that target_put_sess_cmd() -> kref_put_spinlock_irqsave() is
called with a valid se_cmd->cmd_kref, a NULL pointer dereference
is triggered because the XCOPY passthrough commands don't have
an associated se_session pointer.

To address this bug, go ahead and checking for a NULL se_sess pointer
within target_put_sess_cmd(), and call se_cmd->se_tfo->release_cmd()
to release the XCOPY's xcopy_pt_cmd memory.

Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:17:32 -04:00
Boris BREZILLON
563d554669 rtc: rtc-at91rm9200: fix infinite wait for ACKUPD irq
commit 2fe121e1f5 upstream.

The rtc user must wait at least 1 sec between each time/calandar update
(see atmel's datasheet chapter "Updating Time/Calendar").

Use the 1Hz interrupt to update the at91_rtc_upd_rdy flag and wait for
the at91_rtc_wait_upd_rdy event if the rtc is not ready.

This patch fixes a deadlock in an uninterruptible wait when the RTC is
updated more than once every second.  AFAICT the bug is here from the
beginning, but I think we should at least backport this fix to 3.10 and
the following longterm and stable releases.

Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com>
Reported-by: Bryan Evenson <bevenson@melinkcorp.com>
Tested-by: Bryan Evenson <bevenson@melinkcorp.com>
Cc: Andrew Victor <linux@maxim.org.za>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26 15:17:32 -04:00
Greg Kroah-Hartman
41b67d8f30 Linux 3.15.1 v3.15.1 2014-06-16 13:44:27 -07:00
Andreas Schrägle
a6543e2968 ahci: add PCI ID for Marvell 88SE91A0 SATA Controller
commit 754a292fe6 upstream.

Add support for Marvell Technology Group Ltd. 88SE91A0 SATA 6Gb/s
Controller by adding its PCI ID.

Signed-off-by: Andreas Schrägle <ajs124.ajs124@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:10 -07:00
Jérôme Carretero
72591c3023 ahci: Add Device ID for HighPoint RocketRaid 642L
commit d251836508 upstream.

This device normally comes with a proprietary driver, using a web GUI
to configure RAID:
 http://www.highpoint-tech.com/USA_new/series_rr600-download.htm
But thankfully it also works out of the box with the AHCI driver,
being just a Marvell 88SE9235.

Devices 640L, 644L, 644LS should also be supported but not tested here.

Signed-off-by: Jérôme Carretero <cJ-ko@zougloub.eu>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Alessandro Miceli
6dcfd1f5d4 rtl28xxu: add [1b80:d3af] Sveon STV27
commit 74a86272f0 upstream.

Added support for Sveon STV27 device (rtl2832u + FC0013 tuner)

Signed-off-by: Alessandro Miceli <angelofsky1980@gmail.com>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Alessandro Miceli
8bc4154c9a rtl28xxu: add [1b80:d39d] Sveon STV20
commit f27f5b0ee4 upstream.

Added Sveon STV20 device based on Realtek RTL2832U and FC0012 tuner

Signed-off-by: Alessandro Miceli <angelofsky1980@gmail.com>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Brian Healy
b1947d698f rtl28xxu: add 1b80:d395 Peak DVB-T USB
commit 9ca24ae408 upstream.

Add USB ID for Peak DVB-T USB.

[crope@iki.fi: fix Brian email address and indentation]
Signed-off-by: Brian Healy <healybrian@gmail.com>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-16 13:44:09 -07:00
Tomas Winkler
0cb62c9a24 mei: me: read H_CSR after asserting reset
commit c40765d919 upstream.

According the spec the host should read H_CSR again
after asserting reset H_RST to ensure that reset was
read by the firmware

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Tomas Winkler
4f104ed166 mei: me: drop harmful wait optimization
commit 07cd7be3d9 upstream.

It my take time till ME_RDY will be cleared after the reset,
so we cannot check the bit before we got the interrupt

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Tomas Winkler
b1b94ac553 mei: me: fix hw ready reset flow
commit b04ada92ff upstream.

We cleared H_RST for H_CSR on spurious interrupt generated when ME_RDY
while cleared and not while  ME_RDY is set. The spurious interrupt
is not delivered on all platforms in this case the
driver may fail to initialize.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Alexei Starovoitov
f15e38fc0f PCI/MSI: Fix memory leak in free_msi_irqs()
commit b701c0b1fe upstream.

free_msi_irqs() is leaking memory, since list_for_each_entry(entry,
&dev->msi_list, list) {...} is never executed, because dev->msi_list is
made empty by the loop just above this one.

Fix it by relying on zero termination of attribute array like
populate_msi_sysfs() does.

Fixes: 1c51b50c29 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Andy Lutomirski
e015cef702 auditsc: audit_krule mask accesses need bounds checking
commit a3c5493119 upstream.

Fixes an easy DoS and possible information disclosure.

This does nothing about the broken state of x32 auditing.

eparis: If the admin has enabled auditd and has specifically loaded
audit rules.  This bug has been around since before git.  Wow...

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Al Viro
4e358517b2 lock_parent: don't step on stale ->d_parent of all-but-freed one
commit c2338f2dc7 upstream.

Dentry that had been through (or into) __dentry_kill() might be seen
by shrink_dentry_list(); that's normal, it'll be taken off the shrink
list and freed if __dentry_kill() has already finished.  The problem
is, its ->d_parent might be pointing to already freed dentry, so
lock_parent() needs to be careful.

We need to check that dentry hasn't already gone into __dentry_kill()
*and* grab rcu_read_lock() before dropping ->d_lock - the latter makes
sure that whatever we see in ->d_parent after dropping ->d_lock it
won't be freed until we drop rcu_read_lock().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Andy Lutomirski
d3c8656bc2 fs,userns: Change inode_capable to capable_wrt_inode_uidgid
commit 23adbe12ef upstream.

The kernel has no concept of capabilities with respect to inodes; inodes
exist independently of namespaces.  For example, inode_capable(inode,
CAP_LINUX_IMMUTABLE) would be nonsense.

This patch changes inode_capable to check for uid and gid mappings and
renames it to capable_wrt_inode_uidgid, which should make it more
obvious what it does.

Fixes CVE-2014-4014.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16 13:44:09 -07:00
Linus Torvalds
1860e37987 Linux 3.15 v3.15 2014-06-08 11:19:54 -07:00
Linus Torvalds
bb077d6006 Revert "x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
This reverts commit 3e1a878b7c.

It came in very late, and already has one reported failure: Sitsofe
reports that the current tree fails to boot on his EeePC, and bisected
it down to this.  Rather than waste time trying to figure out what's
wrong, just revert it.

Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-08 10:09:49 -07:00
Linus Torvalds
c593e89787 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "I had this in my 3.16 merge window queue, but it is small and obvious
  enough for 3.15.  I cherry-picked and retested against current rc8"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: send, fix corrupted path strings for long paths
2014-06-07 15:12:18 -07:00
Linus Torvalds
052e5c7e28 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Here are the remaining fixes for v3.15.

  This series includes:

   - iser-target fix for ImmediateData exception reference count bug
     (Sagi + nab)
   - iscsi-target fix for MC/S login + potential iser-target MRDSL
     buffer overrun (Santosh + Roland)
   - iser-target fix for v3.15-rc multi network portal shutdown
     regression (nab)
   - target fix for allowing READ_CAPCITY during ALUA Standby access
     state (Chris + nab)
   - target fix for NULL pointer dereference of alua_access_state for
     un-configured devices (Chris + nab)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix alua_access_state attribute OOPs for un-configured devices
  target: Allow READ_CAPACITY opcode in ALUA Standby access state
  iser-target: Fix multi network portal shutdown regression
  iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value()
  iser-target: Add missing target_put_sess_cmd for ImmedateData failure
2014-06-07 15:01:39 -07:00
Linus Torvalds
813895f8dc Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "A significantly larger than I'd like set of patches for just below the
  wire.  All of these, however, fix real problems.

  The one thing that is genuinely scary in here is the change of SMP
  initialization, but that *does* fix a confirmed hang when booting
  virtual machines.

  There is also a patch to actually do the right thing about not
  offlining a CPU when there are not enough interrupt vectors available
  in the system; the accounting was done incorrectly.  The worst case
  for that patch is that we fail to offline CPUs when we should (the new
  code is strictly more conservative than the old), so is not
  particularly risky.

  Most of the rest is minor stuff; the EFI patches are all about
  exporting correct information to boot loaders and kexec"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: EFI_MIXED should not prohibit loading above 4G
  x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
  x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
  x86: Fix list/memory corruption on CPU hotplug
  x86: irq: Get correct available vectors for cpu disable
  x86/efi: Do not export efi runtime map in case old map
  x86/efi: earlyprintk=efi,keep fix
2014-06-07 14:50:38 -07:00
Matt Fleming
745c51673e x86/boot: EFI_MIXED should not prohibit loading above 4G
commit 7d453eee36 ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a
regression for the functionality to load kernels above 4G. The relevant
(incorrect) reasoning behind this change can be seen in the commit
message,

  "The xloadflags field in the bzImage header is also updated to reflect
  that the kernel supports both entry points by setting both of
  XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
  XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
  guaranteed to be addressable with 32-bits."

This is obviously bogus since 32-bit EFI loaders will never place the
kernel above the 4G mark. So this restriction is entirely unnecessary.

But things are worse than that - since we want to encourage people to
always compile with CONFIG_EFI_MIXED=y so that their kernels work out of
the box for both 32-bit and 64-bit firmware, commit 7d453eee36
effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely.

Remove the overzealous and superfluous restriction and restore the
XLF_CAN_BE_LOADED_ABOVE_4G functionality.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-06-07 09:31:00 -07:00
Naoya Horiguchi
d4c54919ed mm: add !pte_present() check on existing hugetlb_entry callbacks
The age table walker doesn't check non-present hugetlb entry in common
path, so hugetlb_entry() callbacks must check it.  The reason for this
behavior is that some callers want to handle it in its own way.

[ I think that reason is bogus, btw - it should just do what the regular
  code does, which is to call the "pte_hole()" function for such hugetlb
  entries  - Linus]

However, some callers don't check it now, which causes unpredictable
result, for example when we have a race between migrating hugepage and
reading /proc/pid/numa_maps.  This patch fixes it by adding !pte_present
checks on buggy callbacks.

This bug exists for years and got visible by introducing hugepage
migration.

ChangeLog v2:
- fix if condition (check !pte_present() instead of pte_present())

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Backported to 3.15.  Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 13:21:16 -07:00
Filipe Manana
01a9a8a9e2 Btrfs: send, fix corrupted path strings for long paths
If a path has more than 230 characters, we allocate a new buffer to
use for the path, but we were forgotting to copy the contents of the
previous buffer into the new one, which has random content from the
kmalloc call.

Test:

    mkfs.btrfs -f /dev/sdd
    mount /dev/sdd /mnt

    TEST_PATH="/mnt/fdmanana/.config/google-chrome-mysetup/Default/Pepper_Data/Shockwave_Flash/WritableRoot/#SharedObjects/JSHJ4ZKN/s.wsj.net/[[IMPORT]]/players.edgesuite.net/flash/plugins/osmf/advanced-streaming-plugin/v2.7/osmf1.6/Ak#"
    mkdir -p $TEST_PATH
    echo "hello world" > $TEST_PATH/amaiAdvancedStreamingPlugin.txt

    btrfs subvolume snapshot -r /mnt /mnt/mysnap1
    btrfs send /mnt/mysnap1 -f /tmp/1.snap

A test for xfstests follows.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Cc: Marc Merlin <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-06 12:00:46 -07:00
Linus Torvalds
d54d14bfb4 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Four misc fixes: each was deemed serious enough to warrant v3.15
  inclusion"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
  sched/dl: Fix race in dl_task_timer()
  sched: Fix sched_policy < 0 comparison
  sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled
2014-06-06 09:53:32 -07:00
Andrey Ryabinin
624483f3ea mm: rmap: fix use-after-free in __put_anon_vma
While working address sanitizer for kernel I've discovered
use-after-free bug in __put_anon_vma.

For the last anon_vma, anon_vma->root freed before child anon_vma.
Later in anon_vma_free(anon_vma) we are referencing to already freed
anon_vma->root to check rwsem.

This fixes it by freeing the child anon_vma before freeing
anon_vma->root.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 08:53:41 -07:00
Nicholas Bellinger
f145377351 target: Fix alua_access_state attribute OOPs for un-configured devices
This patch fixes a OOPs where an attempt to write to the per-device
alua_access_state configfs attribute at:

  /sys/kernel/config/target/core/$HBA/$DEV/alua/$TG_PT_GP/alua_access_state

results in an NULL pointer dereference when the backend device has not
yet been configured.

This patch adds an explicit check for DF_CONFIGURED, and fails with
-ENODEV to avoid this case.

Reported-by: Chris Boot <crb@tiger-computing.co.uk>
Reported-by: Philip Gaw <pgaw@darktech.org.uk>
Cc: Chris Boot <crb@tiger-computing.co.uk>
Cc: Philip Gaw <pgaw@darktech.org.uk>
Cc: stable@vger.kernel.org # 3.8+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-06 01:22:41 -07:00
Nicholas Bellinger
e7810c2d2c target: Allow READ_CAPACITY opcode in ALUA Standby access state
This patch allows READ_CAPACITY + SAI_READ_CAPACITY_16 opcode
processing to occur while the associated ALUA group is in Standby
access state.

This is required to avoid host side LUN probe failures during the
initial scan if an ALUA group has already implicitly changed into
Standby access state.

This addresses a bug reported by Chris + Philip using dm-multipath
+ ESX hosts configured with ALUA multipath.

Reported-by: Chris Boot <crb@tiger-computing.co.uk>
Reported-by: Philip Gaw <pgaw@darktech.org.uk>
Cc: Chris Boot <crb@tiger-computing.co.uk>
Cc: Philip Gaw <pgaw@darktech.org.uk>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-06 01:21:12 -07:00
H. Peter Anvin
177875423e Merge tag 'efi-urgent' into x86/urgent
* Fix earlyprintk=efi,keep support by switching to an ioremap() mapping
   of the framebuffer when early_ioremap() is no longer available and
   dropping __init from functions that may be invoked after
   free_initmem() - Dave Young

 * We shouldn't be exporting the EFI runtime map in sysfs if not using
   the new 1:1 EFI mapping code since in that case the mappings are not
   static across a kexec reboot - Dave Young

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-06-05 13:09:44 -07:00
Linus Torvalds
951e273060 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two last minute tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Fix perf probe to find correct variable DIE
  perf probe: Fix a segfault if asked for variable it doesn't find
2014-06-05 12:51:05 -07:00
Linus Torvalds
1c5aefb5b1 Merge branch 'futex-fixes' (futex fixes from Thomas Gleixner)
Merge futex fixes from Thomas Gleixner:
 "So with more awake and less futex wreckaged brain, I went through my
  list of points again and came up with the following 4 patches.

  1) Prevent pi requeueing on the same futex

     I kept Kees check for uaddr1 == uaddr2 as a early check for private
     futexes and added a key comparison to both futex_requeue and
     futex_wait_requeue_pi.

     Sebastian, sorry for the confusion yesterday night.  I really
     misunderstood your question.

     You are right the check is pointless for shared futexes where the
     same physical address is mapped to two different virtual addresses.

  2) Sanity check atomic acquisiton in futex_lock_pi_atomic

     That's basically what Darren suggested.

     I just simplified it to use futex_top_waiter() to find kernel
     internal state.  If state is found return -EINVAL and do not bother
     to fix up the user space variable.  It's corrupted already.

  3) Ensure state consistency in futex_unlock_pi

     The code is silly versus the owner died bit.  There is no point to
     preserve it on unlock when the user space thread owns the futex.

     What's worse is that it does not update the user space value when
     the owner died bit is set.  So the kernel itself creates observable
     inconsistency.

     Another "optimization" is to retry an atomic unlock.  That's
     pointless as in a sane environment user space would not call into
     that code if it could have unlocked it atomically.  So we always
     check whether there is kernel state around and only if there is
     none, we do the unlock by setting the user space value to 0.

  4) Sanitize lookup_pi_state

     lookup_pi_state is ambigous about TID == 0 in the user space value.

     This can be a valid state even if there is kernel state on this
     uaddr, but we miss a few corner case checks.

     I tried to come up with a smaller solution hacking the checks into
     the current cruft, but it turned out to be ugly as hell and I got
     more confused than I was before.  So I rewrote the sanity checks
     along the state documentation with awful lots of commentry"

* emailed patches from Thomas Gleixner <tglx@linutronix.de>:
  futex: Make lookup_pi_state more robust
  futex: Always cleanup owner tid in unlock_pi
  futex: Validate atomic acquisition in futex_lock_pi_atomic()
  futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
2014-06-05 12:31:32 -07:00
Thomas Gleixner
54a217887a futex: Make lookup_pi_state more robust
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicit:

       Waiter | pi_state | pi->owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid

  [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | >0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
13fbca4c6e futex: Always cleanup owner tid in unlock_pi
If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistant state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
b3eaa9fc5c futex: Validate atomic acquisition in futex_lock_pi_atomic()
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  	restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
e9c243a5a6 futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  	different depending on the mapping ]

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Igor Mammedov
3e1a878b7c x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproducible
more often if host is over-committed).

It happens because master CPU gives up waiting on
secondary CPU and allows it to run wild. As result
AP causes locking or crashing system. For example
as described here:

   https://lkml.org/lkml/2014/3/6/257

If master CPU have sent STARTUP IPI successfully,
and AP signalled to master CPU that it's ready
to start initialization, make master CPU wait
indefinitely till AP is onlined.
To ensure that AP won't ever run wild, make it
wait at early startup till master CPU confirms its
intention to wait for AP. If AP doesn't respond in 10
seconds, the master CPU will timeout and cancel
AP onlining.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:08 +02:00
Igor Mammedov
feef1e8ecb x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
If system is running without debug level logging,
it will not log error if do_boot_cpu() failed to
wakeup AP. It may lead to silent AP bringup
failures at boot time.
Change message level to KERN_ERR to make error
visible to user as it's done on other architectures.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:07 +02:00
Igor Mammedov
89f898c1e1 x86: Fix list/memory corruption on CPU hotplug
currently if AP wake up is failed, master CPU marks AP as not
present in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to following list corruption on the next physical CPU
hotplug:

[  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[  418.178445] Call Trace:
[  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
[  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
[  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
[  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
[  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
[  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
[  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
[  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
[  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
[  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
[  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
[  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
[  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
[  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
...
Which is caused by the fact that generic_processor_info() allocates
logical CPU id by calling:

 cpu = cpumask_next_zero(-1, cpu_present_mask);

which returns id of previously failed to wake up CPU, since its
bit is cleared by do_boot_cpu() and as result register_cpu()
tries to register another CPU with the same id as already
present but failed to be onlined CPU.

Taking in account that AP will not do anything if master CPU
failed to wake it up, there is no reason to mark that AP as not
present and break next cpu hotplug attempts. As a side effect of
not marking AP as not present, user would be allowed to online
it again later.

Also fix memory corruption in acpi_unmap_lsapic()

if during CPU hotplug master CPU failed to wake up AP
it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP.

However following attempt to unplug that CPU will lead to
out of bound write access to __apicid_to_node[] which is
32768 items long on x86_64 kernel.

So with above fix of cpu_present_mask make sure that a present
CPU has a valid APIC ID by not setting x86_cpu_to_apicid
to BAD_APICID in do_boot_cpu() on failure and allow
acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:07 +02:00
Roman Gushchin
09dc4ab039 sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
tg_set_cfs_bandwidth() sets cfs_b->timer_active to 0 to
force the period timer restart. It's not safe, because
can lead to deadlock, described in commit 927b54fccb:
"__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq->lock, resulting in deadlock."

Three CPUs must be involved:

  CPU0               CPU1                         CPU2
  take rq->lock      period timer fired
  ...                take cfs_b lock
  ...                ...                          tg_set_cfs_bandwidth()
  throttle_cfs_rq()  release cfs_b lock           take cfs_b lock
  ...                distribute_cfs_runtime()     timer_active = 0
  take cfs_b->lock   wait for rq->lock            ...
  __start_cfs_bandwidth()
  {wait for timer callback
   break if timer_active == 1}

So, CPU0 and CPU1 are deadlocked.

Instead of resetting cfs_b->timer_active, tg_set_cfs_bandwidth can
wait for period timer callbacks (ignoring cfs_b->timer_active) and
restart the timer explicitly.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/87wqdi9g8e.wl\%klamm@yandex-team.ru
Cc: pjt@google.com
Cc: chris.j.arges@canonical.com
Cc: gregkh@linuxfoundation.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:51:34 +02:00
Kirill Tkhai
0f397f2c90 sched/dl: Fix race in dl_task_timer()
Throttled task is still on rq, and it may be moved to other cpu
if user is playing with sched_setaffinity(). Therefore, unlocked
task_rq() access makes the race.

Juri Lelli reports he got this race when dl_bandwidth_enabled()
was not set.

Other thing, pointed by Peter Zijlstra:

   "Now I suppose the problem can still actually happen when
    you change the root domain and trigger a effective affinity
    change that way".

To fix that we do the same as made in __task_rq_lock(). We do not
use __task_rq_lock() itself, because it has a useful lockdep check,
which is not correct in case of dl_task_timer(). We do not need
pi_lock locked here. This case is an exception (PeterZ):

   "The only reason we don't strictly need ->pi_lock now is because
    we're guaranteed to have p->state == TASK_RUNNING here and are
    thus free of ttwu races".

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.14+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3056991400578422@web14g.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:51:12 +02:00
Richard Weinberger
b14ed2c273 sched: Fix sched_policy < 0 comparison
attr.sched_policy is u32, therefore a comparison against < 0 is never true.
Fix this by casting sched_policy to int.

This issue was reported by coverity CID 1219934.

Fixes: dbdb22754f ("sched: Disallow sched_attr::sched_policy < 0")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1401741514-7045-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:07:43 +02:00
Steven Rostedt
e9dd685ce8 sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled
As Peter Zijlstra told me, we have the following path:

do_exit()
  exit_itimers()
    itimer_delete()
      spin_lock_irqsave(&timer->it_lock, &flags);
      timer_delete_hook(timer);
        kc->timer_del(timer) := posix_cpu_timer_del()
          put_task_struct()
            __put_task_struct()
              task_numa_free()
                spin_lock(&grp->lock);

Which means that task_numa_free() can be called with interrupts
disabled, which means that we should not be using spin_lock_irq() but
spin_lock_irqsave() instead. Otherwise we are enabling interrupts while
holding an interrupt unsafe lock!

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner<tglx@linutronix.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140527182541.GH11096@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:07:41 +02:00
Ingo Molnar
22c91aa235 Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/urgent
Pull perf/urgent fixes from Jiri Olsa:

 * Fix perf probe to find correct variable DIE (Masami Hiramatsu)

 * Fix a segfault in perf probe if asked for variable it doesn't find (Masami Hiramatsu)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 09:54:01 +02:00
Linus Torvalds
54539cd217 Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu fix from Tejun Heo:
 "It is very late but this is an important percpu-refcount fix from
  Sebastian Ott.

  The problem is that percpu_ref_*() used __this_cpu_*() instead of
  this_cpu_*().  The difference between the two is that the latter is
  atomic on the local cpu while the former is not.  this_cpu_inc() is
  guaranteed to increment the percpu counter on the cpu that the
  operation is executed on without any synchronization; however,
  __this_cpu_inc() doesn't and if the local cpu invokes the function
  from different contexts (e.g.  process and irq) of the same CPU, it's
  not guaranteed to actually increment as it may be implemented as rmw.

  This bug existed from the get-go but it hasn't been noticed earlier
  probably because on x86 __this_cpu_inc() is equivalent to
  this_cpu_inc() as both get translated into single instruction;
  however, s390 uses the generic rmw implementation and gets affected by
  the bug.  Kudos to Sebastian and Heiko for diagnosing it.

  The change is very low risk and fixes a critical issue on the affected
  architectures, so I think it's a good candidate for inclusion although
  it's very late in the devel cycle.  On the other hand, this has been
  broken since v3.11, so backporting it through -stable post -rc1 won't
  be the end of the world.

  I'll ping Christoph whether __this_cpu_*() ops can be better annotated
  so that it can trigger lockdep warning when used from multiple
  contexts"

* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu-refcount: fix usage of this_cpu_ops
2014-06-04 09:56:03 -07:00
Sebastian Ott
0c36b390a5 percpu-refcount: fix usage of this_cpu_ops
The percpu-refcount infrastructure uses the underscore variants of
this_cpu_ops in order to modify percpu reference counters.
(e.g. __this_cpu_inc()).

However the underscore variants do not atomically update the percpu
variable, instead they may be implemented using read-modify-write
semantics (more than one instruction).  Therefore it is only safe to
use the underscore variant if the context is always the same (process,
softirq, or hardirq). Otherwise it is possible to lose updates.

This problem is something that Sebastian has seen within the aio
subsystem which uses percpu refcounters both in process and softirq
context leading to reference counts that never dropped to zeroes; even
though the number of "get" and "put" calls matched.

Fix this by using the non-underscore this_cpu_ops variant which
provides correct per cpu atomic semantics and fixes the corrupted
reference counts.

Cc: Kent Overstreet <kmo@daterainc.com>
Cc: <stable@vger.kernel.org> # v3.11+
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett
2014-06-04 12:12:29 -04:00