BugLink: http://bugs.launchpad.net/bugs/1187551
commit ccf5ae83a6 upstream.
It is possible for one thread to to take se_sess->sess_cmd_lock in
core_tmr_abort_task() before taking a reference count on
se_cmd->cmd_kref, while another thread in target_put_sess_cmd() drops
se_cmd->cmd_kref before taking se_sess->sess_cmd_lock.
This introduces kref_put_spinlock_irqsave() and uses it in
target_put_sess_cmd() to close the race window.
Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1187551
commit 21caf2fc19 upstream.
This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task to avoid
doing I/O inside memory allocation in the task's context.
The patch trys to solve one deadlock problem caused by block device, and
the problem may happen at least in the below situations:
- during block device runtime resume, if memory allocation with
GFP_KERNEL is called inside runtime resume callback of any one of its
ancestors(or the block device itself), the deadlock may be triggered
inside the memory allocation since it might not complete until the block
device becomes active and the involed page I/O finishes. The situation
is pointed out first by Alan Stern. It is not a good approach to
convert all GFP_KERNEL[1] in the path into GFP_NOIO because several
subsystems may be involved(for example, PCI, USB and SCSI may be
involved for usb mass stoarage device, network devices involved too in
the iSCSI case)
- during block device runtime suspend, because runtime resume need to
wait for completion of concurrent runtime suspend.
- during error handling of usb mass storage deivce, USB bus reset will
be put on the device, so there shouldn't have any memory allocation with
GFP_KERNEL during USB bus reset, otherwise the deadlock similar with
above may be triggered. Unfortunately, any usb device may include one
mass storage interface in theory, so it requires all usb interface
drivers to handle the situation. In fact, most usb drivers don't know
how to handle bus reset on the device and don't provide .pre_set() and
.post_reset() callback at all, so USB core has to unbind and bind driver
for these devices. So it is still not practical to resort to GFP_NOIO
for solving the problem.
Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.
It is not a good idea to convert all these GFP_KERNEL in the affected
path into GFP_NOIO because these functions doing that may be implemented
as library and will be called in many other contexts.
In fact, memalloc_noio_flags() can convert some of current static
GFP_NOIO allocation into GFP_KERNEL back in other non-affected contexts,
at least almost all GFP_NOIO in USB subsystem can be converted into
GFP_KERNEL after applying the approach and make allocation with GFP_NOIO
only happen in runtime resume/bus reset/block I/O transfer contexts
generally.
[1], several GFP_KERNEL allocation examples in runtime resume path
- pci subsystem
acpi_os_allocate
<-acpi_ut_allocate
<-ACPI_ALLOCATE_ZEROED
<-acpi_evaluate_object
<-__acpi_bus_set_power
<-acpi_bus_set_power
<-acpi_pci_set_power_state
<-platform_pci_set_power_state
<-pci_platform_power_transition
<-__pci_complete_power_transition
<-pci_set_power_state
<-pci_restore_standard_config
<-pci_pm_runtime_resume
- usb subsystem
usb_get_status
<-finish_port_resume
<-usb_port_resume
<-generic_resume
<-usb_resume_device
<-usb_resume_both
<-usb_runtime_resume
- some individual usb drivers
usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, ....
That is just what I have found. Unfortunately, this allocation can only
be found by human being now, and there should be many not found since
any function in the resume path(call tree) may allocate memory with
GFP_KERNEL.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Decotigny <david.decotigny@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ kamal: backport to 3.8 ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1187551
commit cdee3904b4 upstream.
Commit b05d8447e7 (audit: inline audit_syscall_entry to reduce
burden on archs) changed audit_syscall_entry to check for a dummy
context before calling __audit_syscall_entry. Unfortunately the dummy
context state is maintained in __audit_syscall_entry so once set it
never gets cleared, even if the audit rules change.
As a result, if there are no auditing rules when a process starts
then it will never be subject to any rules added later. x86 doesn't
see this because it has an assembly fast path that calls directly into
__audit_syscall_entry.
I noticed this issue when working on audit performance optimisations.
I wrote a set of simple test cases available at:
http://ozlabs.org/~anton/junkcode/audit_tests.tar.gz
02_new_rule.py fails without the patch and passes with it. The
test case clears all rules, starts a process, adds a rule then
verifies the process produces a syscall audit record.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1179514
commit 219b47339c upstream.
Currently we have a problem with this:
1. i915: create gem object
2. i915: export gem object to prime
3. radeon: import gem object
4. close prime fd
5. radeon: unref object
6. i915: unref object
i915 has an imported object reference in its file priv, that isn't
cleaned up properly until fd close. The reference gets added at step 2,
but at step 6 we don't have enough info to clean it up.
The solution is to take a reference on the dma-buf when we export it,
and drop the reference when the gem handle goes away.
So when we export a dma_buf from a gem object, we keep track of it
with the handle, we take a reference to the dma_buf. When we close
the handle (i.e. userspace is finished with the buffer), we drop
the reference to the dma_buf, and it gets collected.
This patch isn't meant to fix any other problem or bikesheds, and it doesn't
fix any races with other scenarios.
v1.1: move export symbol line back up.
v2: okay I had to do a bit more, as the first patch showed a leak
on one of my tests, that I found using the dma-buf debugfs support,
the problem case is exporting a buffer twice with the same handle,
we'd add another export handle for it unnecessarily, however
we now fail if we try to export the same object with a different gem handle,
however I'm not sure if that is a case I want to support, and I've
gotten the code to WARN_ON if we hit something like that.
v2.1: rebase this patch, write better commit msg.
v3: cleanup error handling, track import vs export in linked list,
these two patches were separate previously, but seem to work better
like this.
v4: danvet is correct, this code is no longer useful, since the buffer
better exist, so remove it.
v5: always take a reference to the dma buf object, import or export.
(Imre Deak contributed this originally)
v6: square the circle, remove import vs export tracking now
that there is no difference
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1179514
commit 871dd9286e upstream.
linux-v3.8-rc1 and later support for plug for blkdev_issue_discard with
commit 0cfbcafcae
(block: add plug for blkdev_issue_discard )
For example,
1) DISCARD rq-1 with size size 4GB
2) DISCARD rq-2 with size size 1GB
If these 2 discard requests get merged, final request size will be 5GB.
In this case, request's __data_len field may overflow as it can store
max 4GB(unsigned int).
This issue was observed while doing mkfs.f2fs on 5GB SD card:
https://lkml.org/lkml/2013/4/1/292
Info: sector size = 512
Info: total sectors = 11370496 (in 512bytes)
Info: zone aligned segment0 blkaddr: 512
[ 257.789764] blk_update_request: bio idx 0 >= vcnt 0
mkfs process gets stuck in D state and I see the following in the dmesg:
[ 257.789733] __end_that: dev mmcblk0: type=1, flags=122c8081
[ 257.789764] sector 4194304, nr/cnr 2981888/4294959104
[ 257.789764] bio df3840c0, biotail df3848c0, buffer (null), len
1526726656
[ 257.789764] blk_update_request: bio idx 0 >= vcnt 0
[ 257.794921] request botched: dev mmcblk0: type=1, flags=122c8081
[ 257.794921] sector 4194304, nr/cnr 2981888/4294959104
[ 257.794921] bio df3840c0, biotail df3848c0, buffer (null), len
1526726656
This patch fixes this issue.
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1179514
commit af73e4d950 upstream.
The current kernel returns -EINVAL unless a given mmap length is
"almost" hugepage aligned. This is because in sys_mmap_pgoff() the
given length is passed to vm_mmap_pgoff() as it is without being aligned
with hugepage boundary.
This is a regression introduced in commit 40716e2924 ("hugetlbfs: fix
alignment of huge page requests"), where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.
To fix this, this patch partially reverts that commit, and adds
alignment code in caller side. And it also introduces hstate_sizelog()
in order to get proper hstate to specified hugepage size.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
[akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: <iceman_dvd@yahoo.com>
Cc: Steven Truelove <steven.truelove@utoronto.ca>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1178361
commit d76a3a7711 upstream.
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.
Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().
Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.
As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: George Barnett <gbarnett@atlassian.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1178361
commit d69f3bad46 upstream.
Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB. By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.
In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.
ipc/shm.c:
458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
459 {
...
465 int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
...
474 if (ns->shm_tot + numpages > ns->shm_ctlall)
475 return -ENOSPC;
[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt <holt@sgi.com>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1178361
commit e56fb28740 upstream.
threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice, the scope of
this lock in do_execve() is huge.
And as Dave pointed out this can lead to deadlock, we have the
following dependencies:
do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex
Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.
Note that de_thread() can't sleep with ->group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1178361
commit 712317ad97 upstream.
We should store file xattrs in struct cfent instead of struct cftype,
because cftype is a type while cfent is object instance of cftype.
For example each cgroup has a tasks file, and each tasks file is
associated with a uniq cfent, but all those files share the same
struct cftype.
Alexey Kodanev reported a crash, which can be reproduced:
# mount -t cgroup -o xattr /sys/fs/cgroup
# mkdir /sys/fs/cgroup/test
# setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
# rmdir /sys/fs/cgroup/test
# umount /sys/fs/cgroup
oops!
In this case, simple_xattrs_free() will free the same struct simple_xattrs
twice.
tj: Dropped unused local variable @cft from cgroup_diput().
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1178361
commit 6ee8630e02 upstream.
On architectures where a pgd entry may be shared between user and kernel
(e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0.
This patch introduces a generic USER_PGTABLES_CEILING that arch code can
override. It is the responsibility of the arch code setting the ceiling
to ensure the complete freeing of the page tables (usually in
pgd_free()).
[catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve Conklin <sconklin@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1175391
[ Upstream commit 124dff01af ]
Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.
nf_reset() is used in the following cases:
- when passing packets up the the socket layer, at which point we want to
release all netfilter references that might keep modules pinned while
the packet is queued. nf_trace doesn't matter anymore at this point.
- when encapsulating or decapsulating IPsec packets. We want to continue
tracing these packets after IPsec processing.
- when passing packets through virtual network devices. Only devices on
that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
used anymore. Its not entirely clear whether those packets should
be traced after that, however we've always done that.
- when passing packets through virtual network devices that make the
packet cross network namespace boundaries. This is the only cases
where we clearly want to reset nf_trace and is also what the
original patch intended to fix.
Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1175391
[ Upstream commit 4543fbefe6 ]
A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices. In
some cases (bond/team) a single address list is synched down
to multiple devices. At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.
Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1173170
commit b4cbb197c7 upstream.
Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.
Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size. But it just means
that drivers end up doing that shifting instead at the interface level.
It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.
So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1173170
commit 8f964525a1 upstream.
This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page. If the range falls within
the same memslot, then this will be a fast operation. If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.
Tested: Test against kvm_clock unit tests.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1173170
commit f2530dc71c upstream.
The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:
CPU0 CPU1 CPU2
unpark(T) wake_up_process(T)
clear(SHOULD_PARK) T runs
leave parkme() due to !SHOULD_PARK
bind_to(CPU2) BUG_ON(wrong CPU)
We cannot let the tasks move themself to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.
The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.
Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.
Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, its a good hint in ps
and friends that these tasks aren't really there for the moment.
The migration thread has another related issue.
CPU0 CPU1
Bring up CPU2
create_thread(T)
park(T)
wait_for_completion()
parkme()
complete()
sched_set_stop_task()
schedule(TASK_PARKED)
The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronizaion before
sched_set_stop_task().
Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Nothing is using it yet, but this will allow us to delay the open-time
checks to use time, without breaking the normal UNIX permission
semantics where permissions are determined by the opener (and the file
descriptor can then be passed to a different process, or the process can
drop capabilities).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 935d8aabd4)
CVE-2013-1959
BugLink: https://launchpad.net/bugs/1174590
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1169930
commit 1de14c3c5c upstream.
This patch attempts to fix:
https://bugzilla.kernel.org/show_bug.cgi?id=56461
The symptom is a crash and messages like this:
chrome: Corrupted page table at address 34a03000
*pdpt = 0000000000000000 *pde = 0000000000000000
Bad pagetable: 000f [#1] PREEMPT SMP
Ingo guesses this got introduced by commit 611ae8e3f5 ("x86/tlb:
enable tlb flush range support for x86") since that code started to free
unused pagetables.
On x86-32 PAE kernels, that new code has the potential to free an entire
PMD page and will clear one of the four page-directory-pointer-table
(aka pgd_t entries).
The hardware aggressively "caches" these top-level entries and invlpg
does not actually affect the CPU's copy. If we clear one we *HAVE* to
do a full TLB flush, otherwise we might continue using a freed pmd page.
(note, we do this properly on the population side in pud_populate()).
This patch tracks whenever we clear one of these entries in the 'struct
mmu_gather', and ensures that we follow up with a full tlb flush.
BTW, I disassembled and checked that:
if (tlb->fullmm == 0)
and
if (!tlb->fullmm && !tlb->need_flush_all)
generate essentially the same code, so there should be zero impact there
to the !PAE case.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Artem S Tashkinov <t.artem@mailcity.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1168506
commit 386afc9114 upstream.
In UP and non-preempt respectively, the spinlocks and preemption
disable/enable points are stubbed out entirely, because there is no
regular code that can ever hit the kind of concurrency they are meant to
protect against.
However, while there is no regular code that can cause scheduling, we
_do_ end up having some exceptional (literally!) code that can do so,
and that we need to make sure does not ever get moved into the critical
region by the compiler.
In particular, get_user() and put_user() is generally implemented as
inline asm statements (even if the inline asm may then make a call
instruction to call out-of-line), and can obviously cause a page fault
and IO as a result. If that inline asm has been scheduled into the
middle of a preemption-safe (or spinlock-protected) code region, we
obviously lose.
Now, admittedly this is *very* unlikely to actually ever happen, and
we've not seen examples of actual bugs related to this. But partly
exactly because it's so hard to trigger and the resulting bug is so
subtle, we should be extra careful to get this right.
So make sure that even when preemption is disabled, and we don't have to
generate any actual *code* to explicitly tell the system that we are in
a preemption-disabled region, we need to at least tell the compiler not
to move things around the critical region.
This patch grew out of the same discussion that caused commits
79e5f05edc ("ARC: Add implicit compiler barrier to raw_local_irq*
functions") and 3e2e0d2c22 ("tile: comment assumption about
__insn_mtspr for <asm/irqflags.h>") to come about.
Note for stable: use discretion when/if applying this. As mentioned,
this bug may never have actually bitten anybody, and gcc may never have
done the required code motion for it to possibly ever trigger in
practice.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1168506
commit d8668fcb0b upstream.
The function returns type of ATAPI drives so it should return integer value.
The commit 4dce8ba94c (libata: Use 'bool' return value for ata_id_XXX) since
v2.6.39 changed the type of return value from int to bool, the change would
cause all of the ATAPI class drives to be treated as TYPE_TAPE and the
max_sectors of the drives to be set to 65535 because of the commit
f8d8e5799b7(libata: increase 128 KB / cmd limit for ATAPI tape drives), for the
function would return true for all ATAPI class drives and the TYPE_TAPE is
defined as 0x01.
Signed-off-by: Shan Hai <shan.hai@windriver.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031163
dw_dmac driver already supports device tree but it used to have its platform
data passed the non-DT way.
This patch does following changes:
- pass platform data via DT, non-DT way still takes precedence if both are used.
- create generic filter routine
- Earlier slave information was made available by slave specific filter routines
in chan->private field. Now, this information would be passed from within dmac
DT node. Slave drivers would now be required to pass bus_id (a string) as
parameter to this generic filter(), which would be compared against the slave
data passed from DT, by the generic filter routine.
- Update binding document
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[Fixed __devinit usage]
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
(cherry picked from commit a9ddb575d6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
I despair of ever getting this upstream.
BugLink: http://bugs.launchpad.net/bugs/586386
Some init packages such as upstart find having all of the kernel parameters
passed in useful. Currently they have to open up /proc/cmdline and
reparse that to obtain this information. Add a kernel configuration
option to enable passing of all options.
Note, enabling this option will reduce the chances that a fallback from
/sbin/init to /bin/bash or /bin/sh will succeed. Though it should be
noted that there are commonly unknown options present which would already
break this fallback. init=/bin/foo provides explicit control over options
which is unaffected by this change.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
The pl320 IPC allows for interprocessor communication between the
highbank A9 and the EnergyCore Management Engine. The pl320 implements
a straightforward mailbox protocol.
Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 300586778d)
BugLink: http://bugs.launchpad.net/bugs/1031162
Now that support for RM9000 and platforms based on it has been removed,
remove the serial driver for it as well. It's really only been a quirk
for an almost 8250 compatible UART anyway.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
drivers/tty/serial/8250/8250.c | 70 +----------------------------------------
drivers/tty/serial/8250/Kconfig | 9 ------
include/linux/serial_core.h | 1 -
3 files changed, 1 insertion(+), 79 deletions(-)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a205a56dc2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031180
When the ODD is powered off, any action the user did to the ODD that
would generate a media event will trigger an ACPI interrupt, so the
poll for media event is no longer necessary. And the poll will also
cause a runtime status change, which will stop the ODD from staying in
powered off state, so the poll should better be stopped.
But since we don't have access to the gendisk structure in LLDs, here
comes the disk_events_disable_depth for scsi device. This field is a
hint set by LLDs to convey information to upper layer drivers. A value
of 0 means media poll is necessary for the device, while values above 0
means media poll is not needed and should better be skipped. So we can
increase its value when we are to power off the ODD in ATA layer and
decrease its value when the ODD is powered on, effectively silence the
media events poll.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
(cherry picked from commit 6f4c827e68)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031180
The ODD can be enabled for ZPODD if the following three conditions are
satisfied:
1 The ODD supports device attention;
2 The platform can runtime power off the ODD through ACPI;
3 The ODD is either slot type or drawer type.
For such ODDs, zpodd_init is called and a new structure is allocated for
it to store ZPODD related stuffs.
And the zpodd_dev_enabled function is used to test if ZPODD is currently
enabled for this ODD.
A new config CONFIG_SATA_ZPODD is added to selectively build ZPODD code.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
(cherry picked from commit afe7595118)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031180
Commit 166a2967b4 "libata: tell scsi layer
device supports runtime power off" introduced the can_power_off flag for
scsi_device and is used to support ZPODD implementation in SCSI layer.
Since ZPODD is now implemented in ATA layer, that flag is no longer
needed, so remove it.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
(cherry picked from commit 44ec657be0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031169
Intel LPSS SPI is pretty much the same as the PXA27xx SPI except that it
has few additional features over the original:
o FIFO depth is 256 entries
o RX FIFO has one watermark
o TX FIFO has two watermarks, low and high
o chip select can be controlled by writing to a register
The new FIFO registers follow immediately the PXA27xx registers but then there
are some additional LPSS private registers at offset 1k or 2k from the base
address. For these private registers we add new accessors that take advantage
of drv_data->lpss_base once it is resolved.
We add a new type LPSS_SSP that can be used to distinguish the LPSS devices
from others.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Lu Cao <lucao@marvell.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
(cherry picked from commit a0d2642e92)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031169
To be able to use DMA with this driver on non-PXA platforms we implement
support for the generic DMA engine API. This lets user to use different DMA
engines with little or no modification to the driver.
Request lines and channel numbers can be passed to the driver from the
platform specific data.
The DMA engine implementation will be selected by default even on PXA
platform. User can select the legacy DMA API by enabling Kconfig option
CONFIG_SPI_PXA2XX_PXADMA.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Lu Cao <lucao@marvell.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
(cherry picked from commit 5928808ef6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031169
The PXA SPI driver uses PXA platform specific private DMA implementation
which does not work on non-PXA platforms. In order to use this driver on
other platforms we break out the private DMA implementation into a separate
file that gets compiled only when CONFIG_SPI_PXA2XX_PXADMA is set. The DMA
functions are stubbed out if there is no DMA implementation selected (i.e
we are building on non-PXA platform).
While we are there we can kill the dummy DMA bits in pxa2xx_spi.h as they
are not needed anymore for CE4100.
Once this is done we can add the generic DMA engine support to the driver
that allows usage of any DMA controller that implements DMA engine API.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Lu Cao <lucao@marvell.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
(cherry picked from commit cd7bed0034)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031169
Convert clk_enable() to clk_prepare_enable() and clk_disable() to
clk_disable_unprepare() respectively in order to support the common clk
framework. Otherwise we get warnings on the console as the clock is not
prepared before it is enabled.
In addition we must cache the maximum clock rate to drv_data->max_clk_rate
at probe time because clk_get_rate() cannot be called in tasklet context.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
(cherry picked from commit 3343b7a6d2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1031169
The spi-pxa2xx-pci glue driver had to implement pxa_ssp_request()/free() in
order to support the spi-pxa2xx platform driver. Since the ACPI enabled
platforms can use the same platform driver we would need to implement
pxa_ssp_request()/free() in some central place that can be shared by the
ACPI and PCI glue code.
Instead of doing that we can make pxa_ssp_request()/free() to be available
only when CONFIG_ARCH_PXA is set. On other arches these are being stubbed
out in preference to passing the ssp_device from the platform data
directly.
We also change the SPI bus number to be taken from ssp->port_id instead of
platform device id. This way the supporting code that passes the ssp can
decide the number (or it can set it to the same as pdev->id).
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
(cherry picked from commit 851bacf590)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1138440
Instead of reimplementing all of the dma_buf functionality in every driver,
create helpers drm_prime_import and drm_prime_export that implement them in
terms of new, lower-level hook functions:
gem_prime_pin: callback when a buffer is created, used to pin buffers into GTT
gem_prime_get_sg_table: convert a drm_gem_object to an sg_table for export
gem_prime_import_sg_table: convert an sg_table into a drm_gem_object
gem_prime_vmap, gem_prime_vunmap: map and unmap an object
These hooks are optional; drivers can opt in by using drm_gem_prime_import and
drm_gem_prime_export as the .gem_prime_import and .gem_prime_export fields of
struct drm_driver.
v2:
- Drop .begin_cpu_access. None of the drivers this code replaces implemented
it. Having it here was a leftover from when I was trying to include i915 in
this rework.
- Use mutex_lock instead of mutex_lock_interruptible, as these three drivers
did. This patch series shouldn't change that behavior.
- Rename helpers to gem_prime_get_sg_table and gem_prime_import_sg_table.
Rename struct sg_table* variables to 'sgt' for clarity.
- Update drm.tmpl for these new hooks.
v3:
- Pass the vaddr down to the driver. This lets drivers that just call vunmap on
the pointer avoid having to store the pointer in their GEM private structures.
- Move documentation into a /** DOC */ comment in drm_prime.c and include it in
drm.tmpl with a !P line. I tried to use !F lines to include documentation of
the individual functions from drmP.h, but the docproc / kernel-doc scripts
barf on that file, so hopefully this is good enough for now.
- apply refcount fix from commit be8a42ae60
("drm/prime: drop reference on imported dma-buf come from gem")
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 89177644a7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>