linux/compat.h does not include linux/unistd.h but the compat.h header
file contains various conditional
#ifdef __ARCH_WANT_COMPAT_...
asmlinkage long compat...()
#endif
compat system call function declarations.
If linux/unistd.h isn't included it depends on previous includes if those
__ARCH_WANT_COMPAT_... defines are defined or not. So add an additional
linux/unistd.h include.
Should fix this compile error on tile:
include/uapi/asm-generic/unistd.h:195:1: error: 'compat_sys_getdents64' undeclared
make[3]: *** [arch/tile/kernel/compat.o] Error 1
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
cgroup_tree_mutex should nest above the kernfs active_ref protection;
however, cgroup_create() and cgroup_rename() were grabbing
cgroup_tree_mutex while under kernfs active_ref protection. This has
actualy possibility to lead to deadlocks in case these operations race
against cgroup_rmdir() which invokes kernfs_remove() on directory
kernfs_node while holding cgroup_tree_mutex.
Neither cgroup_create() or cgroup_rename() requires active_ref
protection. The former already has enough synchronization through
cgroup_lock_live_group() and the latter doesn't care, so this can be
fixed by updating both functions to break all active_ref protections
before grabbing cgroup_tree_mutex.
While this patch fixes the immediate issue, it probably needs further
work in the long term - kernfs directories should enable lockdep
annotations and maybe the better way to handle this is marking
directory nodes as not needing active_ref protection rather than
breaking it in each operation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We accidentally declared pid_alive without any extern/inline connotation.
Some platforms were fine with this, some like ia64 and mips were very angry.
If the function is inline, the prototype should be inline!
on ia64:
include/linux/sched.h:1718: warning: 'pid_alive' declared inline after
being called
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Calling audit_log_lost with a \n in the format string leads to extra
newlines in dmesg. That function will eventually call audit_panic which
uses pr_err with an explicit \n included. Just make these calls match the
others that lack \n.
Reported-by: Jonathan Kamens <jik@kamens.brookline.ma.us>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
The login uid change record does not include the selinux context of the
task logging in. Add that information.
(Updated from 2011-01: RHBZ:670328 -- RGB)
Reported-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Still only permit the audit logging daemon and control to operate from the
initial PID namespace, but allow processes to log from another PID namespace.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
(informed by ebiederman's c776b5d2)
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Store and log all PIDs with reference to the initial PID namespace and
use the access functions task_pid_nr() and task_tgid_nr() for task->pid
and task->tgid.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
(informed by ebiederman's c776b5d2)
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
sys_getppid() returns the parent pid of the current process in its own pid
namespace. Since audit filters are based in the init pid namespace, a process
could avoid a filter or trigger an unintended one by being in an alternate pid
namespace or log meaningless information.
Switch to task_ppid_nr() for PPIDs to anchor all audit filters in the
init_pid_ns.
(informed by ebiederman's 6c621b7e)
Cc: stable@vger.kernel.org
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the lookup
of the PPID (real_parent's pid_t) of a process, including rcu locking, in the
arbitrary and init_pid_ns.
This provides an alternative to sys_getppid(), which is relative to the child
process' pid namespace.
(informed by ebiederman's 6c621b7e)
Cc: stable@vger.kernel.org
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
"get" usually implies incrementing a refcount into a structure to indicate a
reference being held by another part of code.
Change this function name to indicate it is in fact being taken from it,
returning the value while clearing it in the supplying structure.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
lib/audit.c provides a generic function for auditing system calls.
This patch extends it for compat syscall support on bi-architectures
(32/64-bit) by adding lib/compat_audit.c.
What is required to support this feature are:
* add asm/unistd32.h for compat system call names
* select CONFIG_AUDIT_ARCH_COMPAT_GENERIC
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently AUDITSYSCALL has a long list of architecture depencency:
depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
for simplicity.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com> (arm)
Acked-by: Richard Guy Briggs <rgb@redhat.com> (audit)
Acked-by: Matt Turner <mattst88@gmail.com> (alpha)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Eric Paris <eparis@redhat.com>
In perverse cases of file descriptor passing the current network
namespace of a process and the network namespace of a socket used by
that socket may differ. Therefore use the network namespace of the
appropiate socket to ensure replies always go to the appropiate
socket.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
While reading through 3.14-rc1 I found a pretty siginficant mishandling
of network namespaces in the recent audit changes.
In struct audit_netlink_list and audit_reply add a reference to the
network namespace of the caller and remove the userspace pid of the
caller. This cleanly remembers the callers network namespace, and
removes a huge class of races and nasty failure modes that can occur
when attempting to relook up the callers network namespace from a pid_t
(including the caller's network namespace changing, pid wraparound, and
the pid simply not being present).
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
During an audit event, cache and print the value of the process's
proctitle value (proc/<pid>/cmdline). This is useful in situations
where processes are started via fork'd virtual machines where the
comm field is incorrect. Often times, setting the comm field still
is insufficient as the comm width is not very wide and most
virtual machine "package names" do not fit. Also, during execution,
many threads have their comm field set as well. By tying it back to
the global cmdline value for the process, audit records will be more
complete in systems with these properties. An example of where this
is useful and applicable is in the realm of Android. With Android,
their is no fork/exec for VM instances. The bare, preloaded Dalvik
VM listens for a fork and specialize request. When this request comes
in, the VM forks, and the loads the specific application (specializing).
This was done to take advantage of COW and to not require a load of
basic packages by the VM on very app spawn. When this spawn occurs,
the package name is set via setproctitle() and shows up in procfs.
Many of these package names are longer then 16 bytes, the historical
width of task->comm. Having the cmdline in the audit records will
couple the application back to the record directly. Also, on my
Debian development box, some audit records were more useful then
what was printed under comm.
The cached proctitle is tied to the life-cycle of the audit_context
structure and is built on demand.
Proctitle is controllable by userspace, and thus should not be trusted.
It is meant as an aid to assist in debugging. The proctitle event is
emitted during syscall audits, and can be filtered with auditctl.
Example:
type=AVC msg=audit(1391217013.924:386): avc: denied { getattr } for pid=1971 comm="mkdir" name="/" dev="selinuxfs" ino=1 scontext=system_u:system_r:consolekit_t:s0-s0:c0.c255 tcontext=system_u:object_r:security_t:s0 tclass=filesystem
type=SYSCALL msg=audit(1391217013.924:386): arch=c000003e syscall=137 success=yes exit=0 a0=7f019dfc8bd7 a1=7fffa6aed2c0 a2=fffffffffff4bd25 a3=7fffa6aed050 items=0 ppid=1967 pid=1971 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mkdir" exe="/bin/mkdir" subj=system_u:system_r:consolekit_t:s0-s0:c0.c255 key=(null)
type=UNKNOWN[1327] msg=audit(1391217013.924:386): proctitle=6D6B646972002D70002F7661722F72756E2F636F6E736F6C65
Acked-by: Steve Grubb <sgrubb@redhat.com> (wrt record formating)
Signed-off-by: William Roberts <wroberts@tresys.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The point of this measure is to gauge why a process has a lot of gem
objects in uses and why. Especially for compositors it's interesting
to know whether it's a leak of private objects or just a lot of use
from buffers shared with clients.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Add a bit of commit message flesh to address Ben's comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The idea of printing objects used by each process is to judge how each
process is using them. This means that we need to evaluate whether the
object is bound for that particular process, rather than just whether it
is bound into the global GTT.
v2: Restore the non-full-ppgtt path for simplicity as we may not even
create vma with older hardware.
v3: Tweak handling of global entries and default context entries.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the renamed RPS struct members, it's easier to skip the local
variables which no longer clarify anything, and if anything just make
the code harder to read.
The real motivation for this patch is actually the next patch, which
attempts to consolidate some of the functionality.
Cc: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The values created at initialization must always exist to use the
interface. Reading them again is confusing, and pointless.
More cleanups are coming in the next patch. Since I am not 100% certain,
moreover on BYT, (though I am extremely close to that) that there is no
need to leave the MMIO here, I wanted to make it a separate patch for
the bisectable 'just-in-case'
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The names of the struct members for RPS are stupid. Every time I need to
do anything in this code I have to spend a significant amount of time to
remember what it all means. By renaming the variables (and adding the
comments) I hope to clear up the situation. Indeed doing this make some
upcoming patches more readable.
I've avoided ILK because it's possible that the naming used for Ironlake
matches what is in the docs. I believe the ILK power docs were never
published, and I am too lazy to dig them up.
v2: leave rp0, and rp1 in the names. It is useful to have these limits
available at times. min_freq and max_freq (which may be equal to rp0, or
rp1 depending on the platform) represent the actual HW min and max.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
this leaves a temporarily awkward min_delay (the soft limit) with the
new min_freq (the hardware limit). It's fixed in the next patch.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The existing code (which I changed last) was very convoluted. I believe
it was attempting to skip the overclock portion if the previous pcode
write failed. When I last touched the code, I was preserving this
behavior. There is some benefit to doing it that way in that if the
first pcode access fails, the later is likely invalid.
Having a bit more confidence in my understanding of how things work, I
now feel it's better to have clear, readable, code than to try to skip
over this one operation in an unusual case.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If multiple redundant fsync calls are triggered, we don't need to write its
node pages with fsync mark continuously.
So, this patch adds FI_NEED_FSYNC to track whether the latest node block is
written with the fsync mark or not.
If the mark was set, a new fsync doesn't need to write a node block.
Otherwise, we should do a new node block with the mark for roll-forward
recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch introduces fi->i_sem to protect fi's info that includes xattr_ver,
pino, i_nlink.
This enables to remove i_mutex during f2fs_sync_file, resulting in performance
improvement when a number of fsync calls are triggered from many concurrent
threads.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
It is more reasonable to determine the reclaiming rate of prefree segments
according to the volume size, which is set to 5% by default.
For example, if the volume is 128GB, the prefree segments are reclaimed
when the number reaches to 6.4GB.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The NM_WOUT_THRESHOLD is now obsolete since f2fs starts to control on a basis
of the memory footprint.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch introduces ram_thresh, a sysfs entry, which controls the memory
footprint used by the free nid list and the nat cache.
Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat
cache was managed by NM_WOUT_THRESHOLD.
However, this approach cannot be applied dynamically according to the system.
So, this patch adds ram_thresh that users can specify the threshold, which is
in order of 1 / 1024.
For example, if the total ram size is 4GB and the value is set to 10 by default,
f2fs tries to control the number of free nids and nat caches not to consume over
10 * (4GB / 1024) = 10MB.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The try_to_free_nats should not receive the negative nr_shrink.
Otherwise, it can drop all the nat entries by the while loop.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
If a page is on writeback, f2fs can face with deadlock due to under writepages.
This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw
merged IOs, which is implemented by f2fs_wait_on_page_writeback.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
For non-mipsr2 processors, the local_irq_disable contains an mfc0-mtc0
pair with instructions inbetween. With preemption enabled, this sequence
may get preempted and effect a stale value of CP0_STATUS when executing
the mtc0 instruction. This commit avoids this scenario by incrementing
the preempt count before the mfc0 and decrementing it after the mtc9.
[ralf@linux-mips.org: This patch is sorting out the part that were missed
by e97c5b6098 [MIPS: Make irqflags.h functions preempt-safe for non-mipsr2
cpus.] I also re-enabled the inclusion of <asm/asm-offsets.h> at the top
of <asm/asmmacro.h>].
Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: cernekee@gmail.com
Patchwork: https://patchwork.linux-mips.org/patch/6164/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Instead, the correct and race-free way of performing the callback
registration is:
cpu_notifier_register_begin();
for_each_online_cpu(cpu)
init_cpu(cpu);
/* Note the use of the double underscored version of the API */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_notifier_register_done();
Fix the code in net/iucv/iucv.c by using this latter form of callback
registration. Also, provide helper functions to perform the common memory
allocations and frees, to condense repetitive code.
Cc: Ursula Braun <ursula.braun@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Instead, the correct and race-free way of performing the callback
registration is:
cpu_notifier_register_begin();
for_each_online_cpu(cpu)
init_cpu(cpu);
/* Note the use of the double underscored version of the API */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_notifier_register_done();
Fix the code in net/core/flow.c by using this latter form of callback
registration.
Cc: Li RongQing <roy.qing.li@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Instead, the correct and race-free way of performing the callback
registration is:
cpu_notifier_register_begin();
for_each_online_cpu(cpu)
init_cpu(cpu);
/* Note the use of the double underscored version of the API */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_notifier_register_done();
Fix the zswap code by using this latter form of callback registration.
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Instead, the correct and race-free way of performing the callback
registration is:
cpu_notifier_register_begin();
for_each_online_cpu(cpu)
init_cpu(cpu);
/* Note the use of the double underscored version of the API */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_notifier_register_done();
Fix the vmstat code in the MM subsystem by using this latter form of callback
registration.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Instead, the correct and race-free way of performing the callback
registration is:
cpu_notifier_register_begin();
for_each_online_cpu(cpu)
init_cpu(cpu);
/* Note the use of the double underscored version of the API */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_notifier_register_done();
Fix the profile code by using this latter form of callback registration.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>