linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-02 19:23:01 +09:00

Author	SHA1	Message	Date
Jiamin Ma	8e96d0d385	printk: a fix for log output disordering PD#154008: the log output is in disorder The defination for a continues line in printk is much more strict from 3.14 to 4.9: in kernel 3.14, if the first fragment does not end with CR, and the next fragment does not start with LOG_PREFIX(KERN_ALERT, KERN_ERR and so on), then they are in a continues line eg. pr_err("foo "); printk("bar\n") or pr_err("foo "); pr_cont("bar\n"); both are printing a continues line in kernel 4.9, if only the first fragment does not end with CR, and the next fragment start with LOG_CONT, then they are in a continues line eg. pr_err("foo "); printk("bar\n"); are not printing a continues line and pr_err("foo "); pr_cont("bar\n"); are printing a continues line but in the code path of crash info dumping in kernel 4.9.y, not all of the continues line printing has been switched to the 4.9 way(aka. calling pr_cont). Only in kernel 4.13.y all of that has been updated. so in this commit, we lose the definition of continues line back to 3.14 way to sovle current issue, and need to revert it when we sync with upstream kernel 4.13.y Change-Id: I64403d3a18531ceb832b41d96dff4b79a6d7fb5a Signed-off-by: Jiamin Ma <jiamin.ma@amlogic.com>	2017-11-07 22:43:05 -07:00
Victor Wan	fa1a711427	Merge branch 'android-4.9' into amlogic-4.9-dev	2017-10-19 08:43:31 +08:00
Victor Wan	ff27e33bf8	Merge branch 'android-4.9' into amlogic-4.9-dev	2017-10-09 15:30:18 +08:00
Joel Fernandes	2b3a26c86b	FROMLIST: tracing: Add support for preempt and irq enable/disable events Preempt and irq trace events can be used for tracing the start and end of an atomic section which can be used by a trace viewer like systrace to graphically view the start and end of an atomic section and correlate them with latencies and scheduling issues. This also serves as a prelude to using synthetic events or probes to rewrite the preempt and irqsoff tracers, along with numerous benefits of using trace events features for these events. Change-Id: I4f992714c0161a415b17a03aa14cce823d8ca3bb Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zilstra <peterz@infradead.org> Cc: kernel-team@android.com Link: https://patchwork.kernel.org/patch/9988157/ Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-10-06 16:20:25 -07:00
Joel Fernandes	a7e410c135	FROMLIST: tracing: Prepare to add preempt and irq trace events In preparation of adding irqsoff and preemptsoff enable and disable trace events, move required functions and code to make it easier to add these events in a later patch. This patch is just code movement and no functional change. Change-Id: I587d411da5efbc4959bcccd7a05c7a66c231e1e0 Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@android.com Link: https://patchwork.kernel.org/patch/9988159/ Signed-off-by: Joel Fernandes <joelaf@google.com>	2017-10-06 16:19:52 -07:00
Greg Kroah-Hartman	379e3b2a6d	Merge 4.9.53 into android-4.9 Changes in 4.9.53 cifs: release cifs root_cred after exit_cifs cifs: release auth_key.response for reconnect. fs/proc: Report eip/esp in /prod/PID/stat for coredumping mac80211: fix VLAN handling with TXQs mac80211_hwsim: Use proper TX power mac80211: flush hw_roc_start work before cancelling the ROC genirq: Make sparse_irq_lock protect what it should protect KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce() KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list tracing: Fix trace_pipe behavior for instance traces tracing: Erase irqsoff trace with empty write md/raid5: fix a race condition in stripe batch md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly drm/radeon: disable hard reset in hibernate for APUs crypto: drbg - fix freeing of resources crypto: talitos - Don't provide setkey for non hmac hashing algs. crypto: talitos - fix sha224 crypto: talitos - fix hashing security/keys: properly zero out sensitive key material in big_key security/keys: rewrite all of big_key crypto KEYS: fix writing past end of user-supplied buffer in keyring_read() KEYS: prevent creating a different user's keyrings KEYS: prevent KEYCTL_READ on negative key powerpc/pseries: Fix parent_dn reference leak in add_dt_node() powerpc/tm: Flush TM only if CPU has TM feature powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS s390/mm: fix write access check in gup_huge_pmd() PM: core: Fix device_pm_check_callbacks() Fix SMB3.1.1 guest authentication to Samba SMB3: Warn user if trying to sign connection that authenticated as guest SMB: Validate negotiate (to protect against downgrade) even if signing off SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets nl80211: check for the required netlink attributes presence bsg-lib: don't free job in bsg_prepare_job iw_cxgb4: remove the stid on listen create failure iw_cxgb4: put ep reference in pass_accept_req() selftests/seccomp: Support glibc 2.26 siginfo_t.h seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter() arm64: Make sure SPsel is always set arm64: fault: Route pte translation faults via do_translation_fault KVM: VMX: extract __pi_post_block KVM: VMX: avoid double list add with VT-d posted interrupts KVM: VMX: simplify and fix vmx_vcpu_pi_load kvm/x86: Handle async PF in RCU read-side critical sections KVM: VMX: Do not BUG() on out-of-bounds guest IRQ kvm: nVMX: Don't allow L2 to access the hardware CR8 xfs: validate bdev support for DAX inode flag etnaviv: fix gem object list corruption PCI: Fix race condition with driver_override btrfs: fix NULL pointer dereference from free_reloc_roots() btrfs: propagate error to btrfs_cmp_data_prepare caller btrfs: prevent to set invalid default subvolid x86/mm: Fix fault error path using unsafe vma pointer x86/fpu: Don't let userspace set bogus xcomp_bv gfs2: Fix debugfs glocks dump timer/sysclt: Restrict timer migration sysctl values to 0 and 1 KVM: VMX: do not change SN bit in vmx_update_pi_irte() KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt cxl: Fix driver use count KVM: VMX: use cmpxchg64 video: fbdev: aty: do not leak uninitialized padding in clk to userspace swiotlb-xen: implement xen_swiotlb_dma_mmap callback Linux 4.9.53 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-10-05 10:37:37 +02:00
Myungho Jung	4c00015385	timer/sysclt: Restrict timer migration sysctl values to 0 and 1 commit `b94bf594cf` upstream. timer_migration sysctl acts as a boolean switch, so the allowed values should be restricted to 0 and 1. Add the necessary extra fields to the sysctl table entry to enforce that. [ tglx: Rewrote changelog ] Signed-off-by: Myungho Jung <mhjungk@gmail.com> Link: http://lkml.kernel.org/r/1492640690-3550-1-git-send-email-mhjungk@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kazuhiro Hayashi <kazuhiro3.hayashi@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:44:04 +02:00
Oleg Nesterov	be69c4c00a	seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter() commit `66a733ea6b` upstream. As Chris explains, get_seccomp_filter() and put_seccomp_filter() can end up using different filters. Once we drop ->siglock it is possible for task->seccomp.filter to have been replaced by SECCOMP_FILTER_FLAG_TSYNC. Fixes: `f8e529ed94` ("seccomp, ptrace: add support for dumping seccomp filters") Reported-by: Chris Salls <chrissalls5@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> [tycho: add __get_seccomp_filter vs. open coding refcount_inc()] Signed-off-by: Tycho Andersen <tycho@docker.com> [kees: tweak commit log] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:44:02 +02:00
Bo Yan	5fb4be27da	tracing: Erase irqsoff trace with empty write commit `8dd33bcb70` upstream. One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com Cc: <mingo@redhat.com> Fixes: `4acd4d00f` ("tracing: give easy way to clear trace buffer") Signed-off-by: Bo Yan <byan@nvidia.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:43:59 +02:00
Tahsin Erdogan	97d402e6ee	tracing: Fix trace_pipe behavior for instance traces commit `75df6e688c` upstream. When reading data from trace_pipe, tracing_wait_pipe() performs a check to see if tracing has been turned off after some data was read. Currently, this check always looks at global trace state, but it should be checking the trace instance where trace_pipe is located at. Because of this bug, cat instances/i1/trace_pipe in the following script will immediately exit instead of waiting for data: cd /sys/kernel/debug/tracing echo 0 > tracing_on mkdir -p instances/i1 echo 1 > instances/i1/tracing_on echo 1 > instances/i1/events/sched/sched_process_exec/enable cat instances/i1/trace_pipe Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com Fixes: `10246fa35d` ("tracing: give easy way to clear trace buffer") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:43:59 +02:00
Thomas Gleixner	3d5960c8c6	genirq: Make sparse_irq_lock protect what it should protect commit `12ac1d0f6c` upstream. for_each_active_irq() iterates the sparse irq allocation bitmap. The caller must hold sparse_irq_lock. Several code pathes expect that an active bit in the sparse bitmap also has a valid interrupt descriptor. Unfortunately that's not true. The (de)allocation is a two step process, which holds the sparse_irq_lock only across the queue/remove from the radix tree and the set/clear in the allocation bitmap. If a iteration locks sparse_irq_lock between the two steps, then it might see an active bit but the corresponding irq descriptor is NULL. If that is dereferenced unconditionally, then the kernel oopses. Of course, all iterator sites could be audited and fixed, but.... There is no reason why the sparse_irq_lock needs to be dropped between the two steps, in fact the code becomes simpler when the mutex is held across both and the semantics become more straight forward, so future problems of missing NULL pointer checks in the iteration are avoided and all existing sites are fixed in one go. Expand the lock held sections so both operations are covered and the bitmap and the radixtree are in sync. Fixes: `a05a900a51` ("genirq: Make sparse_lock a mutex") Reported-and-tested-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-05 09:43:58 +02:00
Steve Muckle	96a28fcc7c	ANDROID: add script to fetch android kernel config fragments The Android kernel config fragments now live in a separate repository. To prevent others from having to search for this location, add a script to fetch and unpack the fragments. Update .gitignore to include these fragments. Change-Id: If2d4a59b86e4573b0a9b3190025dfe4191870b46 Signed-off-by: Steve Muckle <smuckle@google.com>	2017-10-03 17:19:26 +00:00
Greg Kroah-Hartman	c30c69c76c	Merge 4.9.52 into android-4.9 Changes in 4.9.52 SUNRPC: Refactor svc_set_num_threads() NFSv4: Fix callback server shutdown mm: prevent double decrease of nr_reserved_highatomic orangefs: Don't clear SGID when inheriting ACLs IB/{qib, hfi1}: Avoid flow control testing for RDMA write operation drm/sun4i: Implement drm_driver lastclose to restore fbdev console IB/addr: Fix setting source address in addr6_resolve() tty: improve tty_insert_flip_char() fast path tty: improve tty_insert_flip_char() slow path tty: fix __tty_insert_flip_char regression pinctrl/amd: save pin registers over suspend/resume Input: i8042 - add Gigabyte P57 to the keyboard reset table MIPS: math-emu: <MAX\|MAXA\|MIN\|MINA>.<D\|S>: Fix quiet NaN propagation MIPS: math-emu: <MAX\|MAXA\|MIN\|MINA>.<D\|S>: Fix cases of both inputs zero MIPS: math-emu: <MAX\|MIN>.<D\|S>: Fix cases of both inputs negative MIPS: math-emu: <MAXA\|MINA>.<D\|S>: Fix cases of input values with opposite signs MIPS: math-emu: <MAXA\|MINA>.<D\|S>: Fix cases of both infinite inputs MIPS: math-emu: MINA.<D\|S>: Fix some cases of infinity and zero inputs MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately MIPS: math-emu: <MADDF\|MSUBF>.<D\|S>: Fix NaN propagation MIPS: math-emu: <MADDF\|MSUBF>.<D\|S>: Fix some cases of infinite inputs MIPS: math-emu: <MADDF\|MSUBF>.<D\|S>: Fix some cases of zero inputs MIPS: math-emu: <MADDF\|MSUBF>.<D\|S>: Clean up "maddf_flags" enumeration MIPS: math-emu: <MADDF\|MSUBF>.S: Fix accuracy (32-bit case) MIPS: math-emu: <MADDF\|MSUBF>.D: Fix accuracy (64-bit case) crypto: ccp - Fix XTS-AES-128 support on v5 CCPs crypto: AF_ALG - remove SGL terminator indicator when chaining ext4: fix incorrect quotaoff if the quota feature is enabled ext4: fix quota inconsistency during orphan cleanup for read-only mounts powerpc: Fix DAR reporting when alignment handler faults block: Relax a check in blk_start_queue() md/bitmap: disable bitmap_resize for file-backed bitmaps. skd: Avoid that module unloading triggers a use-after-free skd: Submit requests to firmware before triggering the doorbell scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA scsi: zfcp: fix missing trace records for early returns in TMF eh handlers scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response scsi: zfcp: trace high part of "new" 64 bit SCSI LUN scsi: megaraid_sas: set minimum value of resetwaittime to be 1 secs scsi: megaraid_sas: Check valid aen class range to avoid kernel panic scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead scsi: storvsc: fix memory leak on ring buffer busy scsi: sg: remove 'save_scat_len' scsi: sg: use standard lists for sg_requests scsi: sg: off by one in sg_ioctl() scsi: sg: factor out sg_fill_request_table() scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE scsi: qla2xxx: Correction to vha->vref_count timeout scsi: qla2xxx: Fix an integer overflow in sysfs code ftrace: Fix selftest goto location on error ftrace: Fix memleak when unregistering dynamic ops when tracing disabled tracing: Add barrier to trace_printk() buffer nesting modification tracing: Apply trace_clock changes to instance max buffer ARC: Re-enable MMU upon Machine Check exception PCI: shpchp: Enable bridge bus mastering if MSI is enabled PCI: pciehp: Report power fault only once until we clear it net/netfilter/nf_conntrack_core: Fix net_conntrack_lock() s390/mm: fix local TLB flushing vs. detach of an mm address space s390/mm: fix race on mm->context.flush_mm media: v4l2-compat-ioctl32: Fix timespec conversion media: uvcvideo: Prevent heap overflow when accessing mapped controls PM / devfreq: Fix memory leak when fail to register device bcache: initialize dirty stripes in flash_dev_run() bcache: Fix leak of bdev reference bcache: do not subtract sectors_to_gc for bypassed IO bcache: correct cache_dirty_target in __update_writeback_rate() bcache: Correct return value for sysfs attach errors bcache: fix for gc and write-back race bcache: fix bch_hprint crash and improve output Linux 4.9.52 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-09-27 14:56:06 +02:00
Baohong Liu	cf052336d0	tracing: Apply trace_clock changes to instance max buffer commit `170b3b1050` upstream. Currently trace_clock timestamps are applied to both regular and max buffers only for global trace. For instance trace, trace_clock timestamps are applied only to regular buffer. But, regular and max buffers can be swapped, for example, following a snapshot. So, for instance trace, bad timestamps can be seen following a snapshot. Let's apply trace_clock timestamps to instance max buffer as well. Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com Fixes: `277ba0446` ("tracing: Add interface to allow multiple trace buffers") Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-27 14:39:23 +02:00
Steven Rostedt (VMware)	96cf918df4	tracing: Add barrier to trace_printk() buffer nesting modification commit `3d9622c12c` upstream. trace_printk() uses 4 buffers, one for each context (normal, softirq, irq and NMI), such that it does not need to worry about one context preempting the other. There's a nesting counter that gets incremented to figure out which buffer to use. If the context gets preempted by another context which calls trace_printk() it will increment the counter and use the next buffer, and restore the counter when it is finished. The problem is that gcc may optimize the modification of the buffer nesting counter and it may not be incremented in memory before the buffer is used. If this happens, and the context gets interrupted by another context, it could pick the same buffer and corrupt the one that is being used. Compiler barriers need to be added after the nesting variable is incremented and before it is decremented to prevent usage of the context buffers by more than one context at the same time. Cc: Andy Lutomirski <luto@kernel.org> Fixes: `e2ace00117` ("tracing: Choose static tp_printk buffer by explicit nesting count") Hat-tip-to: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-27 14:39:23 +02:00
Steven Rostedt (VMware)	100553e197	ftrace: Fix memleak when unregistering dynamic ops when tracing disabled commit `edb096e007` upstream. If function tracing is disabled by the user via the function-trace option or the proc sysctl file, and a ftrace_ops that was allocated on the heap is unregistered, then the shutdown code exits out without doing the proper clean up. This was found via kmemleak and running the ftrace selftests, as one of the tests unregisters with function tracing disabled. # cat kmemleak unreferenced object 0xffffffffa0020000 (size 4096): comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s) hex dump (first 32 bytes): 55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89 U.t$.UH...t$.UH. e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c .H......H.D$PH.L backtrace: [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0 [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0 [<ffffffff8109697f>] module_alloc+0x4f/0x90 [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420 [<ffffffff81249947>] ftrace_startup+0xe7/0x300 [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90 [<ffffffff81263786>] trace_selftest_ops+0x204/0x397 [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624 [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7 [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192 [<ffffffff81002230>] do_one_initcall+0x90/0x1e2 [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe [<ffffffff81d61ec3>] kernel_init+0x13/0x122 [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40 [<ffffffffffffffff>] 0xffffffffffffffff Fixes: `12cce594fa` ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-27 14:39:23 +02:00
Steven Rostedt (VMware)	df865f86b0	ftrace: Fix selftest goto location on error commit `46320a6acc` upstream. In the second iteration of trace_selftest_ops(), the error goto label is wrong in the case where trace_selftest_test_global_cnt is off. In the case of error, it leaks the dynamic ops that was allocated. Fixes: `95950c2e` ("ftrace: Add self-tests for multiple function trace users") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-27 14:39:23 +02:00
Vincent Guittot	0b4a2f1d16	UPSTREAM: sched/fair: Fix FTQ noise bench regression commit `bc4278987e` upstream. A regression of the FTQ noise has been reported by Ying Huang, on the following hardware: 8 threads Intel(R) Core(TM)i7-4770 CPU @ 3.40GHz with 8G memory ... which was caused by this commit: commit `4e5160766f` ("sched/fair: Propagate asynchrous detach") The only part of the patch that can increase the noise is the update of blocked load of group entity in update_blocked_averages(). We can optimize this call and skip the update of group entity if its load and utilization are already null and there is no pending propagation of load in the task group. This optimization partly restores the noise score. A more agressive optimization has been tried but has shown worse score. Reported-by: ying.huang@linux.intel.com Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: ying.huang@intel.com Fixes: `4e5160766f` ("sched/fair: Propagate asynchrous detach") Link: http://lkml.kernel.org/r/1489758442-2877-1-git-send-email-vincent.guittot@linaro.org [ Fixed typos, improved layout. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Change-Id: Iba93bbda71125ce14519aa30b1beac2bf3c52e1e Fixes: Change-Id: I5d1a9f273111c0bb7503ca3c9ad2406d12867cb5 ("UPSTREAM: sched/fair: Propagate asynchrous detach") Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-09-23 01:23:27 +00:00
Steve Muckle	faf1269d4b	ANDROID: configs: remove config fragments The kernel config fragments for Android have moved into their own repository located at https://android.googlesource.com/kernel/configs/ Bug: 63994171 Change-Id: I837bac54cb5c90e6a6eb0f6f0ad5c90588c1a46a Signed-off-by: Steve Muckle <smuckle@google.com>	2017-09-15 16:17:26 -07:00
Victor Wan	50d9fb95d9	Merge branch 'android-4.9' into amlogic-4.9-dev	2017-09-15 14:23:48 +08:00
Greg Kroah-Hartman	f7d2974f34	Merge 4.9.50 into android-4.9 Changes in 4.9.50 mtd: nand: mxc: Fix mxc_v1 ooblayout mtd: nand: qcom: fix read failure without complete bootchain mtd: nand: qcom: fix config error for BCH nvme-fabrics: generate spec-compliant UUID NQNs btrfs: resume qgroup rescan on rw remount selftests/x86/fsgsbase: Test selectors 1, 2, and 3 mm/memory.c: fix mem_cgroup_oom_disable() call missing locktorture: Fix potential memory leak with rw lock test ALSA: msnd: Optimize / harden DSP and MIDI loops Bluetooth: Properly check L2CAP config option output buffer length ARM64: dts: marvell: armada-37xx: Fix GIC maintenance interrupt ARM: 8692/1: mm: abort uaccess retries upon fatal signal NFS: Fix 2 use after free issues in the I/O code NFS: Sync the correct byte range during synchronous writes xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present Linux 4.9.50 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-09-14 09:14:13 -07:00
Yang Shi	d21f3eaa09	locktorture: Fix potential memory leak with rw lock test commit `f4dbba5919` upstream. When running locktorture module with the below commands with kmemleak enabled: $ modprobe locktorture torture_type=rw_lock_irq $ rmmod locktorture The below kmemleak got caught: root@10:~# echo scan > /sys/kernel/debug/kmemleak [ 323.197029] kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak) root@10:~# cat /sys/kernel/debug/kmemleak unreferenced object 0xffffffc07592d500 (size 128): comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 c3 7b 02 00 00 00 00 00 .........{...... 00 00 00 00 00 00 00 00 d7 9b 02 00 00 00 00 00 ................ backtrace: [<ffffff80081e5a88>] create_object+0x110/0x288 [<ffffff80086c6078>] kmemleak_alloc+0x58/0xa0 [<ffffff80081d5acc>] __kmalloc+0x234/0x318 [<ffffff80006fa130>] 0xffffff80006fa130 [<ffffff8008083ae4>] do_one_initcall+0x44/0x138 [<ffffff800817e28c>] do_init_module+0x68/0x1cc [<ffffff800811c848>] load_module+0x1a68/0x22e0 [<ffffff800811d340>] SyS_finit_module+0xe0/0xf0 [<ffffff80080836f0>] el0_svc_naked+0x24/0x28 [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffffffc07592d480 (size 128): comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 3b 6f 01 00 00 00 00 00 ........;o...... 00 00 00 00 00 00 00 00 23 6a 01 00 00 00 00 00 ........#j...... backtrace: [<ffffff80081e5a88>] create_object+0x110/0x288 [<ffffff80086c6078>] kmemleak_alloc+0x58/0xa0 [<ffffff80081d5acc>] __kmalloc+0x234/0x318 [<ffffff80006fa22c>] 0xffffff80006fa22c [<ffffff8008083ae4>] do_one_initcall+0x44/0x138 [<ffffff800817e28c>] do_init_module+0x68/0x1cc [<ffffff800811c848>] load_module+0x1a68/0x22e0 [<ffffff800811d340>] SyS_finit_module+0xe0/0xf0 [<ffffff80080836f0>] el0_svc_naked+0x24/0x28 [<ffffffffffffffff>] 0xffffffffffffffff It is because cxt.lwsa and cxt.lrsa don't get freed in module_exit, so free them in lock_torture_cleanup() and free writer_tasks if reader_tasks is failed at memory allocation. Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: 石洋 <yang.s@alibaba-inc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-13 14:13:36 -07:00
Steve Muckle	9983305173	ANDROID: configs: require SYNC_FILE CONFIG_SYNC_FILE is required in the absence of CONFIG_SYNC. Change-Id: Ia52488a273c9e7b3080defe8cea19afc4ad21aaa Signed-off-by: Steve Muckle <smuckle@google.com>	2017-09-07 08:47:51 +00:00
Greg Kroah-Hartman	85e1c0178a	Merge 4.9.48 into android-4.9 Changes in 4.9.48 irqchip: mips-gic: SYNC after enabling GIC region i2c: ismt: Don't duplicate the receive length for block reads i2c: ismt: Return EMSGSIZE for block reads with bogus length crypto: algif_skcipher - only call put_page on referenced and used pages mm, uprobes: fix multiple free of ->uprobes_state.xol_area mm, madvise: ensure poisoned pages are removed from per-cpu lists ceph: fix readpage from fscache cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs cpuset: Fix incorrect memory_pressure control file mapping alpha: uapi: Add support for __SANE_USERSPACE_TYPES__ CIFS: Fix maximum SMB2 header size CIFS: remove endian related sparse warning wl1251: add a missing spin_lock_init() lib/mpi: kunmap after finishing accessing buffer xfrm: policy: check policy direction value drm/ttm: Fix accounting error when fail to get pages for pool kvm: arm/arm64: Force reading uncached stage2 PGD epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove() Linux 4.9.48 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-09-07 10:32:23 +02:00
Waiman Long	309e4dbfaf	cpuset: Fix incorrect memory_pressure control file mapping commit `1c08c22c87` upstream. The memory_pressure control file was incorrectly set up without a private value (0, by default). As a result, this control file was treated like memory_migrate on read. By adding back the FILE_MEMORY_PRESSURE private value, the correct memory pressure value will be returned. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `7dbdb199d3` ("cgroup: replace cftype->mode with CFTYPE_WORLD_WRITABLE") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-07 08:35:40 +02:00
Eric Biggers	17c564f629	mm, uprobes: fix multiple free of ->uprobes_state.xol_area commit `355627f518` upstream. Commit `7c05126793` ("mm, fork: make dup_mmap wait for mmap_sem for write killable") made it possible to kill a forking task while it is waiting to acquire its ->mmap_sem for write, in dup_mmap(). However, it was overlooked that this introduced an new error path before the new mm_struct's ->uprobes_state.xol_area has been set to NULL after being copied from the old mm_struct by the memcpy in dup_mm(). For a task that has previously hit a uprobe tracepoint, this resulted in the 'struct xol_area' being freed multiple times if the task was killed at just the right time while forking. Fix it by setting ->uprobes_state.xol_area to NULL in mm_init() rather than in uprobe_dup_mmap(). With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C program given by commit `2b7e8665b4` ("fork: fix incorrect fput of ->exe_file causing use-after-free"), provided that a uprobe tracepoint has been set on the fork_thread() function. For example: $ gcc reproducer.c -o reproducer -lpthread $ nm reproducer \| grep fork_thread 0000000000400719 t fork_thread $ echo "p $PWD/reproducer:0x719" > /sys/kernel/debug/tracing/uprobe_events $ echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable $ ./reproducer Here is the use-after-free reported by KASAN: BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200 Read of size 8 at addr ffff8800320a8b88 by task reproducer/198 CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f3fb5 #255 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 Call Trace: dump_stack+0xdb/0x185 print_address_description+0x7e/0x290 kasan_report+0x23b/0x350 __asan_report_load8_noabort+0x19/0x20 uprobe_clear_state+0x1c4/0x200 mmput+0xd6/0x360 do_exit+0x740/0x1670 do_group_exit+0x13f/0x380 get_signal+0x597/0x17d0 do_signal+0x99/0x1df0 exit_to_usermode_loop+0x166/0x1e0 syscall_return_slowpath+0x258/0x2c0 entry_SYSCALL_64_fastpath+0xbc/0xbe ... Allocated by task 199: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xfc/0x180 kmem_cache_alloc_trace+0xf3/0x330 __create_xol_area+0x10f/0x780 uprobe_notify_resume+0x1674/0x2210 exit_to_usermode_loop+0x150/0x1e0 prepare_exit_to_usermode+0x14b/0x180 retint_user+0x8/0x20 Freed by task 199: save_stack_trace+0x1b/0x20 kasan_slab_free+0xa8/0x1a0 kfree+0xba/0x210 uprobe_clear_state+0x151/0x200 mmput+0xd6/0x360 copy_process.part.8+0x605f/0x65d0 _do_fork+0x1a5/0xbd0 SyS_clone+0x19/0x20 do_syscall_64+0x22f/0x660 return_from_SYSCALL_64+0x0/0x7a Note: without KASAN, you may instead see a "Bad page state" message, or simply a general protection fault. Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com Fixes: `7c05126793` ("mm, fork: make dup_mmap wait for mmap_sem for write killable") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-07 08:35:39 +02:00
Steve Muckle	d3b81b5a2f	ANDROID: configs: remove requirement for CONFIG_SYNC CONFIG_SYNC was removed in 4.8. Bug: 65268824 Change-Id: If4cbacbb55c034f2dd0f0c7bb36d638d83fcdab2 Signed-off-by: Steve Muckle <smuckle@google.com>	2017-09-06 11:06:00 -07:00
Jiamin Ma	3f88c71f4d	scheduler: remove log info for cpu capacity updating PD#138714: too much log info for cpu capacity updating Change-Id: I14164903950afc2f29d7446d743ba0b0062edc19 Signed-off-by: Jiamin Ma <jiamin.ma@amlogic.com>	2017-09-05 19:51:57 -07:00
Greg Kroah-Hartman	6cd2127851	Merge 4.9.47 into android-4.9 Changes in 4.9.47 p54: memset(0) whole array scsi: isci: avoid array subscript warning staging: wilc1000: simplify vif[i]->ndev accesses gcov: support GCC 7.1 kvm: arm/arm64: Fix race in resetting stage2 PGD arm64: mm: abort uaccess retries upon fatal signal x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl arm64: fpsimd: Prevent registers leaking across exec locking/spinlock/debug: Remove spinlock lockup detection code scsi: sg: protect accesses to 'reserved' page array scsi: sg: reset 'res_in_use' after unlinking reserved array lz4: fix bogus gcc warning Linux 4.9.47 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-09-04 13:25:05 +02:00
Waiman Long	c0c6dff923	locking/spinlock/debug: Remove spinlock lockup detection code commit `bc88c10d7e` upstream. The current spinlock lockup detection code can sometimes produce false positives because of the unfairness of the locking algorithm itself. So the lockup detection code is now removed. Instead, we are relying on the NMI watchdog to detect potential lockup. We won't have lockup detection if the watchdog isn't running. The commented-out read-write lock lockup detection code are also removed. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486583208-11038-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-02 07:07:54 +02:00
Martin Liska	b8a1532b16	gcov: support GCC 7.1 commit `0538421343` upstream. Starting from GCC 7.1, __gcov_exit is a new symbol expected to be implemented in a profiling runtime. [akpm@linux-foundation.org: coding-style fixes] [mliska@suse.cz: v2] Link: http://lkml.kernel.org/r/e63a3c59-0149-c97e-4084-20ca8f146b26@suse.cz Link: http://lkml.kernel.org/r/8c4084fa-3885-29fe-5fc4-0d4ca199c785@suse.cz Signed-off-by: Martin Liska <mliska@suse.cz> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-09-02 07:07:53 +02:00
Amit Pundir	8c972e1bb1	android: android-base.config: enable IP6_NF_MATCH_RPFILTER USB tethering has a hard dependency on ip6t_rpfilter module now. So enable IP6_NF_MATCH_RPFILTER config to make it work again, otherwise we run into following failures when we try to enable USB tethering on Hikey: W IptablesRestoreController: iptables-restore process 1893 terminated status=256 E IptablesRestoreController: iptables error: E IptablesRestoreController: ------- COMMAND ------- E IptablesRestoreController: *raw E IptablesRestoreController: -A natctrl_raw_PREROUTING -i usb0 -m rpfilter --invert ! -s fe80::/64 -j DROP E IptablesRestoreController: COMMIT E IptablesRestoreController: E IptablesRestoreController: ------- ERROR ------- E IptablesRestoreController: ip6tables-restore: line 319 failed E IptablesRestoreController: ---------------------- E NatController: Error setting forward rules E Tethering: [usb0] ERROR Exception enabling NAT: java.lang.IllegalStateException: command '55 nat enable usb0 wlan0 1 192.168.42.0/24' failed with '400 55 Nat operation failed (No such device)' Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-08-31 07:57:37 +00:00
Victor Wan	3f653b7e57	Merge branch 'android-4.9' into amlogic-4.9-dev Conflicts: arch/arm/configs/s3c2410_defconfig drivers/mmc/core/mmc.c Signed-off-by: Victor Wan <victor.wan@amlogic.com>	2017-08-31 14:01:28 +08:00
Greg Kroah-Hartman	a3840b1234	Merge 4.9.46 into android-4.9 Changes in 4.9.46 sparc64: remove unnecessary log message af_key: do not use GFP_KERNEL in atomic contexts dccp: purge write queue in dccp_destroy_sock() dccp: defer ccid_hc_tx_delete() at dismantle time ipv4: fix NULL dereference in free_fib_info_rcu() net_sched/sfq: update hierarchical backlog when drop packet net_sched: remove warning from qdisc_hash_add bpf: fix bpf_trace_printk on 32 bit archs openvswitch: fix skb_panic due to the incorrect actions attrlen ptr_ring: use kmalloc_array() ipv4: better IP_MAX_MTU enforcement nfp: fix infinite loop on umapping cleanup sctp: fully initialize the IPv6 address in sctp_v6_to_addr() tipc: fix use-after-free ipv6: reset fn->rr_ptr when replacing route ipv6: repair fib6 tree in failure case tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled irda: do not leak initialized list.dev to userspace net: sched: fix NULL pointer dereference when action calls some targets net_sched: fix order of queue length updates in qdisc_replace() bpf, verifier: add additional patterns to evaluate_reg_imm_alu bpf: adjust verifier heuristics bpf, verifier: fix alu ops against map_value{, _adj} register types bpf: fix mixed signed/unsigned derived min/max value bounds bpf/verifier: fix min/max handling in BPF_SUB Input: trackpoint - add new trackpoint firmware ID Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310 Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad KVM: s390: sthyi: fix sthyi inline assembly KVM: s390: sthyi: fix specification exception detection KVM: x86: block guest protection keys unless the host has them enabled ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets ALSA: core: Fix unexpected error at replacing user TLV ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978) ALSA: firewire: fix NULL pointer dereference when releasing uninitialized data of iso-resource ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled i2c: designware: Fix system suspend mm/madvise.c: fix freeing of locked page with MADV_FREE fork: fix incorrect fput of ->exe_file causing use-after-free mm/memblock.c: reversed logic in memblock_discard() drm: Release driver tracking before making the object available again drm/atomic: If the atomic check fails, return its value first drm: rcar-du: Fix crash in encoder failure error path drm: rcar-du: Fix display timing controller parameter drm: rcar-du: Fix H/V sync signal polarity configuration tracing: Call clear_boot_tracer() at lateinit_sync tracing: Fix kmemleak in tracing_map_array_free() tracing: Fix freeing of filter in create_filter() when set_str is false kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured cifs: Fix df output for users with quota limits cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup() nfsd: Limit end of page list when decoding NFSv4 WRITE ftrace: Check for null ret_stack on profile function graph entry function perf/core: Fix group {cpu,task} validation perf probe: Fix --funcs to show correct symbols for offline module perf/x86/intel/rapl: Make package handling more robust timers: Fix excessive granularity of new timers after a nohz idle x86/mm: Fix use-after-free of ldt_struct net: sunrpc: svcsock: fix NULL-pointer exception Revert "leds: handle suspend/resume in heartbeat trigger" netfilter: nat: fix src map lookup Bluetooth: hidp: fix possible might sleep error in hidp_session_thread Bluetooth: cmtp: fix possible might sleep error in cmtp_session Bluetooth: bnep: fix possible might sleep error in bnep_session Revert "android: binder: Sanity check at binder ioctl" binder: use group leader instead of open thread binder: Use wake up hint for synchronous transactions. ANDROID: binder: fix proc->tsk check. iio: imu: adis16480: Fix acceleration scale factor for adis16480 iio: hid-sensor-trigger: Fix the race with user space powering up sensors staging: rtl8188eu: add RNX-N150NUB support Clarify (and fix) MAX_LFS_FILESIZE macros ntb_transport: fix qp count bug ntb_transport: fix bug calculating num_qps_mw NTB: ntb_test: fix bug printing ntb_perf results ntb: no sleep in ntb_async_tx_submit ntb: ntb_test: ensure the link is up before trying to configure the mws ntb: transport shouldn't disable link due to bogus values in SPADs ACPI: ioapic: Clear on-stack resource before using it ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal ACPI: EC: Fix regression related to wrong ECDT initialization order powerpc/mm: Ensure cpumask update is ordered Linux 4.9.46 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-08-30 15:24:10 +02:00
Nicholas Piggin	70b3fd5ce2	timers: Fix excessive granularity of new timers after a nohz idle commit `2fe59f507a` upstream. When a timer base is idle, it is forwarded when a new timer is added to ensure that granularity does not become excessive. When not idle, the timer tick is expected to increment the base. However there are several problems: - If an existing timer is modified, the base is forwarded only after the index is calculated. - The base is not forwarded by add_timer_on. - There is a window after a timer is restarted from a nohz idle, after it is marked not-idle and before the timer tick on this CPU, where a timer may be added but the ancient base does not get forwarded. These result in excessive granularity (a 1 jiffy timeout can blow out to 100s of jiffies), which cause the rcu lockup detector to trigger, among other things. Fix this by keeping track of whether the timer base has been idle since it was last run or forwarded, and if so then forward it before adding a new timer. There is still a case where mod_timer optimises the case of a pending timer mod with the same expiry time, where the timer can see excessive granularity relative to the new, shorter interval. A comment is added, but it's not changed because it is an important fastpath for networking. This has been tested and found to fix the RCU softlockup messages. Testing was also done with tracing to measure requested versus achieved wakeup latencies for all non-deferrable timers in an idle system (with no lockup watchdogs running). Wakeup latency relative to absolute latency is calculated (note this suffers from round-up skew at low absolute times) and analysed: max avg std upstream 506.0 1.20 4.68 patched 2.0 1.08 0.15 The bug was noticed due to the lockup detector Kconfig changes dropping it out of people's .configs and resulting in larger base clk skew When the lockup detectors are enabled, no CPU can go idle for longer than 4 seconds, which limits the granularity errors. Sub-optimal timer behaviour is observable on a smaller scale in that case: max avg std upstream 9.0 1.05 0.19 patched 2.0 1.04 0.11 Fixes: Fixes: `a683f390b9` ("timers: Forward the wheel clock whenever possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: David Miller <davem@davemloft.net> Cc: dzickus@redhat.com Cc: sfr@canb.auug.org.au Cc: mpe@ellerman.id.au Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: linuxarm@huawei.com Cc: abdhalee@linux.vnet.ibm.com Cc: John Stultz <john.stultz@linaro.org> Cc: akpm@linux-foundation.org Cc: paulmck@linux.vnet.ibm.com Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:51 +02:00
Mark Rutland	bde6608dd6	perf/core: Fix group {cpu,task} validation commit `64aee2a965` upstream. Regardless of which events form a group, it does not make sense for the events to target different tasks and/or CPUs, as this leaves the group inconsistent and impossible to schedule. The core perf code assumes that these are consistent across (successfully intialised) groups. Core perf code only verifies this when moving SW events into a HW context. Thus, we can violate this requirement for pure SW groups and pure HW groups, unless the relevant PMU driver happens to perform this verification itself. These mismatched groups subsequently wreak havoc elsewhere. For example, we handle watchpoints as SW events, and reserve watchpoint HW on a per-CPU basis at pmu::event_init() time to ensure that any event that is initialised is guaranteed to have a slot at pmu::add() time. However, the core code only checks the group leader's cpu filter (via event_filter_match()), and can thus install follower events onto CPUs violating thier (mismatched) CPU filters, potentially installing them into a CPU without sufficient reserved slots. This can be triggered with the below test case, resulting in warnings from arch backends. #define _GNU_SOURCE #include <linux/hw_breakpoint.h> #include <linux/perf_event.h> #include <sched.h> #include <stdio.h> #include <sys/prctl.h> #include <sys/syscall.h> #include <unistd.h> static int perf_event_open(struct perf_event_attr attr, pid_t pid, int cpu, int group_fd, unsigned long flags) { return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); } char watched_char; struct perf_event_attr wp_attr = { .type = PERF_TYPE_BREAKPOINT, .bp_type = HW_BREAKPOINT_RW, .bp_addr = (unsigned long)&watched_char, .bp_len = 1, .size = sizeof(wp_attr), }; int main(int argc, char argv[]) { int leader, ret; cpu_set_t cpus; /* * Force use of CPU0 to ensure our CPU0-bound events get scheduled. / CPU_ZERO(&cpus); CPU_SET(0, &cpus); ret = sched_setaffinity(0, sizeof(cpus), &cpus); if (ret) { printf("Unable to set cpu affinity\n"); return 1; } / open leader event, bound to this task, CPU0 only / leader = perf_event_open(&wp_attr, 0, 0, -1, 0); if (leader < 0) { printf("Couldn't open leader: %d\n", leader); return 1; } / * Open a follower event that is bound to the same task, but a * different CPU. This means that the group should never be possible to * schedule. / ret = perf_event_open(&wp_attr, 0, 1, leader, 0); if (ret < 0) { printf("Couldn't open mismatched follower: %d\n", ret); return 1; } else { printf("Opened leader/follower with mismastched CPUs\n"); } / * Open as many independent events as we can, all bound to the same * task, CPU0 only. / do { ret = perf_event_open(&wp_attr, 0, 0, -1, 0); } while (ret >= 0); / * Force enable/disble all events to trigger the erronoeous * installation of the follower event. */ printf("Opened all events. Toggling..\n"); for (;;) { prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0); prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0); } return 0; } Fix this by validating this requirement regardless of whether we're moving events. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhou Chengming <zhouchengming1@huawei.com> Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:50 +02:00
Steven Rostedt (VMware)	741397d16a	ftrace: Check for null ret_stack on profile function graph entry function commit `a8f0f9e499` upstream. There's a small race when function graph shutsdown and the calling of the registered function graph entry callback. The callback must not reference the task's ret_stack without first checking that it is not NULL. Note, when a ret_stack is allocated for a task, it stays allocated until the task exits. The problem here, is that function_graph is shutdown, and a new task was created, which doesn't have its ret_stack allocated. But since some of the functions are still being traced, the callbacks can still be called. The normal function_graph code handles this, but starting with commit `8861dd303c` ("ftrace: Access ret_stack->subtime only in the function profiler") the profiler code references the ret_stack on function entry, but doesn't check if it is NULL first. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611 Fixes: `8861dd303c` ("ftrace: Access ret_stack->subtime only in the function profiler") Reported-by: lilydjwg@gmail.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:50 +02:00
Steven Rostedt (VMware)	8838cd5c54	tracing: Fix freeing of filter in create_filter() when set_str is false commit `8b0db1a5bd` upstream. Performing the following task with kmemleak enabled: # cd /sys/kernel/tracing/events/irq/irq_handler_entry/ # echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger # echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger # echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8800b9290308 (size 32): comm "bash", pid 1114, jiffies 4294848451 (age 141.139s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290 [<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940 [<ffffffff812639c9>] create_filter+0xa9/0x160 [<ffffffff81263bdc>] create_event_filter+0xc/0x10 [<ffffffff812655e5>] set_trigger_filter+0xe5/0x210 [<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490 [<ffffffff812652e2>] event_trigger_write+0x1a2/0x260 [<ffffffff8138cf87>] __vfs_write+0xd7/0x380 [<ffffffff8138f421>] vfs_write+0x101/0x260 [<ffffffff8139187b>] SyS_write+0xab/0x130 [<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe [<ffffffffffffffff>] 0xffffffffffffffff The function create_filter() is passed a 'filterp' pointer that gets allocated, and if "set_str" is true, it is up to the caller to free it, even on error. The problem is that the pointer is not freed by create_filter() when set_str is false. This is a bug, and it is not up to the caller to free the filter on error if it doesn't care about the string. Link: http://lkml.kernel.org/r/1502705898-27571-2-git-send-email-chuhu@redhat.com Fixes: `38b78eb85` ("tracing: Factorize filter creation") Reported-by: Chunyu Hu <chuhu@redhat.com> Tested-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:49 +02:00
Chunyu Hu	2818a7659f	tracing: Fix kmemleak in tracing_map_array_free() commit `475bb3c69a` upstream. kmemleak reported the below leak when I was doing clear of the hist trigger. With this patch, the kmeamleak is gone. unreferenced object 0xffff94322b63d760 (size 32): comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s) hex dump (first 32 bytes): 00 01 00 00 04 00 00 00 08 00 00 00 ff 00 00 00 ................ 10 00 00 00 00 00 00 00 80 a8 7a f2 31 94 ff ff ..........z.1... backtrace: [<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff9e424cba>] kmem_cache_alloc_trace+0xca/0x1d0 [<ffffffff9e377736>] tracing_map_array_alloc+0x26/0x140 [<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50 [<ffffffff9e38b935>] create_hist_data+0x535/0x750 [<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420 [<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0 [<ffffffff9e44dfc7>] __vfs_write+0x37/0x170 [<ffffffff9e44f552>] vfs_write+0xb2/0x1b0 [<ffffffff9e450b85>] SyS_write+0x55/0xc0 [<ffffffff9e203857>] do_syscall_64+0x67/0x150 [<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff9431f27aa880 (size 128): comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s) hex dump (first 32 bytes): 00 00 8c 2a 32 94 ff ff 00 f0 8b 2a 32 94 ff ff ...2......2... 00 e0 8b 2a 32 94 ff ff 00 d0 8b 2a 32 94 ff ff ...2......2... backtrace: [<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff9e425348>] __kmalloc+0xe8/0x220 [<ffffffff9e3777c1>] tracing_map_array_alloc+0xb1/0x140 [<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50 [<ffffffff9e38b935>] create_hist_data+0x535/0x750 [<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420 [<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0 [<ffffffff9e44dfc7>] __vfs_write+0x37/0x170 [<ffffffff9e44f552>] vfs_write+0xb2/0x1b0 [<ffffffff9e450b85>] SyS_write+0x55/0xc0 [<ffffffff9e203857>] do_syscall_64+0x67/0x150 [<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Link: http://lkml.kernel.org/r/1502705898-27571-1-git-send-email-chuhu@redhat.com Fixes: `08d43a5fa0` ("tracing: Add lock-free tracing_map") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:49 +02:00
Steven Rostedt (VMware)	3170d9abc5	tracing: Call clear_boot_tracer() at lateinit_sync commit `4bb0f0e73c` upstream. The clear_boot_tracer function is used to reset the default_bootup_tracer string to prevent it from being accessed after boot, as it originally points to init data. But since clear_boot_tracer() is called via the init_lateinit() call, it races with the initcall for registering the hwlat tracer. If someone adds "ftrace=hwlat" to the kernel command line, depending on how the linker sets up the text, the saved command line may be cleared, and the hwlat tracer never is initialized. Simply have the clear_boot_tracer() be called by initcall_lateinit_sync() as that's for tasks to be called after lateinit. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196551 Fixes: `e7c15cd8a` ("tracing: Added hardware latency tracer") Reported-by: Zamir SUN <sztsian@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:48 +02:00
Eric Biggers	b65b6ac52e	fork: fix incorrect fput of ->exe_file causing use-after-free commit `2b7e8665b4` upstream. Commit `7c05126793` ("mm, fork: make dup_mmap wait for mmap_sem for write killable") made it possible to kill a forking task while it is waiting to acquire its ->mmap_sem for write, in dup_mmap(). However, it was overlooked that this introduced an new error path before a reference is taken on the mm_struct's ->exe_file. Since the ->exe_file of the new mm_struct was already set to the old ->exe_file by the memcpy() in dup_mm(), it was possible for the mmput() in the error path of dup_mm() to drop a reference to ->exe_file which was never taken. This caused the struct file to later be freed prematurely. Fix it by updating mm_init() to NULL out the ->exe_file, in the same place it clears other things like the list of mmaps. This bug was found by syzkaller. It can be reproduced using the following C program: #define _GNU_SOURCE #include <pthread.h> #include <stdlib.h> #include <sys/mman.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void mmap_thread(void _arg) { for (;;) { mmap(NULL, 0x1000000, PROT_READ, MAP_POPULATE\|MAP_ANONYMOUS\|MAP_PRIVATE, -1, 0); } } static void fork_thread(void _arg) { usleep(rand() % 10000); fork(); } int main(void) { fork(); fork(); fork(); for (;;) { if (fork() == 0) { pthread_t t; pthread_create(&t, NULL, mmap_thread, NULL); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } No special kernel config options are needed. It usually causes a NULL pointer dereference in __remove_shared_vm_struct() during exit, or in dup_mmap() (which is usually inlined into copy_process()) during fork. Both are due to a vm_area_struct's ->vm_file being used after it's already been freed. Google Bug Id: 64772007 Link: http://lkml.kernel.org/r/20170823211408.31198-1-ebiggers3@gmail.com Fixes: `7c05126793` ("mm, fork: make dup_mmap wait for mmap_sem for write killable") Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:47 +02:00
Edward Cree	655da3da9b	bpf/verifier: fix min/max handling in BPF_SUB [ Upstream commit `9305706c2e` ] We have to subtract the src max from the dst min, and vice-versa, since (e.g.) the smallest result comes from the largest subtrahend. Fixes: `484611357c` ("bpf: allow access into map value arrays") Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:44 +02:00
Daniel Borkmann	bf5b91b782	bpf: fix mixed signed/unsigned derived min/max value bounds [ Upstream commit `4cabc5b186` ] Edward reported that there's an issue in min/max value bounds tracking when signed and unsigned compares both provide hints on limits when having unknown variables. E.g. a program such as the following should have been rejected: 0: (7a) (u64 )(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff8a94cda93400 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) (u64 )(r10 -16) = -8 8: (79) r1 = (u64 )(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 11: (65) if r1 s> 0x1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 12: (0f) r0 += r1 13: (72) (u8 )(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 14: (b7) r0 = 0 15: (95) exit What happens is that in the first part ... 8: (79) r1 = (u64 )(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 ... r1 carries an unsigned value, and is compared as unsigned against a register carrying an immediate. Verifier deduces in reg_set_min_max() that since the compare is unsigned and operation is greater than (>), that in the fall-through/false case, r1's minimum bound must be 0 and maximum bound must be r2. Latter is larger than the bound and thus max value is reset back to being 'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now 'R1=inv,min_value=0'. The subsequent test ... 11: (65) if r1 s> 0x1 goto pc+2 ... is a signed compare of r1 with immediate value 1. Here, verifier deduces in reg_set_min_max() that since the compare is signed this time and operation is greater than (>), that in the fall-through/false case, we can deduce that r1's maximum bound must be 1, meaning with prior test, we result in r1 having the following state: R1=inv,min_value=0,max_value=1. Given that the actual value this holds is -8, the bounds are wrongly deduced. When this is being added to r0 which holds the map_value(_adj) type, then subsequent store access in above case will go through check_mem_access() which invokes check_map_access_adj(), that will then probe whether the map memory is in bounds based on the min_value and max_value as well as access size since the actual unknown value is min_value <= x <= max_value; commit `fce366a9dd` ("bpf, verifier: fix alu ops against map_value{, _adj} register types") provides some more explanation on the semantics. It's worth to note in this context that in the current code, min_value and max_value tracking are used for two things, i) dynamic map value access via check_map_access_adj() and since commit `06c1c04972` ("bpf: allow helpers access to variable memory") ii) also enforced at check_helper_mem_access() when passing a memory address (pointer to packet, map value, stack) and length pair to a helper and the length in this case is an unknown value defining an access range through min_value/max_value in that case. The min_value/max_value tracking is /not/ used in the direct packet access case to track ranges. However, the issue also affects case ii), for example, the following crafted program based on the same principle must be rejected as well: 0: (b7) r2 = 0 1: (bf) r3 = r10 2: (07) r3 += -512 3: (7a) (u64 )(r10 -16) = -8 4: (79) r4 = (u64 )(r10 -16) 5: (b7) r6 = -1 6: (2d) if r4 > r6 goto pc+5 R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512 R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 7: (65) if r4 s> 0x1 goto pc+4 R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512 R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 8: (07) r4 += 1 9: (b7) r5 = 0 10: (6a) (u16 )(r10 -512) = 0 11: (85) call bpf_skb_load_bytes#26 12: (b7) r0 = 0 13: (95) exit Meaning, while we initialize the max_value stack slot that the verifier thinks we access in the [1,2] range, in reality we pass -7 as length which is interpreted as u32 in the helper. Thus, this issue is relevant also for the case of helper ranges. Resetting both bounds in check_reg_overflow() in case only one of them exceeds limits is also not enough as similar test can be created that uses values which are within range, thus also here learned min value in r1 is incorrect when mixed with later signed test to create a range: 0: (7a) (u64 )(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff880ad081fa00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) (u64 )(r10 -16) = -8 8: (79) r1 = (u64 )(r10 -16) 9: (b7) r2 = 2 10: (3d) if r2 >= r1 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 11: (65) if r1 s> 0x4 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 12: (0f) r0 += r1 13: (72) (u8 )(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4 R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 14: (b7) r0 = 0 15: (95) exit This leaves us with two options for fixing this: i) to invalidate all prior learned information once we switch signed context, ii) to track min/max signed and unsigned boundaries separately as done in [0]. (Given latter introduces major changes throughout the whole verifier, it's rather net-next material, thus this patch follows option i), meaning we can derive bounds either from only signed tests or only unsigned tests.) There is still the case of adjust_reg_min_max_vals(), where we adjust bounds on ALU operations, meaning programs like the following where boundaries on the reg get mixed in context later on when bounds are merged on the dst reg must get rejected, too: 0: (7a) (u64 )(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff89b2bf87ce00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+6 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) (u64 )(r10 -16) = -8 8: (79) r1 = (u64 )(r10 -16) 9: (b7) r2 = 2 10: (3d) if r2 >= r1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 11: (b7) r7 = 1 12: (65) if r7 s> 0x0 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp 13: (b7) r0 = 0 14: (95) exit from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp 15: (0f) r7 += r1 16: (65) if r7 s> 0x4 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp 17: (0f) r0 += r7 18: (72) (u8 )(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp 19: (b7) r0 = 0 20: (95) exit Meaning, in adjust_reg_min_max_vals() we must also reset range values on the dst when src/dst registers have mixed signed/ unsigned derived min/max value bounds with one unbounded value as otherwise they can be added together deducing false boundaries. Once both boundaries are established from either ALU ops or compare operations w/o mixing signed/unsigned insns, then they can safely be added to other regs also having both boundaries established. Adding regs with one unbounded side to a map value where the bounded side has been learned w/o mixing ops is possible, but the resulting map value won't recover from that, meaning such op is considered invalid on the time of actual access. Invalid bounds are set on the dst reg in case i) src reg, or ii) in case dst reg already had them. The only way to recover would be to perform i) ALU ops but only 'add' is allowed on map value types or ii) comparisons, but these are disallowed on pointers in case they span a range. This is fine as only BPF_JEQ and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers which potentially turn them into PTR_TO_MAP_VALUE type depending on the branch, so only here min/max value cannot be invalidated for them. In terms of state pruning, value_from_signed is considered as well in states_equal() when dealing with adjusted map values. With regards to breaking existing programs, there is a small risk, but use-cases are rather quite narrow where this could occur and mixing compares probably unlikely. Joint work with Josef and Edward. [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html Fixes: `484611357c` ("bpf: allow access into map value arrays") Reported-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:43 +02:00
Daniel Borkmann	8d674bee8f	bpf, verifier: fix alu ops against map_value{, _adj} register types [ Upstream commit `fce366a9dd` ] While looking into map_value_adj, I noticed that alu operations directly on the map_value() resp. map_value_adj() register (any alu operation on a map_value() register will turn it into a map_value_adj() typed register) are not sufficiently protected against some of the operations. Two non-exhaustive examples are provided that the verifier needs to reject: i) BPF_AND on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) (u64 )(r2 +0) = 0 3: (18) r1 = 0xbf842a00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+2 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (57) r0 &= 8 8: (7a) (u64 )(r0 +0) = 22 R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp 9: (95) exit from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp 9: (95) exit processed 10 insns ii) BPF_ADD in 32 bit mode on r0 (map_value_adj): 0: (bf) r2 = r10 1: (07) r2 += -8 2: (7a) (u64 )(r2 +0) = 0 3: (18) r1 = 0xc24eee00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+2 R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 7: (04) (u32) r0 += (u32) 0 8: (7a) (u64 )(r0 +0) = 22 R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp 9: (95) exit from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp 9: (95) exit processed 10 insns Issue is, while min_value / max_value boundaries for the access are adjusted appropriately, we change the pointer value in a way that cannot be sufficiently tracked anymore from its origin. Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact, all the test cases coming with `484611357c` ("bpf: allow access into map value arrays") perform BPF_ADD only on the destination register that is PTR_TO_MAP_VALUE_ADJ. Only for UNKNOWN_VALUE register types such operations make sense, f.e. with unknown memory content fetched initially from a constant offset from the map value memory into a register. That register is then later tested against lower / upper bounds, so that the verifier can then do the tracking of min_value / max_value, and properly check once that UNKNOWN_VALUE register is added to the destination register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the original use-case is solving. Note, tracking on what is being added is done through adjust_reg_min_max_vals() and later access to the map value enforced with these boundaries and the given offset from the insn through check_map_access_adj(). Tests will fail for non-root environment due to prohibited pointer arithmetic, in particular in check_alu_op(), we bail out on the is_pointer_value() check on the dst_reg (which is false in root case as we allow for pointer arithmetic via env->allow_ptr_leaks). Similarly to PTR_TO_PACKET, one way to fix it is to restrict the allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit mode BPF_ADD. The test_verifier suite runs fine after the patch and it also rejects mentioned test cases. Fixes: `484611357c` ("bpf: allow access into map value arrays") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:43 +02:00
Daniel Borkmann	577aa83b28	bpf: adjust verifier heuristics [ Upstream commit `3c2ce60bdd` ] Current limits with regards to processing program paths do not really reflect today's needs anymore due to programs becoming more complex and verifier smarter, keeping track of more data such as const ALU operations, alignment tracking, spilling of PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for smarter matching of what LLVM generates. This also comes with the side-effect that we result in fewer opportunities to prune search states and thus often need to do more work to prove safety than in the past due to different register states and stack layout where we mismatch. Generally, it's quite hard to determine what caused a sudden increase in complexity, it could be caused by something as trivial as a single branch somewhere at the beginning of the program where LLVM assigned a stack slot that is marked differently throughout other branches and thus causing a mismatch, where verifier then needs to prove safety for the whole rest of the program. Subsequently, programs with even less than half the insn size limit can get rejected. We noticed that while some programs load fine under pre 4.11, they get rejected due to hitting limits on more recent kernels. We saw that in the vast majority of cases (90+%) pruning failed due to register mismatches. In case of stack mismatches, majority of cases failed due to different stack slot types (invalid, spill, misc) rather than differences in spilled registers. This patch makes pruning more aggressive by also adding markers that sit at conditional jumps as well. Currently, we only mark jump targets for pruning. For example in direct packet access, these are usually error paths where we bail out. We found that adding these markers, it can reduce number of processed insns by up to 30%. Another option is to ignore reg->id in probing PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning slightly as well by up to 7% observed complexity reduction as stand-alone. Meaning, if a previous path with register type PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then in the current state a PTR_TO_MAP_VALUE_OR_NULL register for the same map X must be safe as well. Last but not least the patch also adds a scheduling point and bumps the current limit for instructions to be processed to a more adequate value. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:43 +02:00
John Fastabend	e37bdeee95	bpf, verifier: add additional patterns to evaluate_reg_imm_alu [ Upstream commit `43188702b3` ] Currently the verifier does not track imm across alu operations when the source register is of unknown type. This adds additional pattern matching to catch this and track imm. We've seen LLVM generating this pattern while working on cilium. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:43 +02:00
Daniel Borkmann	d6a6b6b4c3	bpf: fix bpf_trace_printk on 32 bit archs [ Upstream commit `88a5c690b6` ] James reported that on MIPS32 bpf_trace_printk() is currently broken while MIPS64 works fine: bpf_trace_printk() uses conditional operators to attempt to pass different types to __trace_printk() depending on the format operators. This doesn't work as intended on 32-bit architectures where u32 and long are passed differently to u64, since the result of C conditional operators follows the "usual arithmetic conversions" rules, such that the values passed to __trace_printk() will always be u64 [causing issues later in the va_list handling for vscnprintf()]. For example the samples/bpf/tracex5 test printed lines like below on MIPS32, where the fd and buf have come from the u64 fd argument, and the size from the buf argument: [...] 1180.941542: 0x00000001: write(fd=1, buf= (null), size=6258688) Instead of this: [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512) One way to get it working is to expand various combinations of argument types into 8 different combinations for 32 bit and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64 as well that it resolves the issue. Fixes: `9c959c863f` ("tracing: Allow BPF programs to call bpf_trace_printk()") Reported-by: James Hogan <james.hogan@imgtec.com> Tested-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-30 10:21:39 +02:00
Greg Kroah-Hartman	5731c30334	Merge 4.9.45 into android-4.9 Changes in 4.9.45 netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister audit: Fix use after free in audit_remove_watch_rule() parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo crypto: ixp4xx - Fix error handling path in 'aead_perform()' crypto: x86/sha1 - Fix reads beyond the number of blocks passed Input: elan_i2c - add ELAN0608 to the ACPI table Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB ALSA: seq: 2nd attempt at fixing race creating a queue ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices mm: discard memblock data later mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS mm/mempolicy: fix use after free when calling get_mempolicy mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes xen: fix bio vec merging blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC xen-blkfront: use a right index when checking requests x86/asm/64: Clear AC on NMI entries irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup() irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup() genirq: Restore trigger settings in irq_modify_status() genirq/ipi: Fixup checks against nr_cpu_ids Sanitize 'move_pages()' permission checks pids: make task_tgid_nr_ns() safe usb: optimize acpi companion search for usb port devices usb: qmi_wwan: add D-Link DWM-222 device ID Linux 4.9.45 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2017-08-29 14:51:49 +02:00
Oleg Nesterov	322cd32623	pids: make task_tgid_nr_ns() safe commit `dd1c1f2f20` upstream. This was reported many times, and this was even mentioned in commit `52ee2dfdd4` ("pids: refactor vnr/nr_ns helpers to make them safe") but somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns() is not safe because task->group_leader points to nowhere after the exiting task passes exit_notify(), rcu_read_lock() can not help. We really need to change __unhash_process() to nullify group_leader, parent, and real_parent, but this needs some cleanups. Until then we can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and fix the problem. Reported-by: Troy Kensinger <tkensinger@google.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-24 17:12:21 -07:00
Alexey Dobriyan	f9497d5125	genirq/ipi: Fixup checks against nr_cpu_ids commit `8fbbe2d7cc` upstream. Valid CPU ids are [0, nr_cpu_ids-1] inclusive. Fixes: `3b8e29a82d` ("genirq: Implement ipi_send_mask/single()") Fixes: `f9bce791ae` ("genirq: Add a new function to get IPI reverse mapping") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170819095751.GB27864@avx2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-24 17:12:21 -07:00

1 2 3 4 5 ...

23921 Commits