linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-02 03:03:00 +09:00

Author	SHA1	Message	Date
Dietmar Eggemann	3bf7b11f4b	sched: Remove sysctl_sched_is_big_little With the new wakeup approach this sysctl is not necessary any more. Change-Id: I52114b3c918791f6a4f9f30f50002919ccbc1a9c Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> (cherry picked from commit 885c0d503bcdf0ef4e9b46822496f16b20aa3bbd) Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-06-21 16:37:28 +05:30
Morten Rasmussen	fac20743c7	UPSTREAM: sched/core: Introduce SD_ASYM_CPUCAPACITY sched_domain topology flag Add a topology flag to the sched_domain hierarchy indicating the lowest domain level where the full range of CPU capacities is represented by the domain members for asymmetric capacity topologies (e.g. ARM big.LITTLE). The flag is intended to indicate that extra care should be taken when placing tasks on CPUs and this level spans all the different types of CPUs found in the system (no need to look further up the domain hierarchy). This information is currently only available through iterating through the capacities of all the CPUs at parent levels in the sched_domain hierarchy. SD 2 [ 0 1 2 3] SD_ASYM_CPUCAPACITY SD 1 [ 0 1] [ 2 3] !SD_ASYM_CPUCAPACITY CPU: 0 1 2 3 capacity: 756 756 1024 1024 If the topology in the example above is duplicated to create an eight CPU example with third sched_domain level on top (SD 3), this level should not have the flag set (!SD_ASYM_CPUCAPACITY) as its two group would both have all CPU capacities represented within them. Change-Id: I1526407b90567cac387419719b7d7fdc8b259a85 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1469453670-2660-6-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `1f6e6c7cb9`) [trivial merge conflict] Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-06-21 16:37:24 +05:30
Morten Rasmussen	d6f316457f	UPSTREAM: sched/core: Fix power to capacity renaming in comment It is seems that this one escaped Nico's renaming of cpu_power to cpu_capacity a while back. Change-Id: Ic2569d714db7b740f1df4ccc381ba8c1772c2793 Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: linux-kernel@vger.kernel.org Cc: mgalbraith@suse.de Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1466615004-3503-2-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `bd425d4bfc`) Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-06-21 16:37:20 +05:30
Juri Lelli	2d0ab5231c	trace/sched: add rq utilization signal for WALT It is useful to be able to check current capacity against rq utilization signal generated by WALT (to check how a cpufreq governor is behaving for example). Add rq utilization signal (same scale as capacity) to the walt_update_ task_ravg tracepoint. Change-Id: I9aae3884a741d23ac494bef80d2303f107f135ce Signed-off-by: Juri Lelli <juri.lelli@arm.com>	2017-06-21 16:37:20 +05:30
Steve Muckle	8c7730ea97	sched: cpufreq: use rt_avg as estimate of required RT CPU capacity A policy of going to fmax on any RT activity will be detrimental for power on many platforms. Often RT accounts for only a small amount of CPU activity so sending the CPU frequency to fmax is overkill. Worse still, some platforms may not be able to even complete the CPU frequency change before the RT activity has already completed. Cpufreq governors have not treated RT activity this way in the past so it is not part of the expected semantics of the RT scheduling class. The DL class offers guarantees about task completion and could be used for this purpose. Modify the schedutil algorithm to instead use rt_avg as an estimate of RT utilization of the CPU. Based on previous work by Vincent Guittot <vincent.guittot@linaro.org>. Change-Id: I1ed605a3e2512a94d34217a8e57c3fd97cca60be Signed-off-by: Steve Muckle <smuckle@linaro.org>	2017-06-21 16:37:20 +05:30
Petr Mladek	d87de29002	BACKPORT: kthread: allow to cancel kthread work We are going to use kthread workers more widely and sometimes we will need to make sure that the work is neither pending nor running. This patch implements cancel_*_sync() operations as inspired by workqueues. Well, we are synchronized against the other operations via the worker lock, we use del_timer_sync() and a counter to count parallel cancel operations. Therefore the implementation might be easier. First, we check if a worker is assigned. If not, the work has newer been queued after it was initialized. Second, we take the worker lock. It must be the right one. The work must not be assigned to another worker unless it is initialized in between. Third, we try to cancel the timer when it exists. The timer is deleted synchronously to make sure that the timer call back is not running. We need to temporary release the worker->lock to avoid a possible deadlock with the callback. In the meantime, we set work->canceling counter to avoid any queuing. Fourth, we try to remove the work from a worker list. It might be the list of either normal or delayed works. Fifth, if the work is running, we call kthread_flush_work(). It might take an arbitrary time. We need to release the worker-lock again. In the meantime, we again block any queuing by the canceling counter. As already mentioned, the check for a pending kthread work is done under a lock. In compare with workqueues, we do not need to fight for a single PENDING bit to block other operations. Therefore we do not suffer from the thundering storm problem and all parallel canceling jobs might use kthread_flush_work(). Any queuing is blocked until the counter gets zero. Change-Id: I8a8ece0f93c828f311d0ad5c88d80db2388e4808 Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry-picked from `37be45d49d`) [major changes to the original patch while cherry-picking; only rebased the sync variant] Signed-off-by: Juri Lelli <juri.lelli@arm.com>	2017-06-21 16:34:04 +05:30
Steve Muckle	f49a1d8ca7	BACKPORT: cpufreq: schedutil: New governor based on scheduler utilization data Add a new cpufreq scaling governor, called "schedutil", that uses scheduler-provided CPU utilization information as input for making its decisions. Doing that is possible after commit `34e2c55` (cpufreq: Add mechanism for registering utilization update callbacks) that introduced cpufreq_update_util() called by the scheduler on utilization changes (from CFS) and RT/DL task status updates. In particular, CPU frequency scaling decisions may be based on the the utilization data passed to cpufreq_update_util() by CFS. The new governor is relatively simple. The frequency selection formula used by it depends on whether or not the utilization is frequency-invariant. In the frequency-invariant case the new CPU frequency is given by next_freq = 1.25 * max_freq * util / max where util and max are the last two arguments of cpufreq_update_util(). In turn, if util is not frequency-invariant, the maximum frequency in the above formula is replaced with the current frequency of the CPU: next_freq = 1.25 * curr_freq * util / max The coefficient 1.25 corresponds to the frequency tipping point at (util / max) = 0.8. All of the computations are carried out in the utilization update handlers provided by the new governor. One of those handlers is used for cpufreq policies shared between multiple CPUs and the other one is for policies with one CPU only (and therefore it doesn't need to use any extra synchronization means). The governor supports fast frequency switching if that is supported by the cpufreq driver in use and possible for the given policy. In the fast switching case, all operations of the governor take place in its utilization update handlers. If fast switching cannot be used, the frequency switch operations are carried out with the help of a work item which only calls __cpufreq_driver_target() (under a mutex) to trigger a frequency update (to a value already computed beforehand in one of the utilization update handlers). Currently, the governor treats all of the RT and DL tasks as "unknown utilization" and sets the frequency to the allowed maximum when updated from the RT or DL sched classes. That heavy-handed approach should be replaced with something more subtle and specifically targeted at RT and DL tasks. The governor shares some tunables management code with the "ondemand" and "conservative" governors and uses some common definitions from cpufreq_governor.h, but apart from that it is stand-alone. Change-Id: I03876e622768e4b3ee4dc28682af7cce771f2f4c Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> (cherry-picked from `9bdcb44e39`) [ Backport the schedutil cpufreq governor from 4.9. Some cpufreq tunable infrastructure as well as the resolve_freq API is also backported as those are dependencies] Signed-off-by: Steve Muckle <smuckle@linaro.org> [trivial cherry-picking fixes] Signed-off-by: Juri Lelli <juri.lelli@arm.com> [fixed default governor machinery] Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-06-21 16:34:04 +05:30
Steve Muckle	dee8fa1552	sched: backport cpufreq hooks from 4.9-rc4 The scheduler cpufreq hooks are required by the schedutil cpufreq governor. Change-Id: Ied6c46262bb33b7e81bbb3d3d2761124e0c676b7 Signed-off-by: Steve Muckle <smuckle@linaro.org> [trivial cherry-picking fixes] Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>	2017-06-21 16:34:04 +05:30
Alex Shi	be74fd2af8	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-06-08 12:11:52 +08:00
Alex Shi	dfb9fdffce	Merge tag 'v4.4.71' into linux-linaro-lsk-v4.4 This is the 4.4.71 stable release	2017-06-08 12:11:49 +08:00
Vlad Yasevich	1b5286ba9f	vlan: Fix tcp checksum offloads in Q-in-Q vlans commit `35d2f80b07` upstream. It appears that TCP checksum offloading has been broken for Q-in-Q vlans. The behavior was execerbated by the series commit `afb0bc972b` ("Merge branch 'stacked_vlan_tso'") that that enabled accleleration features on stacked vlans. However, event without that series, it is possible to trigger this issue. It just requires a lot more specialized configuration. The root cause is the interaction between how netdev_intersect_features() works, the features actually set on the vlan devices and HW having the ability to run checksum with longer headers. The issue starts when netdev_interesect_features() replaces NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM \| NETIF_F_IPV6_CSUM, if the HW advertises IP\|IPV6 specific checksums. This happens for tagged and multi-tagged packets. However, HW that enables IP\|IPV6 checksum offloading doesn't gurantee that packets with arbitrarily long headers can be checksummed. This patch disables IP\|IPV6 checksums on the packet for multi-tagged packets. CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> CC: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-06-07 12:05:59 +02:00
Eric Dumazet	338f665acb	ipv4: add reference counting to metrics [ Upstream commit `3fb07daff8` ] Andrey Konovalov reported crashes in ipv4_mtu() I could reproduce the issue with KASAN kernels, between 10.246.7.151 and 10.246.7.152 : 1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 & 2) At the same time run following loop : while : do ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 done Cong Wang attempted to add back rt->fi in commit `82486aa6f1` ("ipv4: restore rt->fi for reference counting") but this proved to add some issues that were complex to solve. Instead, I suggested to add a refcount to the metrics themselves, being a standalone object (in particular, no reference to other objects) I tried to make this patch as small as possible to ease its backport, instead of being super clean. Note that we believe that only ipv4 dst need to take care of the metric refcount. But if this is wrong, this patch adds the basic infrastructure to extend this to other families. Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang for his efforts on this problem. Fixes: `2860583fe8` ("ipv4: Kill rt->fi") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-06-07 12:05:59 +02:00
Joonsoo Kim	000f3de5a9	BACKPORT: mm/slab: clean up DEBUG_PAGEALLOC processing code Currently, open code for checking DEBUG_PAGEALLOC cache is spread to some sites. It makes code unreadable and hard to change. This patch cleans up this code. The following patch will change the criteria for DEBUG_PAGEALLOC cache so this clean-up will help it, too. [akpm@linux-foundation.org: fix build with CONFIG_DEBUG_PAGEALLOC=n] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [AmitP: fix build with CONFIG_DEBUG_PAGEALLOC=n] (cherry picked from commit `40b4413797`) Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-05-26 13:01:22 +05:30
Alex Shi	bded05d18c	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-05-26 12:03:29 +08:00
Alex Shi	be8cec3816	Merge tag 'v4.4.70' into linux-linaro-lsk-v4.4 This is the 4.4.70 stable release	2017-05-26 12:03:27 +08:00
Thomas Gleixner	6384f782a6	tracing/kprobes: Enforce kprobes teardown after testing commit `30e7d894c1` upstream. Enabling the tracer selftest triggers occasionally the warning in text_poke(), which warns when the to be modified page is not marked reserved. The reason is that the tracer selftest installs kprobes on functions marked __init for testing. These probes are removed after the tests, but that removal schedules the delayed kprobes_optimizer work, which will do the actual text poke. If the work is executed after the init text is freed, then the warning triggers. The bug can be reproduced reliably when the work delay is increased. Flush the optimizer work and wait for the optimizing/unoptimizing lists to become empty before returning from the kprobes tracer selftest. That ensures that all operations which were queued due to the probes removal have completed. Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: `6274de498` ("kprobes: Support delayed unoptimizing") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-25 14:30:17 +02:00
Chenbo Feng	182e12b8af	ANDROID: Add untag hacks to inet_release function To prevent protential risk of memory leak caused by closing socket with out untag it from qtaguid module, the qtaguid module now do not hold any socket file reference count. Instead, it will increase the sk_refcnt of the sk struct to prevent a reuse of the socket pointer. And when a socket is released. It will delete the tag if the socket is previously tagged so no more resources is held by xt_qtaguid moudle. A flag is added to the untag process to prevent possible kernel crash caused by fail to delete corresponding socket_tag_entry list. Bug: 36374484 Test: compile and run test under system/extra/test/iptables, run cts -m CtsNetTestCases -t android.net.cts.SocketRefCntTest Signed-off-by: Chenbo Feng <fengc@google.com> Change-Id: Iea7c3bf0c59b9774a5114af905b2405f6bc9ee52	2017-05-25 16:37:25 +05:30
Alex Shi	ce5b94017f	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-05-25 10:05:41 +08:00
Alex Shi	c8603c0171	Merge tag 'v4.4.69' into linux-linaro-lsk-v4.4 This is the 4.4.69 stable release	2017-05-25 10:05:39 +08:00
Maxim Altshul	8ef67e0078	mac80211: RX BA support for sta max_rx_aggregation_subframes commit `480dd46b9d` upstream. The ability to change the max_rx_aggregation frames is useful in cases of IOP. There exist some devices (latest mobile phones and some AP's) that tend to not respect a BA sessions maximum size (in Kbps). These devices won't respect the AMPDU size that was negotiated during association (even though they do respect the maximal number of packets). This violation is characterized by a valid number of packets in a single AMPDU. Even so, the total size will exceed the size negotiated during association. Eventually, this will cause some undefined behavior, which in turn causes the hw to drop packets, causing the throughput to plummet. This patch will make the subframe limitation to be held by each station, instead of being held only by hw. Signed-off-by: Maxim Altshul <maxim.altshul@ti.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-20 14:27:03 +02:00
Sara Sharon	d13333edbc	mac80211: pass block ack session timeout to to driver commit `50ea05efaf` upstream. Currently mac80211 does not inform the driver of the session block ack timeout when starting a rx aggregation session. Drivers that manage the reorder buffer need to know this parameter. Seeing that there are now too many arguments for the drv_ampdu_action() function, wrap them inside a structure. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-20 14:27:03 +02:00
Sara Sharon	0fe94dd915	mac80211: pass RX aggregation window size to driver commit `fad471860c` upstream. Currently mac80211 does not inform the driver of the window size when starting an RX aggregation session. To enable managing the reorder buffer in the driver or hardware the window size is needed. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-20 14:27:02 +02:00
Nicholas Bellinger	6cd0200a95	target: Convert ACL change queue_depth se_session reference usage commit `d36ad77f70` upstream. This patch converts core_tpg_set_initiator_node_queue_depth() to use struct se_node_acl->acl_sess_list when performing explicit se_tpg_tfo->shutdown_session() for active sessions, in order for new se_node_acl->queue_depth to take effect. This follows how core_tpg_del_initiator_node_acl() currently works when invoking se_tpg_tfo->shutdown-session(), and ahead of the next patch to take se_node_acl->acl_kref during lookup, the extra get_initiator_node_acl() can go away. In order to achieve this, go ahead and change target_get_session() to use kref_get_unless_zero() and propigate up the return value to know when a session is already being released. This is because se_node_acl->acl_group is already protecting se_node_acl->acl_group reference via configfs, and shutdown within core_tpg_del_initiator_node_acl() won't occur until sys_write() to core_tpg_set_initiator_node_queue_depth() attribute returns back to user-space. Also, drop the left-over iscsi-target hack, and obtain se_portal_group->session_lock in lio_tpg_shutdown_session() internally. Remove iscsi-target wrapper and unused se_tpg + force parameters and associated code. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Andy Grover <agrover@redhat.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-20 14:26:58 +02:00
Alex Shi	4c940b394f	Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4	2017-05-15 17:35:45 +08:00
Alex Shi	9f3cb876f7	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-05-15 17:32:20 +08:00
Alex Shi	d15d038418	Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4	2017-05-15 17:31:58 +08:00
Ilya Dryomov	4a4c6a0890	block: get rid of blk_integrity_revalidate() commit `19b7ccf865` upstream. Commit `25520d55cd` ("block: Inline blk_integrity in struct gendisk") introduced blk_integrity_revalidate(), which seems to assume ownership of the stable pages flag and unilaterally clears it if no blk_integrity profile is registered: if (bi->profile) disk->queue->backing_dev_info->capabilities \|= BDI_CAP_STABLE_WRITES; else disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES; It's called from revalidate_disk() and rescan_partitions(), making it impossible to enable stable pages for drivers that support partitions and don't use blk_integrity: while the call in revalidate_disk() can be trivially worked around (see zram, which doesn't support partitions and hence gets away with zram_revalidate_disk()), rescan_partitions() can be triggered from userspace at any time. This breaks rbd, where the ceph messenger is responsible for generating/verifying CRCs. Since blk_integrity_{un,}register() "must" be used for (un)registering the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES setting there. This way drivers that call blk_integrity_register() and use integrity infrastructure won't interfere with drivers that don't but still want stable pages. Fixes: `25520d55cd` ("block: Inline blk_integrity in struct gendisk") Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> [idryomov@gmail.com: backport to < 4.11: bdi is embedded in queue] Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-14 13:32:59 +02:00
Jin Qian	4edbdf57bc	f2fs: sanity check segment count commit `b9dd46188e` upstream. F2FS uses 4 bytes to represent block address. As a result, supported size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments. Signed-off-by: Jin Qian <jinqian@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-14 13:32:59 +02:00
WANG Cong	5c333f84bb	ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf [ Upstream commit `242d3a49a2` ] For each netns (except init_net), we initialize its null entry in 3 places: 1) The template itself, as we use kmemdup() 2) Code around dst_init_metrics() in ip6_route_net_init() 3) ip6_route_dev_notify(), which is supposed to initialize it after loopback registers Unfortunately the last one still happens in a wrong order because we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to net->loopback_dev's idev, thus we have to do that after we add idev to loopback. However, this notifier has priority == 0 same as ipv6_dev_notf, and ipv6_dev_notf is registered after ip6_route_dev_notifier so it is called actually after ip6_route_dev_notifier. This is similar to commit `2f460933f5` ("ipv6: initialize route null entry in addrconf_init()") which fixes init_net. Fix it by picking a smaller priority for ip6_route_dev_notifier. Also, we have to release the refcnt accordingly when unregistering loopback_dev because device exit functions are called before subsys exit functions. Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-14 13:32:58 +02:00
WANG Cong	5117f03fd6	ipv6: initialize route null entry in addrconf_init() [ Upstream commit `2f460933f5` ] Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev since it is always NULL. This is clearly wrong, we have code to initialize it to loopback_dev, unfortunately the order is still not correct. loopback_dev is registered very early during boot, we lose a chance to re-initialize it in notifier. addrconf_init() is called after ip6_route_init(), which means we have no chance to correct it. Fix it by moving this initialization explicitly after ipv6_add_dev(init_net.loopback_dev) in addrconf_init(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-14 13:32:58 +02:00
Stephen Boyd	2428776eb1	usb: chipidea: Handle extcon events properly commit `a89b94b533` upstream. We're currently emulating the vbus and id interrupts in the OTGSC read API, but we also need to make sure that if we're handling the events with extcon that we don't enable the interrupts for those events in the hardware. Therefore, properly emulate this register if we're using extcon, but don't enable the interrupts. This allows me to get my cable connect/disconnect working properly without getting spurious interrupts on my device that uses an extcon for these two events. Acked-by: Peter Chen <peter.chen@nxp.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Ivan T. Ivanov" <iivanov.xz@gmail.com> Fixes: `3ecb3e09b0` ("usb: chipidea: Use extcon framework for VBUS and ID detect") Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org> Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-14 13:32:56 +02:00
Arnd Bergmann	fd79e43632	mtd: avoid stack overflow in MTD CFI code commit `fddcca5107` upstream. When map_word gets too large, we use a lot of kernel stack, and for MTD_MAP_BANK_WIDTH_32, this means we use more than the recommended 1024 bytes in a number of functions: drivers/mtd/chips/cfi_cmdset_0020.c: In function 'cfi_staa_write_buffers': drivers/mtd/chips/cfi_cmdset_0020.c:651:1: warning: the frame size of 1336 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/mtd/chips/cfi_cmdset_0020.c: In function 'cfi_staa_erase_varsize': drivers/mtd/chips/cfi_cmdset_0020.c:972:1: warning: the frame size of 1208 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/mtd/chips/cfi_cmdset_0001.c: In function 'do_write_buffer': drivers/mtd/chips/cfi_cmdset_0001.c:1835:1: warning: the frame size of 1240 bytes is larger than 1024 bytes [-Wframe-larger-than=] This can be avoided if all operations on the map word are done indirectly and the stack gets reused between the calls. We can mostly achieve this by selecting MTD_COMPLEX_MAPPINGS whenever MTD_MAP_BANK_WIDTH_32 is set, but for the case that no other bank width is enabled, we also need to use a non-constant map_bankwidth() to convince the compiler to use less stack. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [Brian: this patch mostly achieves its goal by forcing MTD_COMPLEX_MAPPINGS (and the accompanying indirection) for 256-bit mappings; the rest of the change is mostly a wash, though it helps reduce stack size slightly. If we really care about supporting 256-bit mappings though, we should consider rewriting some of this code to avoid keeping and assigning so many 256-bit objects on the stack.] Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-08 07:46:01 +02:00
Alex Shi	b4d01daf69	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-05-04 12:01:39 +08:00
Alex Shi	91d95be049	Merge tag 'v4.4.66' into linux-linaro-lsk-v4.4 This is the 4.4.66 stable release	2017-05-04 12:01:37 +08:00
David Ahern	f6b94906b4	net: ipv6: RTF_PCPU should not be settable from userspace [ Upstream commit `557c44be91` ] Andrey reported a fault in the IPv6 route code: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880069809600 task.stack: ffff880062dc8000 RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975 RSP: 0018:ffff880062dced30 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006 RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018 RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0 Call Trace: ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212 ... Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit set. Flags passed to the kernel are blindly copied to the allocated rt6_info by ip6_route_info_create making a newly inserted route appear as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set and expects rt->dst.from to be set - which it is not since it is not really a per-cpu copy. The subsequent call to __ip6_dst_alloc then generates the fault. Fix by checking for the flag and failing with EINVAL. Fixes: `d52d3997f8` ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-02 21:19:53 -07:00
David Zeuthen	1b8e8e800b	ANDROID: Update init/do_mounts_dm.c to the latest ChromiumOS version. This is needed for AVB integration work. Bug: 31796270 Test: Manually tested (other arch). Change-Id: I32fd37c1578c6414e3e6ff277d16ad94df7886b8 Signed-off-by: David Zeuthen <zeuthen@google.com>	2017-05-01 15:28:17 +05:30
Jan Kara	2ffd5f0b9d	BACKPORT: [UPSTREAM] mbcache2: reimplement mbcache (Cherry-pick from commit `f9a61eb4e2`) Original mbcache was designed to have more features than what ext? filesystems ended up using. It supported entry being in more hashes, it had a home-grown rwlocking of each entry, and one cache could cache entries from multiple filesystems. This genericity also resulted in more complex locking, larger cache entries, and generally more code complexity. This is reimplementation of the mbcache functionality to exactly fit the purpose ext? filesystems use it for. Cache entries are now considerably smaller (7 instead of 13 longs), the code is considerably smaller as well (414 vs 913 lines of code), and IMO also simpler. The new code is also much more lightweight. I have measured the speed using artificial xattr-bench benchmark, which spawns P processes, each process sets xattr for F different files, and the value of xattr is randomly chosen from a pool of V values. Averages of runtimes for 5 runs for various combinations of parameters are below. The first value in each cell is old mbache, the second value is the new mbcache. V=10 F\P 1 2 4 8 16 32 64 10 0.158,0.157 0.208,0.196 0.500,0.277 0.798,0.400 3.258,0.584 13.807,1.047 61.339,2.803 100 0.172,0.167 0.279,0.222 0.520,0.275 0.825,0.341 2.981,0.505 12.022,1.202 44.641,2.943 1000 0.185,0.174 0.297,0.239 0.445,0.283 0.767,0.340 2.329,0.480 6.342,1.198 16.440,3.888 V=100 F\P 1 2 4 8 16 32 64 10 0.162,0.153 0.200,0.186 0.362,0.257 0.671,0.496 1.433,0.943 3.801,1.345 7.938,2.501 100 0.153,0.160 0.221,0.199 0.404,0.264 0.945,0.379 1.556,0.485 3.761,1.156 7.901,2.484 1000 0.215,0.191 0.303,0.246 0.471,0.288 0.960,0.347 1.647,0.479 3.916,1.176 8.058,3.160 V=1000 F\P 1 2 4 8 16 32 64 10 0.151,0.129 0.210,0.163 0.326,0.245 0.685,0.521 1.284,0.859 3.087,2.251 6.451,4.801 100 0.154,0.153 0.211,0.191 0.276,0.282 0.687,0.506 1.202,0.877 3.259,1.954 8.738,2.887 1000 0.145,0.179 0.202,0.222 0.449,0.319 0.899,0.333 1.577,0.524 4.221,1.240 9.782,3.579 V=10000 F\P 1 2 4 8 16 32 64 10 0.161,0.154 0.198,0.190 0.296,0.256 0.662,0.480 1.192,0.818 2.989,2.200 6.362,4.746 100 0.176,0.174 0.236,0.203 0.326,0.255 0.696,0.511 1.183,0.855 4.205,3.444 19.510,17.760 1000 0.199,0.183 0.240,0.227 1.159,1.014 2.286,2.154 6.023,6.039 ---,10.933 ---,36.620 V=100000 F\P 1 2 4 8 16 32 64 10 0.171,0.162 0.204,0.198 0.285,0.230 0.692,0.500 1.225,0.881 2.990,2.243 6.379,4.771 100 0.151,0.171 0.220,0.210 0.295,0.255 0.720,0.518 1.226,0.844 3.423,2.831 19.234,17.544 1000 0.192,0.189 0.249,0.225 1.162,1.043 2.257,2.093 5.853,4.997 ---,10.399 ---,32.198 We see that the new code is faster in pretty much all the cases and starting from 4 processes there are significant gains with the new code resulting in upto 20-times shorter runtimes. Also for large numbers of cached entries all values for the old code could not be measured as the kernel started hitting softlockups and died before the test completed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Bug: 32461228	2017-05-01 15:05:27 +05:30
Alex Shi	f63c6ea120	Merge tag 'v4.4.65' into linux-linaro-lsk-v4.4 This is the 4.4.65 stable release	2017-05-01 12:02:08 +08:00
Eric W. Biederman	c50fd34e10	mnt: Add a per mount namespace limit on the number of mounts commit `d29216842a` upstream. CAI Qian <caiqian@redhat.com> pointed out that the semantics of shared subtrees make it possible to create an exponentially increasing number of mounts in a mount namespace. mkdir /tmp/1 /tmp/2 mount --make-rshared / for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done Will create create 2^20 or 1048576 mounts, which is a practical problem as some people have managed to hit this by accident. As such CVE-2016-6213 was assigned. Ian Kent <raven@themaw.net> described the situation for autofs users as follows: > The number of mounts for direct mount maps is usually not very large because of > the way they are implemented, large direct mount maps can have performance > problems. There can be anywhere from a few (likely case a few hundred) to less > than 10000, plus mounts that have been triggered and not yet expired. > > Indirect mounts have one autofs mount at the root plus the number of mounts that > have been triggered and not yet expired. > > The number of autofs indirect map entries can range from a few to the common > case of several thousand and in rare cases up to between 30000 and 50000. I've > not heard of people with maps larger than 50000 entries. > > The larger the number of map entries the greater the possibility for a large > number of active mounts so it's not hard to expect cases of a 1000 or somewhat > more active mounts. So I am setting the default number of mounts allowed per mount namespace at 100,000. This is more than enough for any use case I know of, but small enough to quickly stop an exponential increase in mounts. Which should be perfect to catch misconfigurations and malfunctioning programs. For anyone who needs a higher limit this can be changed by writing to the new /proc/sys/fs/mount-max sysctl. Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-30 05:49:28 +02:00
Alex Shi	35dcea132c	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android	2017-04-23 12:02:14 +08:00
Alex Shi	c2a5f1f420	Merge tag 'v4.4.63' into linux-linaro-lsk-v4.4 This is the 4.4.63 stable release	2017-04-23 12:02:12 +08:00
Herbert Xu	2673d1c512	crypto: ahash - Fix EINPROGRESS notification callback commit `ef0579b64e` upstream. The ahash API modifies the request's callback function in order to clean up after itself in some corner cases (unaligned final and missing finup). When the request is complete ahash will restore the original callback and everything is fine. However, when the request gets an EBUSY on a full queue, an EINPROGRESS callback is made while the request is still ongoing. In this case the ahash API will incorrectly call its own callback. This patch fixes the problem by creating a temporary request object on the stack which is used to relay EINPROGRESS back to the original completion function. This patch also adds code to preserve the original flags value. Fixes: `ab6bf4e5e5` ("crypto: hash - Fix the pointer voodoo in...") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-21 09:30:06 +02:00
Tejun Heo	3144d81a77	cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups commit `77f88796ce` upstream. Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-debugged-by: Chris Mason <clm@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-21 09:30:04 +02:00
Alex Shi	64fb55641f	Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android Conflicts: arch/arm64/Kconfig: keep ARCH_MMAP_RND_BITS_MIN etc config	2017-04-13 13:07:03 +08:00
Alex Shi	4f380dc263	Merge tag 'v4.4.61' into linux-linaro-lsk-v4.4 This is the 4.4.61 stable release	2017-04-13 12:01:30 +08:00
Thomas Hellstrom	ad4ae2feef	drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces commit `fe25deb773` upstream. Previously, when a surface was opened using a legacy (non prime) handle, it was verified to have been created by a client in the same master realm. Relax this so that opening is also allowed recursively if the client already has the surface open. This works around a regression in svga mesa where opening of a shared surface is used recursively to obtain surface information. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-12 12:38:33 +02:00
Maciej Żenczykowski	f8dc2db873	UPSTREAM: ipv6 addrconf: implement RFC7559 router solicitation backoff This implements: https://tools.ietf.org/html/rfc7559 Backoff is performed according to RFC3315 section 14: https://tools.ietf.org/html/rfc3315#section-14 We allow setting /proc/sys/net/ipv6/conf//router_solicitations to a negative value meaning an unlimited number of retransmits, and we make this the new default (inline with the RFC). We also add a new setting: /proc/sys/net/ipv6/conf//router_solicitation_max_interval defaulting to 1 hour (per RFC recommendation). Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit `bd11f0741f` in DaveM's net-next/master, should make Linus' tree in 4.9-rc1) Change-Id: Ia32cdc5c61481893ef8040734e014bf2229fc39e	2017-04-10 13:29:27 +05:30
Chen Gang	d127906192	UPSTREAM: mm: add PHYS_PFN, use it in __phys_to_pfn() (cherry pick from commit `8f235d1a3e`) __phys_to_pfn and __pfn_to_phys are symmetric, PHYS_PFN and PFN_PHYS are semmetric: - y = (phys_addr_t)x << PAGE_SHIFT - y >> PAGE_SHIFT = (phys_add_t)x - (unsigned long)(y >> PAGE_SHIFT) = x [akpm@linux-foundation.org: use macro arg name `x'] [arnd@arndb.de: include linux/pfn.h for PHYS_PFN definition] Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 20045882 Bug: 19198045 Change-Id: If968d2246b381b9e5d6446e9d6d9fa45bb718e91	2017-04-10 13:23:34 +05:30
Joel Scherpelz	f2a4a926f8	net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs. This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). [backport of net-next `bbea124bc9`] Bug: 33333670 Test: net_test passes Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-10 13:12:16 +05:30
Jungseung Lee	f118138a6c	BACKPORT: mmc: core: Export device lifetime information through sysfs In the eMMC 5.0 version of the spec, several EXT_CSD fields about device lifetime are added. - Two types of estimated indications reflected by averaged wear out of memory - An indication reflected by average reserved blocks Export the information through sysfs. Signed-off-by: Jungseung Lee <js07.lee@samsung.com> Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>	2017-04-10 13:12:16 +05:30

1 2 3 4 5 ...

80034 Commits