Commit Graph

691501 Commits

Author SHA1 Message Date
Joerg Roedel
90b3eb03e1 iommu/amd: Rename free_on_init_error()
The function will also be used to free iommu resources when
amd_iommu=off was specified on the kernel command line. So
rename the function to reflect that.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-22 12:54:20 +02:00
Joerg Roedel
1112374153 iommu/amd: Disable IOMMUs at boot if they are enabled
When booting, make sure the IOMMUs are disabled. They could
be previously enabled if we boot into a kexec or kdump
kernel. So make sure they are off.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-22 12:54:19 +02:00
Heiko Carstens
addb63c18a KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows
For real-space designation asces the asce origin part is only a token.
The asce token origin must not be used to generate an effective
address for storage references. This however is erroneously done
within kvm_s390_shadow_tables().

Furthermore within the same function the wrong parts of virtual
addresses are used to generate a corresponding real address
(e.g. the region second index is used as region first index).

Both of the above can result in incorrect address translations. Only
for real space designations with a token origin of zero and addresses
below one megabyte the translation was correct.

Furthermore replace a "!asce.r" statement with a "!*fake" statement to
make it more obvious that a specific condition has nothing to do with
the architecture, but with the fake handling of real space designations.

Fixes: 3218f7094b ("s390/mm: support real-space for gmap shadows")
Cc: David Hildenbrand <david@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-22 12:53:34 +02:00
Arnd Bergmann
664d00d187 ASoC: es8316: add I2C dependency
Without CONFIG_I2C, we get a build failure:

sound/soc/codecs/es8316.c:633:1: error: data definition has no type or storage class [-Werror]
sound/soc/codecs/es8316.c:633:1: error: type defaults to 'int' in declaration of 'module_i2c_driver' [-Werror=implicit-int]
sound/soc/codecs/es8316.c:633:1: error: parameter names (without types) in function declaration [-Werror]
sound/soc/codecs/es8316.c:623:26: error: 'es8316_i2c_driver' defined but not used [-Werror=unused-variable]

This adds the required Kconfig dependency.

Fixes: b8b88b7087 ("ASoC: add es8316 codec driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-06-22 11:46:25 +01:00
Colin Ian King
1943b06611 ASoC: max9867: make array ni_div static const
The array ni_div does not need to be in global scope and is not
modified, so make it static const.

Cleans up sparse warning:
"symbol 'ni_div' was not declared. Should it be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-By: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-06-22 11:45:17 +01:00
Martin Schwidefsky
1cae025577 KVM: s390: avoid packed attribute
For naturally aligned and sized data structures avoid superfluous
packed and aligned attributes.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-22 12:41:07 +02:00
Yi Min Zhao
2c1a48f2e5 KVM: S390: add new group for flic
In some cases, userspace needs to get or set all ais states for example
migration. So we introduce a new group KVM_DEV_FLIC_AISM_ALL to provide
interfaces to get or set the adapter-interruption-suppression mode for
all ISCs. The corresponding documentation is updated.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-22 12:41:07 +02:00
Christian Borntraeger
6ae1574c2a KVM: s390: implement instruction execution protection for emulated
ifetch

While currently only used to fetch the original instruction on failure
for getting the instruction length code, we should make the page table
walking code future proof.

Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-22 12:41:06 +02:00
Claudio Imbrenda
4036e3874a KVM: s390: ioctls to get and set guest storage attributes
* Add the struct used in the ioctls to get and set CMMA attributes.
* Add the two functions needed to get and set the CMMA attributes for
  guest pages.
* Add the two ioctls that use the aforementioned functions.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-22 12:41:06 +02:00
Claudio Imbrenda
190df4a212 KVM: s390: CMMA tracking, ESSA emulation, migration mode
* Add a migration state bitmap to keep track of which pages have dirty
  CMMA information.
* Disable CMMA by default, so we can track if it's used or not. Enable
  it on first use like we do for storage keys (unless we are doing a
  migration).
* Creates a VM attribute to enter and leave migration mode.
* In migration mode, CMMA is disabled in the SIE block, so ESSA is
  always interpreted and emulated in software.
* Free the migration state on VM destroy.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-22 12:41:05 +02:00
Frederic Weisbecker
387bc8b553 sched/fair: Spare idle load balancing on nohz_full CPUs
Although idle load balancing obviously only concerns idle CPUs, it can
be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get rid
of an idle load balancing duty once a tick fires while it runs a task
and this can take a while on a nohz_full CPU.

We could fix that and escape the idle load balancing duty from the very
idle exit path but that would bring unecessary overhead. Lets just not
bother and leave that job to housekeeping CPUs (those outside nohz_full
range). The nohz_full CPUs simply don't want any disturbance.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1497838322-10913-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 11:30:02 +02:00
Frederic Weisbecker
a0db971e4e nohz: Move idle balancer registration to the idle path
The idle load balancing registration path assumes that we only stop the
tick when the CPU is idle, ignoring the nohz full case. As a result, a
nohz full CPU that is running a task may be chosen to perform idle load
balancing.

Lets make sure that only CPUs in dynticks idle mode can be picked as
idle load balancers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1497838322-10913-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 11:30:01 +02:00
Frederic Weisbecker
3c85d6db5e sched/loadavg: Generalize "_idle" naming to "_nohz"
The loadavg naming code still assumes that nohz == idle whereas its code
is actually handling well both nohz idle and nohz full.

So lets fix the naming according to what the code actually does, to
unconfuse the reader.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1497838322-10913-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 11:30:01 +02:00
Jiri Bohac
fe2d48b805 x86/debug: Extend the lower bound of crash kernel low reservations
The following change in 2013:

  0212f91596 ("x86: Add Crash kernel low reservation")

... introduced reserve_crashkernel_low(). This function is used to
reserve crash kernel memory either if crashkernel=size,low is given
on the command line or if the region reserved by reserve_crashkernel
is entirely above 4G.

reserve_crashkernel_low() tries to find a block of 'low_size' bytes.
But there seems to be no good reason to restrict the lower bound
of the range to 'low_size'.

Make memblock_find_in_range() search from the start of memory.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20170616161602.2r7birrf2y3ylv6v@dwarf.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 11:10:23 +02:00
Kan Liang
fb3a5055cd perf/x86/intel: Add 1G DTLB load/store miss support for SKL
Current DTLB load/store miss events (0x608/0x649) only counts 4K,2M and
4M page size.
Need to extend the events to support any page size (4K/2M/4M/1G).

The complete DTLB load/store miss events are:

  DTLB_LOAD_MISSES.WALK_COMPLETED		0xe08
  DTLB_STORE_MISSES.WALK_COMPLETED		0xe49

Signed-off-by: Kan Liang <Kan.liang@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 11:07:08 +02:00
Andy Lutomirski
d54368127a x86/mm: Remove reset_lazy_tlbstate()
The only call site also calls idle_task_exit(), and idle_task_exit()
puts us into a clean state by explicitly switching to init_mm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3acc7ad02a2ec060d2321a1e0f6de1cb90069517.1498022414.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:57:50 +02:00
Andy Lutomirski
7353425881 x86/ldt: Simplify the LDT switching logic
Originally, Linux reloaded the LDT whenever the prev mm or the next
mm had an LDT. It was changed in 2002 in:

  0bbed3beb4f2 ("[PATCH] Thread-Local Storage (TLS) support")

(commit from the historical tree), like this:

-		/* load_LDT, if either the previous or next thread
-		 * has a non-default LDT.
+		/*
+		 * load the LDT, if the LDT is different:
		 */
-		if (next->context.size+prev->context.size)
+		if (unlikely(prev->context.ldt != next->context.ldt))
			load_LDT(&next->context);

The current code is unlikely to avoid any LDT reloads, since different
mms won't share an LDT.

When we redo lazy mode to stop flush IPIs without switching to
init_mm, though, the current logic would become incorrect: it will
be possible to have real_prev == next but nonetheless have a stale
LDT descriptor.

Simplify the code to update LDTR if either the previous or the next
mm has an LDT, i.e. effectively restore the historical logic..
While we're at it, clean up the code by moving all the ifdeffery to
a header where it belongs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/2a859ac01245f9594c58f9d0a8b2ed8a7cd2507e.1498022414.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:57:50 +02:00
Ingo Molnar
a4eb8b9935 Merge branch 'linus' into x86/mm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:57:28 +02:00
Gary R Hook
30b4c54ccd crypto: ccp - Release locks before returning
krobot warning: make sure that all error return paths release locks.

Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-22 16:54:09 +08:00
Dan Carpenter
f2339eb9b9 crypto: cavium/nitrox - dma_mapping_error() returns bool
We want to return negative error codes here, but we're accidentally
propogating the "true" return from dma_mapping_error().

Fixes: 14fa93cdcd ("crypto: cavium - Add support for CNN55XX adapters.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-22 16:54:08 +08:00
Benjamin Peterson
8bd1d400f6 crypto: doc - fix typo in docs
Signed-off-by: Benjamin Peterson <bp@benjamin.pe>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-22 16:54:07 +08:00
Antoine Ténart
0f89b39bc6 Documentation/bindings: Document the SafeXel cryptographic engine driver
The Inside Secure Safexcel cryptographic engine is found on some Marvell
SoCs (7k/8k). Document the bindings used by its driver.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-22 16:54:07 +08:00
Michail Georgios Etairidis
6c782a5ea5 i2c: imx: Use correct function to write to register
The i2c-imx driver incorrectly uses readb()/writeb() to read and
write to the appropriate registers when performing a repeated start.
The appropriate imx_i2c_read_reg()/imx_i2c_write_reg() functions
should be used instead. Performing a repeated start results in
a kernel panic. The platform is imx.

Signed-off-by: Michail G Etairidis <m.etairidis@beck-ipc.com>
Fixes: ce1a78840f ("i2c: imx: add DMA support for freescale i2c driver")
Fixes: 054b62d9f2 ("i2c: imx: fix the i2c bus hang issue when do repeat restart")
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-06-22 10:52:02 +02:00
Yossi Kuperman
ca3a1b8566 esp6_offload: Fix IP6CB(skb)->nhoff for ESP GRO
IP6CB(skb)->nhoff is the offset of the nexthdr field in an IPv6
header, unless there are extension headers present, in which case
nhoff points to the nexthdr field of the last extension header.

In non-GRO code path, nhoff is set by ipv6_rcv before any XFRM code
is executed. Conversely, in GRO code path (when esp6_offload is loaded),
nhoff is not set. The following functions fail to read the correct value
and eventually the packet is dropped:

    xfrm6_transport_finish
    xfrm6_tunnel_input
    xfrm6_rcv_tnl

Set nhoff to the proper offset of nexthdr in esp6_gro_receive.

Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-22 10:49:14 +02:00
Yossi Kuperman
7c88e21aef xfrm6: Fix IPv6 payload_len in xfrm6_transport_finish
IPv6 payload length indicates the size of the payload, including any
extension headers.

In xfrm6_transport_finish, ipv6_hdr(skb)->payload_len is set to the
payload size only, regardless of the presence of any extension headers.
After ESP GRO transport mode decapsulation, ipv6_rcv trims the packet
according to the wrong payload_len, thus corrupting the packet.

Set payload_len to account for extension headers as well.

Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-22 10:49:14 +02:00
Horia Geantă
019d62db54 crypto: caam - fix gfp allocation flags (part II)
This is the 2nd part of fixing the usage of GFP_KERNEL for memory
allocations, taking care off all the places that haven't caused a real
problem / failure.
Again, the issue being fixed is that GFP_KERNEL should be used only when
MAY_SLEEP flag is set, i.e. MAY_BACKLOG flag usage is orthogonal.

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-22 16:47:23 +08:00
Horia Geantă
42cfcafb91 crypto: caam - fix gfp allocation flags (part I)
Changes in the SW cts (ciphertext stealing) code in
commit 0605c41cc5 ("crypto: cts - Convert to skcipher")
revealed a problem in the CAAM driver:
when cts(cbc(aes)) is executed and cts runs in SW,
cbc(aes) is offloaded in CAAM; cts encrypts the last block
in atomic context and CAAM incorrectly decides to use GFP_KERNEL
for memory allocation.

Fix this by allowing GFP_KERNEL (sleeping) only when MAY_SLEEP flag is
set, i.e. remove MAY_BACKLOG flag.

We split the fix in two parts - first is sent to -stable, while the
second is not (since there is no known failure case).

Link: http://lkml.kernel.org/g/20170602122446.2427-1-david@sigma-star.at
Cc: <stable@vger.kernel.org> # 4.8+
Reported-by: David Gstir <david@sigma-star.at>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-22 16:47:22 +08:00
Stephan Mueller
b61929c654 crypto: drbg - Fixes panic in wait_for_completion call
Initialise ctr_completion variable before use.

Cc: <stable@vger.kernel.org>
Signed-off-by: Harsh Jain <harshjain.prof@gmail.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-22 16:47:21 +08:00
Dou Liyang
538ac46c64 x86/apic: Make arch_init_msi/htirq_domain __init
These two functions are only called by arch_early_irq_init(), which
is an __init function, so mark them __init as well.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498101341-10182-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:34:42 +02:00
Dou Liyang
a884d25f38 x86/apic: Make init_legacy_irqs() __init
This function is only called by arch_early_irq_init(), which is an
__init function, so mark the child function __init as well.

In addition mark it inline for the !CONFIG_X86_IO_APIC case.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498040061-5332-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:34:41 +02:00
Ingo Molnar
f9e1698831 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:19:14 +02:00
Paul Mackerras
d4cfb11387 powerpc: Convert VDSO update function to use new update_vsyscall interface
This converts the powerpc VDSO time update function to use the new
interface introduced in commit 576094b7f0 ("time: Introduce new
GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
us the time as of the last update in seconds and whole nanoseconds,
with the new interface we get the nanoseconds part effectively in
a binary fixed-point format with tk->tkr_mono.shift bits to the
right of the binary point.

With the old interface, the fractional nanoseconds got truncated,
meaning that the value returned by the VDSO clock_gettime function
would have about 1ns of jitter in it compared to the value computed
by the generic timekeeping code in the kernel.

The powerpc VDSO time functions (clock_gettime and gettimeofday)
already work in units of 2^-32 seconds, or 0.23283 ns, because that
makes it simple to split the result into seconds and fractional
seconds, and represent the fractional seconds in either microseconds
or nanoseconds.  This is good enough accuracy for now, so this patch
avoids changing how the VDSO works or the interface in the VDSO data
page.

This patch converts the powerpc update_vsyscall_old to be called
update_vsyscall and use the new interface.  We convert the fractional
second to units of 2^-32 seconds without truncating to whole nanoseconds.
(There is still a conversion to whole nanoseconds for any legacy users
of the vdso_data/systemcfg stamp_xtime field.)

In addition, this improves the accuracy of the computation of tb_to_xs
for those systems with high-frequency timebase clocks (>= 268.5 MHz)
by doing the right shift in two parts, one before the multiplication and
one after, rather than doing the right shift before the multiplication.
(We can't do all of the right shift after the multiplication unless we
use 128-bit arithmetic.)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-22 16:26:23 +10:00
Linus Torvalds
8d829b9bb8 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "This contains a set of fixes for xen-blkback by way of Konrad, and a
  performance regression fix for blk-mq for shared tags.

  The latter could account for as much as a 50x reduction in
  performance, with the test case from the user with 500 name spaces. A
  more realistic setup on my end with 32 drives showed a 3.5x drop. The
  fix has been thoroughly tested before being committed"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: fix performance regression with shared tags
  xen-blkback: don't leak stack data via response ring
  xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
  xen/blkback: don't free be structure too early
  xen/blkback: fix disconnect while I/Os in flight
2017-06-21 22:15:00 -07:00
Darrick J. Wong
eb5e248d50 xfs: don't allow bmap on rt files
bmap returns a dumb LBA address but not the block device that goes with
that LBA.  Swapfiles don't care about this and will blindly assume that
the data volume is the correct blockdev, which is totally bogus for
files on the rt subvolume.  This results in the swap code doing IOs to
arbitrary locations on the data device(!) if the passed in mapping is a
realtime file, so just turn off bmap for rt files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-06-21 20:27:35 -07:00
Tahsin Erdogan
c1a5d5f6ab ext4: improve journal credit handling in set xattr paths
Both ext4_set_acl() and ext4_set_context() need to be made aware of
ea_inode feature when it comes to credits calculation.

Also add a sufficient credits check in ext4_xattr_set_handle() right
after xattr write lock is grabbed. Original credits calculation is done
outside the lock so there is a possiblity that the initially calculated
credits are not sufficient anymore.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 22:28:40 -04:00
Tahsin Erdogan
65d3000520 ext4: ext4_xattr_delete_inode() should return accurate errors
In a few places the function returns without trying to pass the actual
error code to the caller. Fix those.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 22:24:38 -04:00
Tahsin Erdogan
b347e2bcd1 ext4: retry storing value in external inode with xattr block too
When value size is <= EXT4_XATTR_MIN_LARGE_EA_SIZE(), and it
doesn't fit in either inline or xattr block, a second try is made to
store it in an external inode while storing the entry itself in inline
area. There should also be an attempt to store the entry in xattr block.

This patch adds a retry loop to do that. It also makes the caller the
sole decider on whether to store a value in an external inode.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 22:20:32 -04:00
Tahsin Erdogan
b315529891 ext4: fix credits calculation for xattr inode
When there is no space for a value in xattr block, it may be stored
in an xattr inode even if the value length is less than
EXT4_XATTR_MIN_LARGE_EA_SIZE(). So the current assumption in credits
calculation is wrong.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 22:16:20 -04:00
Tahsin Erdogan
7cec191894 ext4: fix ext4_xattr_cmp()
When a xattr entry refers to an external inode, the value data is not
available in the inline area so we should not attempt to read it using
value offset.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 22:14:30 -04:00
Tahsin Erdogan
f6109100ba ext4: fix ext4_xattr_move_to_block()
When moving xattr entries from inline area to a xattr block, entries
that refer to external xattr inodes need special handling because
value data is not available in the inline area but rather should be
read from its external inode.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 22:11:54 -04:00
Tahsin Erdogan
9bb21cedda ext4: fix ext4_xattr_make_inode_space() value size calculation
ext4_xattr_make_inode_space() is interested in calculating the inline
space used in an inode. When a xattr entry refers to an external inode
the value size indicates the external inode size, not the value size in
the inline area. Change the function to take this into account.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 22:05:44 -04:00
Tahsin Erdogan
0bd454c04f ext4: ext4_xattr_value_same() should return false for external data
ext4_xattr_value_same() is used as a quick optimization in case the new
xattr value is identical to the previous value. When xattr value is
stored in a xattr inode the check becomes expensive so it is better to
just assume that they are not equal.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 22:02:06 -04:00
Tahsin Erdogan
990461dd85 ext4: add missing le32_to_cpu(e_value_inum) conversions
Two places in code missed converting xattr inode number using
le32_to_cpu().

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 21:59:30 -04:00
Tahsin Erdogan
9096669332 ext4: clean up ext4_xattr_inode_get()
The input and output values of *size parameter are equal on successful
return from ext4_xattr_inode_get().  On error return, the callers ignore
the output value so there is no need to update it.

Also check for NULL return from ext4_bread().  If the actual xattr inode
size happens to be smaller than the expected size, ext4_bread() may
return NULL which would indicate data corruption.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 21:57:36 -04:00
Tahsin Erdogan
bab79b0499 ext4: change ext4_xattr_inode_iget() signature
In general, kernel functions indicate success/failure through their return
values. This function returns the status as an output parameter and reserves
the return value for the inode. Make it follow the general convention.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 21:49:53 -04:00
Tahsin Erdogan
0eefb10758 ext4: extended attribute value size limit is enforced by vfs
EXT4_XATTR_MAX_LARGE_EA_SIZE definition in ext4 is currently unused.
Besides, vfs enforces its own 64k limit which makes the 1MB limit in
ext4 redundant. Remove it.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 21:41:37 -04:00
Tahsin Erdogan
1e7d359d71 ext4: fix ref counting for ea_inode
The ref count on ea_inode is incremented by
ext4_xattr_inode_orphan_add() which is supposed to be decremented by
ext4_xattr_inode_array_free(). The decrement is conditioned on whether
the ea_inode is currently on the orphan list. However, the orphan list
addition only happens when journaling is enabled. In non-journaled case,r
we fail to release the ref count causing an error message like below.

"VFS: Busy inodes after unmount of sdb. Self-destruct in 5 seconds.
Have a nice day..."

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 21:39:38 -04:00
Tahsin Erdogan
ddfa17e4ad ext4: call journal revoke when freeing ea_inode blocks
ea_inode contents are treated as metadata, that's why it is journaled
during initial writes. Failing to call revoke during freeing could cause
user data to be overwritten with original ea_inode contents during journal
replay.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 21:36:51 -04:00
Tahsin Erdogan
9e1ba00161 ext4: ea_inode owner should be the same as the inode owner
Quota charging is based on the ownership of the inode. Currently, the
xattr inode owner is set to the caller which may be different from the
parent inode owner. This is inconsistent with how quota is charged for
xattr block and regular data block writes.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-21 21:27:00 -04:00
Paul Mackerras
2ed4f9dd19 KVM: PPC: Book3S HV: Add capability to report possible virtual SMT modes
Now that userspace can set the virtual SMT mode by enabling the
KVM_CAP_PPC_SMT capability, it is useful for userspace to be able
to query the set of possible virtual SMT modes.  This provides a
new capability, KVM_CAP_PPC_SMT_POSSIBLE, to provide this
information.  The return value is a bitmap of possible modes, with
bit N set if virtual SMT mode 2^N is available.  That is, 1 indicates
SMT1 is available, 2 indicates that SMT2 is available, 3 indicates
that both SMT1 and SMT2 are available, and so on.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-22 11:25:31 +10:00