Commit Graph

737116 Commits

Author SHA1 Message Date
Anand Jain
e124ece53e btrfs: get device pointer from device_list_add()
Instead of pointer to btrfs_fs_devices as an arg in device_list_add()
better to get pointer to btrfs_device as return value, then we have
both, pointer to btrfs_device and btrfs_fs_devices. btrfs_device is
needed to handle reappearing missing device.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-29 19:31:15 +01:00
Linus Torvalds
1a9a126b50 Merge tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
 "The majority of this is an update of the ACPICA kernel code to
  upstream revision 20171215 with a cosmetic change and a maintainers
  information update on top of it.

  The rest is mostly some minor fixes and cleanups in the ACPI drivers
  and cleanups to initialization on x86.

  Specifics:

   - Update the ACPICA kernel code to upstream revision 20171215 including:
      * Support for ACPI 6.0A changes in the NFIT table (Bob Moore)
      * Local 64-bit divide in string conversions (Bob Moore)
      * Fix for a regression in acpi_evaluate_object_type() (Bob Moore)
      * Fixes for memory leaks during package object resolution (Bob
        Moore)
      * Deployment of safe version of strncpy() (Bob Moore)
      * Debug and messaging updates (Bob Moore)
      * Support for PDTT, SDEV, TPM2 tables in iASL and tools (Bob
        Moore)
      * Null pointer dereference avoidance in Op and cleanups (Colin Ian
        King)
      * Fix for memory leak from building prefixed pathname (Erik
        Schmauss)
      * Coding style fixes, disassembler and compiler updates (Hanjun
        Guo, Erik Schmauss)
      * Additional PPTT flags from ACPI 6.2 (Jeremy Linton)
      * Fix for an off-by-one error in acpi_get_timer_duration()
        (Jung-uk Kim)
      * Infinite loop detection timeout and utilities cleanups (Lv
        Zheng)
      * Windows 10 version 1607 and 1703 OSI strings (Mario
        Limonciello)

   - Update ACPICA information in MAINTAINERS to reflect the current
     status of ACPICA maintenance and rename a local variable in one
     function to match the corresponding upstream code (Rafael Wysocki)

   - Clean up ACPI-related initialization on x86 (Andy Shevchenko)

   - Add support for Intel Merrifield to the ACPI GPIO code (Andy
     Shevchenko)

   - Clean up ACPI PMIC drivers (Andy Shevchenko, Arvind Yadav)

   - Fix the ACPI Generic Event Device (GED) driver to free IRQs on
     shutdown and clean up the PCI IRQ Link driver (Sinan Kaya)

   - Make the GHES code call into the AER driver on all errors and clean
     up the ACPI APEI code (Colin Ian King, Tyler Baicar)

   - Make the IA64 ACPI NUMA code parse all SRAT entries (Ganapatrao
     Kulkarni)

   - Add a lid switch blacklist to the ACPI button driver and make it
     print extra debug messages on lid events (Hans de Goede)

   - Add quirks for Asus GL502VSK and UX305LA to the ACPI battery driver
     and clean it up somewhat (Bjørn Mork, Kai-Heng Feng)

   - Add device link for CHT SD card dependency on I2C to the ACPI LPSS
     (Intel SoCs) driver and make it avoid creating platform device
     objects for devices without MMIO resources (Adrian Hunter, Hans de
     Goede)

   - Fix the ACPI GPE mask kernel command line parameter handling
     (Prarit Bhargava)

   - Fix the handling of (incorrectly exposed) backlight interfaces
     without LCD (Hans de Goede)

   - Fix the usage of debugfs_create_*() in the ACPI EC driver (Geert
     Uytterhoeven)"

* tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
  ACPI/PCI: pci_link: reduce verbosity when IRQ is enabled
  ACPI / LPSS: Do not instiate platform_dev for devs without MMIO resources
  ACPI / PMIC: Convert to use builtin_platform_driver() macro
  ACPI / x86: boot: Propagate error code in acpi_gsi_to_irq()
  ACPICA: Update version to 20171215
  ACPICA: trivial style fix, no functional change
  ACPICA: Fix a couple memory leaks during package object resolution
  ACPICA: Recognize the Windows 10 version 1607 and 1703 OSI strings
  ACPICA: DT compiler: prevent error if optional field at the end of table is not present
  ACPICA: Rename a global variable, no functional change
  ACPICA: Create and deploy safe version of strncpy
  ACPICA: Cleanup the global variables and update comments
  ACPICA: Debugger: fix slight indentation issue
  ACPICA: Fix a regression in the acpi_evaluate_object_type() interface
  ACPICA: Update for a few debug output statements
  ACPICA: Debug output, no functional change
  ACPI: EC: Fix debugfs_create_*() usage
  ACPI / video: Default lcd_only to true on Win8-ready and newer machines
  ACPI / x86: boot: Don't setup SCI on HW-reduced platforms
  ACPI / x86: boot: Use INVALID_ACPI_IRQ instead of 0 for acpi_sci_override_gsi
  ...
2018-01-29 10:17:53 -08:00
Linus Torvalds
7f3fdd40a7 Merge tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "This includes some infrastructure changes in the PM core, mostly
  related to integration between runtime PM and system-wide suspend and
  hibernation, plus some driver changes depending on them and fixes for
  issues in that area which have become quite apparent recently.

  Also included are changes making more x86-based systems use the Low
  Power Sleep S0 _DSM interface by default, which turned out to be
  necessary to handle power button wakeups from suspend-to-idle on
  Surface Pro3.

  On the cpufreq front we have fixes and cleanups in the core, some new
  hardware support, driver updates and the removal of some unused code
  from the CPU cooling thermal driver.

  Apart from this, the Operating Performance Points (OPP) framework is
  prepared to be used with power domains in the future and there is a
  usual bunch of assorted fixes and cleanups.

  Specifics:

   - Define a PM driver flag allowing drivers to request that their
     devices be left in suspend after system-wide transitions to the
     working state if possible and add support for it to the PCI bus
     type and the ACPI PM domain (Rafael Wysocki).

   - Make the PM core carry out optimizations for devices with driver PM
     flags set in some cases and make a few drivers set those flags
     (Rafael Wysocki).

   - Fix and clean up wrapper routines allowing runtime PM device
     callbacks to be re-used for system-wide PM, change the generic
     power domains (genpd) framework to stop using those routines
     incorrectly and fix up a driver depending on that behavior of genpd
     (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).

   - Fix and clean up the PM core's device wakeup framework and
     re-factor system-wide PM core code related to device wakeup
     (Rafael Wysocki, Ulf Hansson, Brian Norris).

   - Make more x86-based systems use the Low Power Sleep S0 _DSM
     interface by default (to fix power button wakeup from
     suspend-to-idle on Surface Pro3) and add a kernel command line
     switch to tell it to ignore the system sleep blacklist in the ACPI
     core (Rafael Wysocki).

   - Fix a race condition related to cpufreq governor module removal and
     clean up the governor management code in the cpufreq core (Rafael
     Wysocki).

   - Drop the unused generic code related to the handling of the static
     power energy usage model in the CPU cooling thermal driver along
     with the corresponding documentation (Viresh Kumar).

   - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh
     Cheng).

   - Add a new operating point to the imx6ul and imx6q cpufreq drivers
     and switch the latter to using clk_bulk_get() (Anson Huang, Dong
     Aisheng).

   - Add support for multiple regulators to the TI cpufreq driver along
     with a new DT binding related to that and clean up that driver
     somewhat (Dave Gerlach).

   - Fix a powernv cpufreq driver regression leading to incorrect CPU
     frequency reporting, fix that driver to deal with non-continguous
     P-states correctly and clean it up (Gautham Shenoy, Shilpasri
     Bhat).

   - Add support for frequency scaling on Armada 37xx SoCs through the
     generic DT cpufreq driver (Gregory CLEMENT).

   - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).

   - Fix a transition delay setting regression in the longhaul cpufreq
     driver (Viresh Kumar).

   - Add Skylake X (server) support to the intel_pstate cpufreq driver
     and clean up that driver somewhat (Srinivas Pandruvada).

   - Clean up the cpufreq statistics collection code (Viresh Kumar).

   - Drop cluster terminology and dependency on physical_package_id from
     the PSCI driver and drop dependency on arm_big_little from the SCPI
     cpufreq driver (Sudeep Holla).

   - Add support for system-wide suspend and resume to the RAPL power
     capping driver and drop a redundant semicolon from it (Zhen Han,
     Luis de Bethencourt).

   - Make SPI domain validation (in the SCSI SPI transport driver) and
     system-wide suspend mutually exclusive as they rely on the same
     underlying mechanism and cannot be carried out at the same time
     (Bart Van Assche).

   - Fix the computation of the amount of memory to preallocate in the
     hibernation core and clean up one function in there (Rainer Fiebig,
     Kyungsik Lee).

   - Prepare the Operating Performance Points (OPP) framework for being
     used with power domains and clean up one function in it (Viresh
     Kumar, Wei Yongjun).

   - Clean up the generic sysfs interface for device PM (Andy
     Shevchenko).

   - Fix several minor issues in power management frameworks and clean
     them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
     Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
     Sergey Senozhatsky, gaurav jindal).

   - Make it easier to disable PM via Kconfig (Mark Brown).

   - Clean up the cpupower and intel_pstate_tracer utilities (Doug
     Smythies, Laura Abbott)"

* tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
  PCI / PM: Remove spurious semicolon
  cpufreq: scpi: remove arm_big_little dependency
  drivers: psci: remove cluster terminology and dependency on physical_package_id
  powercap: intel_rapl: Fix trailing semicolon
  dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit
  PM / runtime: Allow no callbacks in pm_runtime_force_suspend|resume()
  PM / hibernate: Drop unused parameter of enough_swap
  PM / runtime: Check ignore_children in pm_runtime_need_not_resume()
  PM / runtime: Rework pm_runtime_force_suspend/resume()
  PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
  cpufreq: powernv: Dont assume distinct pstate values for nominal and pmin
  cpufreq: intel_pstate: Add Skylake servers support
  cpufreq: intel_pstate: Replace bxt_funcs with core_funcs
  platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
  ACPI / PM: Use Low Power S0 Idle on more systems
  PM / wakeup: Print warn if device gets enabled as wakeup source during sleep
  PM / domains: Don't skip driver's ->suspend|resume_noirq() callbacks
  PM / core: Propagate wakeup_path status flag in __device_suspend_late()
  PM / core: Re-structure code for clearing the direct_complete flag
  powercap: add suspend and resume mechanism for SOC power limit
  ...
2018-01-29 09:47:41 -08:00
David S. Miller
f7dd5215b2 Merge branch 'net_sched-reflect-tx_queue_len-change-for-pfifo_fast'
Cong Wang says:

====================
net_sched: reflect tx_queue_len change for pfifo_fast

This pathcset restores the pfifo_fast qdisc behavior of dropping
packets based on latest dev->tx_queue_len. Patch 1 introduces
a helper, patch 2 introduces a new Qdisc ops which is called when
we modify tx_queue_len, patch 3 implements this ops for pfifo_fast.

Please see each patch for details.

---
v3: use skb_array_resize_multiple()
v2: handle error case for ->change_tx_queue_len()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:42:15 -05:00
Cong Wang
7007ba630e net_sched: implement ->change_tx_queue_len() for pfifo_fast
pfifo_fast used to drop based on qdisc_dev(qdisc)->tx_queue_len,
so we have to resize skb array when we change tx_queue_len.

Other qdiscs which read tx_queue_len are fine because they
all save it to sch->limit or somewhere else in qdisc during init.
They don't have to implement this, it is nicer if they do so
that users don't have to re-configure qdisc after changing
tx_queue_len.

Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:42:15 -05:00
Cong Wang
48bfd55e7e net_sched: plug in qdisc ops change_tx_queue_len
Introduce a new qdisc ops ->change_tx_queue_len() so that
each qdisc could decide how to implement this if it wants.
Previously we simply read dev->tx_queue_len, after pfifo_fast
switches to skb array, we need this API to resize the skb array
when we change dev->tx_queue_len.

To avoid handling race conditions with TX BH, we need to
deactivate all TX queues before change the value and bring them
back after we are done, this also makes implementation easier.

Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:42:15 -05:00
Cong Wang
6a643ddb56 net: introduce helper dev_change_tx_queue_len()
This patch promotes the local change_tx_queue_len() to a core
helper function, dev_change_tx_queue_len(), so that rtnetlink
and net-sysfs could share the code. This also prepares for the
following patch.

Note, the -EFAULT in the original code doesn't make sense,
we should propagate the errno from notifiers.

Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:42:15 -05:00
Linus Torvalds
1c1f395b28 Merge tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "The major changes in the core API side in this cycle are the still
  on-going ASoC componentization works. Other than that, only few small
  changes such as 20bit PCM format support are found.

  Meanwhile the rest majority of changes are for ASoC drivers:

   - Large cleanups of some of the TI CODEC drivers

   - Continued work on Intel ASoC stuff for new quirks, ACPI GPIO
     handling, Kconfigs and lots of cleanups

   - Refactoring of the Freescale SSI driver, as preliminary work for
     the upcoming changes

   - Work on ST DFSDM driver, including the required IIO patches

   - New drivers for Allwinner A83T, Maxim MAX89373, SocioNext UiniPhier
     EVEA Tempo Semiconductor TSCS42xx and TI PCM816x, TAS5722 and
     TAS6424 devices

   - Removal of dead codes for SN95031 and board drivers

  Last but not least, a few HD-audio and USB-audio quirks are included
  as usual, too"

* tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (303 commits)
  ALSA: hda - Reduce the suspend time consumption for ALC256
  ASoC: use seq_file to dump the contents of dai_list,platform_list and codec_list
  ASoC: soc-core: add missing EXPORT_SYMBOL_GPL() for snd_soc_rtdcom_lookup
  IIO: ADC: stm32-dfsdm: remove unused variable again
  ASoC: bcm2835: fix hw_params error when device is in prepared state
  ASoC: mxs-sgtl5000: Do not print error on probe deferral
  ASoC: sgtl5000: Do not print error on probe deferral
  ASoC: Intel: remove select on non-existing SND_SOC_INTEL_COMMON
  ALSA: usb-audio: Support changing input on Sound Blaster E1
  ASoC: Intel: remove second duplicated assignment to pointer 'res'
  ALSA: hda/realtek - update ALC215 depop optimize
  ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
  ALSA: pcm: Fix trailing semicolon
  ASoC: add Component level .read/.write
  ASoC: cx20442: fix regression by adding back .read/.write
  ASoC: uda1380: fix regression by adding back .read/.write
  ASoC: tlv320dac33: fix regression by adding back .read/.write
  ALSA: hda - Use IS_REACHABLE() for dependency on input
  IIO: ADC: stm32-dfsdm: fix static check warning
  IIO: ADC: stm32-dfsdm: code optimization
  ...
2018-01-29 09:41:47 -08:00
Chengguang Xu
affff07739 libceph: check kstrndup() return value
Should check result of kstrndup() in case of memory allocation failure.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:12 +01:00
Zhi Zhang
e30ee58121 ceph: try to allocate enough memory for reserved caps
ceph_reserve_caps() may not reserve enough caps under high memory
pressure, but it saved the needed caps number that expected to
be reserved. When getting caps, crash would happen due to number
mismatch.

Now we will try to trim more caps when failing to allocate memory
for caps need to be reserved, then try again. If still failing to
allocate memory, return -ENOMEM.

Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:12 +01:00
Yan, Zheng
0f439c746c ceph: fix race of queuing delayed caps
When called with CHECK_CAPS_AUTHONLY flag, ceph_check_caps() only
processes auth caps. In that case, it's unsafe to remove inode
from mdsc->cap_delay_list, because there can be delayed non-auth
caps.

Besides, ceph_check_caps() may lock/unlock i_ceph_lock several
times, when multiple threads call ceph_check_caps() at the same
time. It's possible that one thread calls __cap_delay_requeue(),
another thread calls __cap_delay_cancel(). __cap_delay_cancel()
should be called at very beginning of ceph_check_caps(), so that
it does not race with __cap_delay_requeue().

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:11 +01:00
Yan, Zheng
ee612d954f ceph: delete unreachable code in ceph_check_caps()
"revoking & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)" has already
been tested before calling try_nonblocking_invalidate()

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:11 +01:00
Yan, Zheng
d84b37f9fa ceph: limit rate of cap import/export error messages
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:10 +01:00
Yan, Zheng
7d9c9193b5 ceph: fix incorrect snaprealm when adding caps
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:09 +01:00
Yan, Zheng
314c4737a4 ceph: fix un-balanced fsc->writeback_count update
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:09 +01:00
Yan, Zheng
5d98830828 ceph: track read contexts in ceph_file_info
Previously ceph_read_iter() uses current->journal to pass context info
to ceph_readpages(), so that ceph_readpages() can distinguish read(2)
from readahead(2)/fadvise(2)/madvise(2). The problem is that page fault
can happen when copying data to userspace memory. Page fault may call
other filesystem's page_mkwrite() if the userspace memory is mapped to a
file. The later filesystem may also want to use current->journal.

The fix is define a on-stack data structure in ceph_read_iter(), add it
to context list in ceph_file_info. ceph_readpages() searches the list,
find if there is a context belongs to current thread.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:08 +01:00
Yan, Zheng
5495c2d04f ceph: avoid dereferencing invalid pointer during cached readdir
Readdir cache keeps array of dentry pointers in page cache. If any
dentry in readdir cache gets pruned, ceph_d_prune() disables readdir
cache for later readdir syscall. The problem is that ceph_d_prune()
ignores unhashed dentry. Ideally MDS should have already revoked
CEPH_CAP_FILE_SHARED (which also disables readdir cache) when dentry
gets unhashed. But if it is somehow MDS does not properly revoke
CEPH_CAP_FILE_SHARED and the unhashed dentry gets pruned later,
ceph_d_prune() will not disable readdir cache, later readdir may
reference invalid dentry pointer.

The fix is make ceph_d_prune() do extra check for unhashed dentry.
Disable readdir cache if the unhashed dentry is still referenced
by readdir cache.

Another fix in this patch is handle d_splice_alias(). If a dentry
gets spliced into new parent dentry, treat it as if it was pruned
(call ceph_d_prune() for it).

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:07 +01:00
Yan, Zheng
97aeb6bf98 ceph: use atomic_t for ceph_inode_info::i_shared_gen
It allows accessing i_shared_gen without holding i_ceph_lock. It is
preparation for later patch.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:07 +01:00
Yan, Zheng
8d8f371c83 ceph: cleanup traceless reply handling for rename
ceph_fill_trace() already calls ceph_invalidate_dir_request() for
traceless reply. No need to duplicate the code in ceph_rename().

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:06 +01:00
Yan, Zheng
87c91a965a ceph: voluntarily drop Fx cap for readdir request
MDS need to rdlock directory inode's filelock when handling readdir
request. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke
message.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:05 +01:00
Yan, Zheng
be70489eff ceph: properly drop caps for setattr request
For CEPH_SETATTR_ATIME, MDS needs to xlock filelock, Fsxrw caps
are not allowed for xlocked filelock.

For CEPH_SETATTR_SIZE request that truncates file to smaller size,
MDS needs to xlock filelock, Fsxrw caps are not allowed for xlocked
filelock.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:05 +01:00
Yan, Zheng
d19a0b5401 ceph: voluntarily drop Lx cap for link/rename requests
MDS need to xlock inode's linklock when handling link/rename requests.
Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:04 +01:00
Yan, Zheng
222b7f90ba ceph: voluntarily drop Ax cap for requests that create new inode
MDS need to rdlock directory inode's authlock when handling these
requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke
message.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:04 +01:00
Ilya Dryomov
e573427a44 rbd: whitelist RBD_FEATURE_OPERATIONS feature bit
This feature bit restricts older clients from performing certain
maintenance operations against an image (e.g. clone, snap create).
krbd does not perform maintenance operations.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-01-29 18:36:03 +01:00
Jason Wang
4cd879515d vhost_net: stop device during reset owner
We don't stop device before reset owner, this means we could try to
serve any virtqueue kick before reset dev->worker. This will result a
warn since the work was pending at llist during owner resetting. Fix
this by stopping device during owner reset.

Reported-by: syzbot+eb17c6162478cc50632c@syzkaller.appspotmail.com
Fixes: 3a4d5c94e9 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:26:20 -05:00
David S. Miller
e8368d9ebb Merge branch 'net-Ease-to-follow-an-interface-that-moves-to-another-netns'
Nicolas Dichtel says:

====================
net: Ease to follow an interface that moves to another netns

The goal of this series is to ease the user to follow an interface that
moves to another netns.

After this series, with a patched iproute2:

$ ip netns
bar
foo
$ ip monitor link &
$ ip link set dummy0 netns foo
Deleted 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 6e:a7:82:35:96:46 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6

=> new nsid: 0, new ifindex: 6 (was 5 in the previous netns)

$ ip link set eth1 netns bar
Deleted 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 52:54:01:12:34:57 brd ff:ff:ff:ff:ff:ff new-nsid 1 new-ifindex 3

=> new nsid: 1, new ifindex: 3 (same ifindex)

$ ip netns
bar (id: 1)
foo (id: 0)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:23:53 -05:00
Nicolas Dichtel
38e01b3056 dev: advertise the new ifindex when the netns iface changes
The goal is to let the user follow an interface that moves to another
netns.

CC: Jiri Benc <jbenc@redhat.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:23:52 -05:00
Nicolas Dichtel
c36ac8e230 dev: always advertise the new nsid when the netns iface changes
The user should be able to follow any interface that moves to another
netns.  There is no reason to hide physical interfaces.

CC: Jiri Benc <jbenc@redhat.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:23:51 -05:00
Vadim Lomovtsev
6b9e65474b net: ethernet: cavium: Correct Cavium Thunderx NIC driver names accordingly to module name
It was found that ethtool provides unexisting module name while
it queries the specified network device for associated driver
information. Then user tries to unload that module by provided
module name and fails.

This happens because ethtool reads value of DRV_NAME macro,
while module name is defined at the driver's Makefile.

This patch is to correct Cavium CN88xx Thunder NIC driver names
(DRV_NAME macro) 'thunder-nicvf' to 'nicvf' and 'thunder-nic'
to 'nicpf', sync bgx and xcv driver names accordingly to their
module names.

Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:22:06 -05:00
Linus Torvalds
49f9c3552c Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull init_task initializer cleanups from David Howells:
 "It doesn't seem useful to have the init_task in a header file rather
  than in a normal source file. We could consolidate init_task handling
  instead and expand out various macros.

  Here's a series of patches that consolidate init_task handling:

   (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
       openrisc.

   (2) Alter the INIT_TASK_DATA linker script macro to set
       init_thread_union and init_stack rather than defining these in C.

       Insert init_task and init_thread_into into the init_stack area in
       the linker script as appropriate to the configuration, with
       different section markers so that they end up correctly ordered.

       We can then get merge ia64's init_task.c into the main one.

       We then have a bunch of single-use INIT_*() macros that seem only
       to be macros because they used to be used per-arch. We can then
       expand these in place of the user and get rid of a few lines and
       a lot of backslashes.

   (3) Expand INIT_TASK() in place.

   (4) Expand in place various small INIT_*() macros that are defined
       conditionally. Expand them and surround them by #if[n]def/#endif
       in the .c file as it takes fewer lines.

   (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.

   (6) Expand INIT_STRUCT_PID in place.

  These macros can then be discarded"

* tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  Expand INIT_STRUCT_PID and remove
  Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
  Expand various INIT_* macros and remove
  Expand INIT_TASK() in init/init_task.c and remove
  Construct init thread stack in the linker script rather than by union
  openrisc: Make THREAD_SIZE available to vmlinux.lds
  hexagon: Make THREAD_SIZE available to vmlinux.lds
  cris: Make THREAD_SIZE available to vmlinux.lds
2018-01-29 09:08:34 -08:00
David S. Miller
bfbe5bab66 Merge branch 'ptr_ring-fixes'
Michael S. Tsirkin says:

====================
ptr_ring fixes

This fixes a bunch of issues around ptr_ring use in net core.
One of these: "tap: fix use-after-free" is also needed on net,
but can't be backported cleanly.

I will post a net patch separately.

Lightly tested - Jason, could you pls confirm this
addresses the security issue you saw with ptr_ring?
Testing reports would be appreciated too.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2018-01-29 12:02:55 -05:00
Michael S. Tsirkin
491847f3b2 tools/virtio: fix smp_mb on x86
Offset 128 overlaps the last word of the redzone.
Use 132 which is always beyond that.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:55 -05:00
Michael S. Tsirkin
b4eab7de66 tools/virtio: copy READ/WRITE_ONCE
This is to make ptr_ring test build again.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:55 -05:00
Michael S. Tsirkin
6dd4215783 tools/virtio: more stubs to fix tools build
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
30f1d37074 tools/virtio: switch to __ptr_ring_empty
We don't rely on lockless guarantees, but it
seems cleaner than inverting __ptr_ring_peek.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
a07d29c672 ptr_ring: prevent queue load/store tearing
In theory compiler could tear queue loads or stores in two. It does not
seem to be happening in practice but it seems easier to convert the
cases where this would be a problem to READ/WRITE_ONCE than worry about
it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
f417dc2818 skb_array: use __ptr_ring_empty
__skb_array_empty should use __ptr_ring_empty since that's the only
legal lockless function.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
9fb582b670 Revert "net: ptr_ring: otherwise safe empty checks can overrun array bounds"
This reverts commit bcecb4bbf8.

If we try to allocate an extra entry as the above commit did, and when
the requested size is UINT_MAX, addition overflows causing zero size to
be passed to kmalloc().

kmalloc then returns ZERO_SIZE_PTR with a subsequent crash.

Reported-by: syzbot+87678bcf753b44c39b67@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
84328342a7 ptr_ring: disallow lockless __ptr_ring_full
Similar to bcecb4bbf8 ("net: ptr_ring: otherwise safe empty checks can
overrun array bounds") a lockless use of __ptr_ring_full might
cause an out of bounds access.

We can fix this, but it's easier to just disallow lockless
__ptr_ring_full for now.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
88fae87327 tap: fix use-after-free
Lockless access to __ptr_ring_full is only legal if ring is
never resized, otherwise it might cause use-after free errors.
Simply drop the lockless test, we'll drop the packet
a bit later when produce fails.

Fixes: 362899b8 ("macvtap: switch to use skb array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:53 -05:00
Michael S. Tsirkin
a259df36d1 ptr_ring: READ/WRITE_ONCE for __ptr_ring_empty
Lockless __ptr_ring_empty requires that consumer head is read and
written at once, atomically. Annotate accordingly to make sure compiler
does it correctly.  Switch locked callers to __ptr_ring_peek which does
not support the lockless operation.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:53 -05:00
Michael S. Tsirkin
8619d384eb ptr_ring: clean up documentation
The only function safe to call without locks
is __ptr_ring_empty. Move documentation about
lockless use there to make sure people do not
try to use __ptr_ring_peek outside locks.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:53 -05:00
Michael S. Tsirkin
406de75554 ptr_ring: keep consumer_head valid at all times
The comment near __ptr_ring_peek says:

 * If ring is never resized, and if the pointer is merely
 * tested, there's no need to take the lock - see e.g.  __ptr_ring_empty.

but this was in fact never possible since consumer_head would sometimes
point outside the ring. Refactor the code so that it's always
pointing within a ring.

Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:53 -05:00
Bob Peterson
2eb5909dee GFS2: Don't try to end a non-existent transaction in unlink
Before this patch, if function gfs2_unlink failed to get a valid
transaction (for example, not enough journal blocks) it would go
to label out_end_trans which did gfs2_trans_end. But if the
trans_begin failed, there's no transaction to end, and trying to
do so results in: kernel BUG at fs/gfs2/trans.c:117!

This patch changes the goto so that it does not try to end a
non-existent transaction.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-29 10:00:23 -07:00
Martin KaFai Lau
7ece54a60e ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only
If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
implicitly implies it is an ipv6only socket.  However, in inet6_bind(),
this addr_type checking and setting sk->sk_ipv6only to 1 are only done
after sk->sk_prot->get_port(sk, snum) has been completed successfully.

This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
the 'get_port()'.

In particular, when binding SO_REUSEPORT UDP sockets,
udp_reuseport_add_sock(sk,...) is called.  udp_reuseport_add_sock()
checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
sk2->sk_reuseport_cb.  In this case, ipv6_only_sock(sk2) could be
1 while ipv6_only_sock(sk) is still 0 here.  The end result is,
reuseport_alloc(sk) is called instead of adding sk to the existing
sk2->sk_reuseport_cb.

It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
IPv6 address (!ANY and !MAPPED).  Only one of the socket will
receive packet.

The fix is to set the implicit sk_ipv6only before calling get_port().
The original sk_ipv6only has to be saved such that it can be restored
in case get_port() failed.  The situation is similar to the
inet_reset_saddr(sk) after get_port() has failed.

Thanks to Calvin Owens <calvinowens@fb.com> who created an easy
reproduction which leads to a fix.

Fixes: e32ea7e747 ("soreuseport: fast reuseport UDP socket selection")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:37:40 -05:00
David S. Miller
2479c2c9f4 Merge branch 'rtnetlink-enable-IFLA_IF_NETNSID-for-RTM_DELLINK-RTM_SETINK'
Christian Brauner says:

====================
rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK

Based on the previous discussion this enables passing a IFLA_IF_NETNSID
property along with RTM_SETLINK and RTM_DELLINK requests. The patch for
RTM_NEWLINK will be sent out in a separate patch since there are more
corner-cases to think about.

Changelog 2018-01-24:
* Preserve old behavior and report -ENODEV when either ifindex or ifname is
  provided and IFLA_GROUP is set. Spotted by Wolfgang Bumiller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:31:07 -05:00
Christian Brauner
b61ad68a9f rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK
- Backwards Compatibility:
  If userspace wants to determine whether RTM_DELLINK supports the
  IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
  with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
  does not include IFLA_IF_NETNSID userspace should assume that
  IFLA_IF_NETNSID is not supported on this kernel.
  If the reply does contain an IFLA_IF_NETNSID property userspace
  can send an RTM_DELLINK with a IFLA_IF_NETNSID property. If they receive
  EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
  with RTM_DELLINK. Userpace should then fallback to other means.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:31:06 -05:00
Christian Brauner
c310bfcb6e rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK
- Backwards Compatibility:
  If userspace wants to determine whether RTM_SETLINK supports the
  IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
  with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
  does not include IFLA_IF_NETNSID userspace should assume that
  IFLA_IF_NETNSID is not supported on this kernel.
  If the reply does contain an IFLA_IF_NETNSID property userspace
  can send an RTM_SETLINK with a IFLA_IF_NETNSID property. If they receive
  EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
  with RTM_SETLINK. Userpace should then fallback to other means.

  To retain backwards compatibility the kernel will first check whether a
  IFLA_NET_NS_PID or IFLA_NET_NS_FD property has been passed. If either
  one is found it will be used to identify the target network namespace.
  This implies that users who do not care whether their running kernel
  supports IFLA_IF_NETNSID with RTM_SETLINK can pass both
  IFLA_NET_NS_{FD,PID} and IFLA_IF_NETNSID referring to the same network
  namespace.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:31:06 -05:00
Christian Brauner
7c4f63ba82 rtnetlink: enable IFLA_IF_NETNSID in do_setlink()
RTM_{NEW,SET}LINK already allow operations on other network namespaces
by identifying the target network namespace through IFLA_NET_NS_{FD,PID}
properties. This is done by looking for the corresponding properties in
do_setlink(). Extend do_setlink() to also look for the IFLA_IF_NETNSID
property. This introduces no functional changes since all callers of
do_setlink() currently block IFLA_IF_NETNSID by reporting an error before
they reach do_setlink().

This introduces the helpers:

static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net, struct
                                               nlattr *tb[])

static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
                                             struct net *src_net,
					     struct nlattr *tb[], int cap)

to simplify permission checks and target network namespace retrieval for
RTM_* requests that already support IFLA_NET_NS_{FD,PID} but get extended
to IFLA_IF_NETNSID. To perserve backwards compatibility the helpers look
for IFLA_NET_NS_{FD,PID} properties first before checking for
IFLA_IF_NETNSID.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:31:06 -05:00
Christoph Hellwig
1e369b0e19 xfs: remove experimental tag for reflinks
But reject reflink + DAX file systems for now until the code to
support reflinks on DAX is actually implemented.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: port to 4.16]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29 07:27:24 -08:00