With 4.19, programs like ebtables fail to build when they include
"linux/netfilter_bridge.h". It is caused by commit 94276fa8a2 which
added a use of INT_MIN and INT_MAX to the header:
: In file included from /usr/include/linux/netfilter_bridge/ebtables.h:18,
: from include/ebtables_u.h:28,
: from communication.c:23:
: /usr/include/linux/netfilter_bridge.h:30:20: error: 'INT_MIN' undeclared here (not in a function)
: NF_BR_PRI_FIRST = INT_MIN,
: ^~~~~~~
Define these constants by including "limits.h" when !__KERNEL__ (the
same way as for other netfilter_* headers).
Fixes: 94276fa8a2 ("netfilter: bridge: Expose nf_tables bridge hook priorities through uapi")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
guest is using page poisoning. Guest writes to the poison_val config
field to tell host about the page poisoning value that is in use.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.
Currenlty, only free page blocks of MAX_ORDER - 1 are reported. They are
obtained one by one from the mm free list via the regular allocation
function.
Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register. When
the guest starts to report, it first sends a start cmd to host via the
free page vq, which acks to host the cmd id received. When the guest
finishes reporting free pages, a stop cmd is sent to host via the vq.
Host may also send a stop cmd id to the guest to stop the reporting.
VIRTIO_BALLOON_CMD_ID_STOP: Host sends this cmd to stop the guest
reporting.
VIRTIO_BALLOON_CMD_ID_DONE: Host sends this cmd to tell the guest that
the reported pages are ready to be freed.
Why does the guest free the reported pages when host tells it is ready to
free?
This is because freeing pages appears to be expensive for live migration.
free_pages() dirties memory very quickly and makes the live migraion not
converge in some cases. So it is good to delay the free_page operation
when the migration is done, and host sends a command to guest about that.
Why do we need the new VIRTIO_BALLOON_CMD_ID_DONE, instead of reusing
VIRTIO_BALLOON_CMD_ID_STOP?
This is because live migration is usually done in several rounds. At the
end of each round, host needs to send a VIRTIO_BALLOON_CMD_ID_STOP cmd to
the guest to stop (or say pause) the reporting. The guest resumes the
reporting when it receives a new command id at the beginning of the next
round. So we need a new cmd id to distinguish between "stop reporting" and
"ready to free the reported pages".
TODO:
- Add a batch page allocation API to amortize the allocation overhead.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When doing a route dump across all address families, do not error out
if the table does not exist. This allows a route dump for AF_UNSPEC
with a table id that may only exist for some of the families.
Do return the table does not exist error if dumping routes for a
specific family and the table does not exist.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no reason to track the atomic state three times. Unfortunately,
this is currently what we're doing, and even worse is that there is only
one actually correct state pointer: the one in mst_state->base.state.
mgr->state never seems to be used, along with the one in
mst_state->state.
This confused me for over 4 hours until I realized there was no magic
behind these pointers. So, let's save everyone else from the trouble.
Signed-off-by: Lyude Paul <lyude@redhat.com>.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20181023231251.16883-3-lyude@redhat.com
4.19 is out, Lyude asked for a backmerge, and it's been a while. All
very good reasons on their own :-)
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Pull documentation updates from Jonathan Corbet:
"This is a fairly typical cycle for documentation. There's some welcome
readability improvements for the formatted output, some LICENSES
updates including the addition of the ISC license, the removal of the
unloved and unmaintained 00-INDEX files, the deprecated APIs document
from Kees, more MM docs from Mike Rapoport, and the usual pile of typo
fixes and corrections"
* tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits)
docs: Fix typos in histogram.rst
docs: Introduce deprecated APIs list
kernel-doc: fix declaration type determination
doc: fix a typo in adding-syscalls.rst
docs/admin-guide: memory-hotplug: remove table of contents
doc: printk-formats: Remove bogus kobject references for device nodes
Documentation: preempt-locking: Use better example
dm flakey: Document "error_writes" feature
docs/completion.txt: Fix a couple of punctuation nits
LICENSES: Add ISC license text
LICENSES: Add note to CDDL-1.0 license that it should not be used
docs/core-api: memory-hotplug: add some details about locking internals
docs/core-api: rename memory-hotplug-notifier to memory-hotplug
docs: improve readability for people with poorer eyesight
yama: clarify ptrace_scope=2 in Yama documentation
docs/vm: split memory hotplug notifier description to Documentation/core-api
docs: move memory hotplug description into admin-guide/mm
doc: Fix acronym "FEKEK" in ecryptfs
docs: fix some broken documentation references
iommu: Fix passthrough option documentation
...
Pull ext4 updates from Ted Ts'o:
- further restructure ext4 documentation
- fix up ext4's delayed allocation for bigalloc file systems
- fix up some syzbot-detected races in EXT4_IOC_MOVE_EXT,
EXT4_IOC_SWAP_BOOT, and ext4_remount
- ... and a few other miscellaneous bugs and optimizations.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
ext4: fix use-after-free race in ext4_remount()'s error path
ext4: cache NULL when both default_acl and acl are NULL
docs: promote the ext4 data structures book to top level
docs: move ext4 administrative docs to admin-guide/
jbd2: fix use after free in jbd2_log_do_checkpoint()
ext4: propagate error from dquot_initialize() in EXT4_IOC_FSSETXATTR
ext4: fix setattr project check in fssetxattr ioctl
docs: make ext4 readme tables readable
docs: fix ext4 documentation table formatting problems
docs: generate a separate ext4 pdf file from the documentation
ext4: convert fault handler to use vm_fault_t type
ext4: initialize retries variable in ext4_da_write_inline_data_begin()
ext4: fix EXT4_IOC_SWAP_BOOT
ext4: fix build error when DX_DEBUG is defined
ext4: fix argument checking in EXT4_IOC_MOVE_EXT
ext4: fix reserved cluster accounting at page invalidation time
ext4: adjust reserved cluster count when removing extents
ext4: reduce reserved cluster count by number of allocated clusters
ext4: fix reserved cluster accounting at delayed write time
ext4: add new pending reservation mechanism
...
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added 1) superblock checksum feature, 2)
implemented new mount option which we can disable/enable checkpoint to
provide atomic updates of entire filesystem, 3) refactored quota
operations to enhance its consistency along with checkpoint, 4) fixed
subtle IO hang conditions and roll-forward recovery flow to resurrect
any fsync'ed inode metadata.
Enhancements:
- add checksum to keep superblock contents more safe
- add checkpoint=disable/enable to support A/B update of entire filesystem
- use plug for readahead IO in readdir
- add more IO counts to avoid block layer hacks
Bug fixes:
- prevent data corruption issue for hardware encryption
- fix IO hang issues when GC is heavily triggered
- add missing up_read in __write_node_page
- recover inode metadata during roll-forward recovery flow
- fix null pointer dereference issue in wrongly configured discard map
There are some more sanity checks and minor bug fixes as well"
* tag 'f2fs-for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits)
f2fs: fix to keep project quota consistent
f2fs: guarantee journalled quota data by checkpoint
f2fs: cleanup dirty pages if recover failed
f2fs: fix data corruption issue with hardware encryption
f2fs: fix to recover inode->i_flags of inode block during POR
f2fs: spread f2fs_set_inode_flags()
f2fs: fix to spread clear_cold_data()
Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()"
f2fs: account read IOs and use IO counts for is_idle
f2fs: fix to account IO correctly for cgroup writeback
f2fs: fix to account IO correctly
f2fs: remove request_list check in is_idle()
f2fs: allow to mount, if quota is failed
f2fs: update REQ_TIME in f2fs_cross_rename()
f2fs: do not update REQ_TIME in case of error conditions
f2fs: remove unneeded disable_nat_bits()
f2fs: remove unused sbi->trigger_ssr_threshold
f2fs: shrink sbi->sb_lock coverage in set_file_temperature()
f2fs: use rb_*_cached friends
f2fs: fix to recover cold bit of inode block during POR
...
Pul xfs updates from Dave Chinner:
"There's not a huge amount of change in this cycle - Darrick has been
out of action for a couple of months (hence me sending the last few
pull requests), so we decided a quiet cycle mainly focussed on bug
fixes was a good idea. Darrick will take the helm again at the end of
this merge window.
FYI, I may be sending another update later in the cycle - there's a
pending rework of the clone/dedupe_file_range code that fixes numerous
bugs that is spread amongst the VFS, XFS and ocfs2 code. It has been
reviewed and tested, Al and I just need to work out the details of the
merge, so it may come from him rather than me.
Summary:
- only support filesystems with unwritten extents
- add definition for statfs XFS magic number
- remove unused parameters around reflink code
- more debug for dangling delalloc extents
- cancel COW extents on extent swap targets
- fix quota stats output and clean up the code
- refactor some of the attribute code in preparation for parent
pointers
- fix several buffer handling bugs"
* tag 'xfs-4.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (21 commits)
xfs: cancel COW blocks before swapext
xfs: clear ail delwri queued bufs on unmount of shutdown fs
xfs: use offsetof() in place of offset macros for __xfsstats
xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat
xfs: fix use-after-free race in xfs_buf_rele
xfs: Add attibute remove and helper functions
xfs: Add attibute set and helper functions
xfs: Add helper function xfs_attr_try_sf_addname
xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
xfs: issue log message on user force shutdown
xfs: fix buffer state management in xrep_findroot_block
xfs: always assign buffer verifiers when one is provided
xfs: xrep_findroot_block should reject root blocks with siblings
xfs: add a define for statfs magic to uapi
xfs: print dangling delalloc extents
xfs: fix fork selection in xfs_find_trim_cow_extent
xfs: remove the unused trimmed argument from xfs_reflink_trim_around_shared
xfs: remove the unused shared argument to xfs_reflink_reserve_cow
xfs: handle zeroing in xfs_file_iomap_begin_delay
xfs: remove suport for filesystems without unwritten extent flag
...
Pull btrfs updates from David Sterba:
"This is the first batch with fixes and some nice performance
improvements.
Preliminary results show eg. more files/sec in fsmark, better perf on
multi-threaded workloads (filebench, dbench), fewer context switches
and overall better memory allocation characteristics (multiple
benchmarks).
Apart from general performance, there's an improvement for qgroups +
balance workload that's been troubling our users.
Note for stable: there are 20+ patches tagged for stable, out of 90.
Not all of them apply cleanly on all stable versions but the conflicts
are mostly due to simple cleanups and resolving should be obvious. The
fixes are otherwise independent.
Performance improvements:
- transition between blocking and spinning modes of path is gone,
which originally resulted to more unnecessary wakeups and updates
to the path locks, the effects are measurable and improve latency
and scalability
- qgroups: first batch of changes that should speedup balancing with
qgroups on, skip quota accounting on unchanged subtrees, overall
gain is about 30+% in runtime
- use rb-tree with cached first node for several structures, small
improvement to avoid pointer chasing
Fixes:
- trim
- fix: some blockgroups could have been missed if their logical
address was past the total filesystem size (ie. after a lot of
balancing)
- better error reporting, after processing blockgroups and whole
device
- fix: continue trimming block groups after an error is
encountered
- check for trim support of the device earlier and avoid some
unnecessary work
- less interaction with transaction commit that improves latency
on slower storage (eg. image files over NFS)
- fsync
- fix warning when replaying log after fsync of a O_TMPFILE
- fix wrong dentries after fsync of file that got its parent
replaced
- qgroups: fix rescan that might misc some dirty groups
- don't clean dirty pages during buffered writes, this could lead to
lost updates in some corner cases
- some block groups could have been delayed in creation, if the
allocation triggered another one
- error handling improvements
Cleanups:
- removed unused struct members and variables
- function return type cleanups
- delayed refs code refactoring
- protect against deadlock that could be caused by crafted image that
tries to allocate from a tree that's locked already"
* tag 'for-4.20-part1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (93 commits)
btrfs: switch return_bigger to bool in find_ref_head
btrfs: remove fs_info from btrfs_should_throttle_delayed_refs
btrfs: remove fs_info from btrfs_check_space_for_delayed_refs
btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock
btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_head
btrfs: qgroup: move the qgroup->members check out from (!qgroup)'s else branch
btrfs: relocation: Remove redundant tree level check
btrfs: relocation: Cleanup while loop using rbtree_postorder_for_each_entry_safe
btrfs: qgroup: Avoid calling qgroup functions if qgroup is not enabled
Btrfs: fix wrong dentries after fsync of file that got its parent replaced
Btrfs: fix warning when replaying log after fsync of a tmpfile
btrfs: drop min_size from evict_refill_and_join
btrfs: assert on non-empty delayed iputs
btrfs: make sure we create all new block groups
btrfs: reset max_extent_size on clear in a bitmap
btrfs: protect space cache inode alloc with GFP_NOFS
btrfs: release metadata before running delayed refs
Btrfs: kill btrfs_clear_path_blocking
btrfs: dev-replace: remove pointless assert in write unlock
btrfs: dev-replace: move replace members out of fs_info
...
The DSI devices have a maximum operating frequency specified
in their data sheet per the MIPI specification, and DSI hosts
that can scale their frequency need this information to set
their clock dividers right.
As current panel drivers often lack this information, specify
that setting it to zero will make the DSI host use some
reasonable default.
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20181023072422.25754-1-linus.walleij@linaro.org
Pull tty ioctl updates from Al Viro:
"This is the compat_ioctl work related to tty ioctls.
Quite a bit of dead code taken out, all tty-related stuff gone from
fs/compat_ioctl.c. A bunch of compat bugs fixed - some still remain,
but all more or less generic tty-related ioctls should be covered
(remaining issues are in things like driver-private ioctls in a pcmcia
serial card driver not getting properly handled in 32bit processes on
64bit host, etc)"
* 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (53 commits)
kill TIOCSERGSTRUCT
change semantics of ldisc ->compat_ioctl()
kill TIOCSER[SG]WILD
synclink_gt(): fix compat_ioctl()
pty: fix compat ioctls
compat_ioctl - kill keyboard ioctl handling
gigaset: add ->compat_ioctl()
vt_compat_ioctl(): clean up, use compat_ptr() properly
gigaset: don't try to printk userland buffer contents
dgnc: don't bother with (empty) stub for TCXONC
dgnc: leave TIOC[GS]SOFTCAR to ldisc
remove fallback to drivers for TIOCGICOUNT
dgnc: break-related ioctls won't reach ->ioctl()
kill the rest of tty COMPAT_IOCTL() entries
dgnc: TIOCM... won't reach ->ioctl()
isdn_tty: TCSBRK{,P} won't reach ->ioctl()
kill capinc_tty_ioctl()
take compat TIOC[SG]SERIAL treatment into tty_compat_ioctl()
synclink: reduce pointless checks in ->ioctl()
complete ->[sg]et_serial() switchover
...
Pull pstore updates from Kees Cook:
"pstore improvements:
- refactor init to happen as early as possible again (Joel Fernandes)
- improve resource reservation names"
* tag 'pstore-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Clarify resource reservation labels
pstore: Refactor compression initialization
pstore: Allocate compression during late_initcall()
pstore: Centralize init/exit routines
Pull security subsystem updates from James Morris:
"In this patchset, there are a couple of minor updates, as well as some
reworking of the LSM initialization code from Kees Cook (these prepare
the way for ordered stackable LSMs, but are a valuable cleanup on
their own)"
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
LSM: Don't ignore initialization failures
LSM: Provide init debugging infrastructure
LSM: Record LSM name in struct lsm_info
LSM: Convert security_initcall() into DEFINE_LSM()
vmlinux.lds.h: Move LSM_TABLE into INIT_DATA
LSM: Convert from initcall to struct lsm_info
LSM: Remove initcall tracing
LSM: Rename .security_initcall section to .lsm_info
vmlinux.lds.h: Avoid copy/paste of security_init section
LSM: Correctly announce start of LSM initialization
security: fix LSM description location
keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h
seccomp: remove unnecessary unlikely()
security: tomoyo: Fix obsolete function
security/capabilities: remove check for -EINVAL
Pull siginfo updates from Eric Biederman:
"I have been slowly sorting out siginfo and this is the culmination of
that work.
The primary result is in several ways the signal infrastructure has
been made less error prone. The code has been updated so that manually
specifying SEND_SIG_FORCED is never necessary. The conversion to the
new siginfo sending functions is now complete, which makes it
difficult to send a signal without filling in the proper siginfo
fields.
At the tail end of the patchset comes the optimization of decreasing
the size of struct siginfo in the kernel from 128 bytes to about 48
bytes on 64bit. The fundamental observation that enables this is by
definition none of the known ways to use struct siginfo uses the extra
bytes.
This comes at the cost of a small user space observable difference.
For the rare case of siginfo being injected into the kernel only what
can be copied into kernel_siginfo is delivered to the destination, the
rest of the bytes are set to 0. For cases where the signal and the
si_code are known this is safe, because we know those bytes are not
used. For cases where the signal and si_code combination is unknown
the bits that won't fit into struct kernel_siginfo are tested to
verify they are zero, and the send fails if they are not.
I made an extensive search through userspace code and I could not find
anything that would break because of the above change. If it turns out
I did break something it will take just the revert of a single change
to restore kernel_siginfo to the same size as userspace siginfo.
Testing did reveal dependencies on preferring the signo passed to
sigqueueinfo over si->signo, so bit the bullet and added the
complexity necessary to handle that case.
Testing also revealed bad things can happen if a negative signal
number is passed into the system calls. Something no sane application
will do but something a malicious program or a fuzzer might do. So I
have fixed the code that performs the bounds checks to ensure negative
signal numbers are handled"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
signal: Guard against negative signal numbers in copy_siginfo_from_user32
signal: Guard against negative signal numbers in copy_siginfo_from_user
signal: In sigqueueinfo prefer sig not si_signo
signal: Use a smaller struct siginfo in the kernel
signal: Distinguish between kernel_siginfo and siginfo
signal: Introduce copy_siginfo_from_user and use it's return value
signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
signal: Fail sigqueueinfo if si_signo != sig
signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
signal/unicore32: Use force_sig_fault where appropriate
signal/unicore32: Generate siginfo in ucs32_notify_die
signal/unicore32: Use send_sig_fault where appropriate
signal/arc: Use force_sig_fault where appropriate
signal/arc: Push siginfo generation into unhandled_exception
signal/ia64: Use force_sig_fault where appropriate
signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
signal/ia64: Use the generic force_sigsegv in setup_frame
signal/arm/kvm: Use send_sig_mceerr
signal/arm: Use send_sig_fault where appropriate
signal/arm: Use force_sig_fault where appropriate
...
Pull networking updates from David Miller:
1) Add VF IPSEC offload support in ixgbe, from Shannon Nelson.
2) Add zero-copy AF_XDP support to i40e, from Björn Töpel.
3) All in-tree drivers are converted to {g,s}et_link_ksettings() so we
can get rid of the {g,s}et_settings ethtool callbacks, from Michal
Kubecek.
4) Add software timestamping to veth driver, from Michael Walle.
5) More work to make packet classifiers and actions lockless, from Vlad
Buslov.
6) Support sticky FDB entries in bridge, from Nikolay Aleksandrov.
7) Add ipv6 version of IP_MULTICAST_ALL sockopt, from Andre Naujoks.
8) Support batching of XDP buffers in vhost_net, from Jason Wang.
9) Add flow dissector BPF hook, from Petar Penkov.
10) i40e vf --> generic iavf conversion, from Jesse Brandeburg.
11) Add NLA_REJECT netlink attribute policy type, to signal when users
provide attributes in situations which don't make sense. From
Johannes Berg.
12) Switch TCP and fair-queue scheduler over to earliest departure time
model. From Eric Dumazet.
13) Improve guest receive performance by doing rx busy polling in tx
path of vhost networking driver, from Tonghao Zhang.
14) Add per-cgroup local storage to bpf
15) Add reference tracking to BPF, from Joe Stringer. The verifier can
now make sure that references taken to objects are properly released
by the program.
16) Support in-place encryption in TLS, from Vakul Garg.
17) Add new taprio packet scheduler, from Vinicius Costa Gomes.
18) Lots of selftests additions, too numerous to mention one by one here
but all of which are very much appreciated.
19) Support offloading of eBPF programs containing BPF to BPF calls in
nfp driver, frm Quentin Monnet.
20) Move dpaa2_ptp driver out of staging, from Yangbo Lu.
21) Lots of u32 classifier cleanups and simplifications, from Al Viro.
22) Add new strict versions of netlink message parsers, and enable them
for some situations. From David Ahern.
23) Evict neighbour entries on carrier down, also from David Ahern.
24) Support BPF sk_msg verdict programs with kTLS, from Daniel Borkmann
and John Fastabend.
25) Add support for filtering route dumps, from David Ahern.
26) New igc Intel driver for 2.5G parts, from Sasha Neftin et al.
27) Allow vxlan enslavement to bridges in mlxsw driver, from Ido
Schimmel.
28) Add queue and stack map types to eBPF, from Mauricio Vasquez B.
29) Add back byte-queue-limit support to r8169, with all the bug fixes
in other areas of the driver it works now! From Florian Westphal and
Heiner Kallweit.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2147 commits)
tcp: add tcp_reset_xmit_timer() helper
qed: Fix static checker warning
Revert "be2net: remove desc field from be_eq_obj"
Revert "net: simplify sock_poll_wait"
net: socionext: Reset tx queue in ndo_stop
net: socionext: Add dummy PHY register read in phy_write()
net: socionext: Stop PHY before resetting netsec
net: stmmac: Set OWN bit for jumbo frames
arm64: dts: stratix10: Support Ethernet Jumbo frame
tls: Add maintainers
net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
octeontx2-af: Support for NIXLF's UCAST/PROMISC/ALLMULTI modes
octeontx2-af: Support for setting MAC address
octeontx2-af: Support for changing RSS algorithm
octeontx2-af: NIX Rx flowkey configuration for RSS
octeontx2-af: Install ucast and bcast pkt forwarding rules
octeontx2-af: Add LMAC channel info to NIXLF_ALLOC response
octeontx2-af: NPC MCAM and LDATA extract minimal configuration
octeontx2-af: Enable packet length and csum validation
octeontx2-af: Support for VTAG strip and capture
...
With EDT model, SRTT no longer is inflated by pacing delays.
This means that RTO and some other xmit timers might be setup
incorrectly. This is particularly visible with either :
- Very small enforced pacing rates (SO_MAX_PACING_RATE)
- Reduced rto (from the default 200 ms)
This can lead to TCP flows aborts in the worst case,
or spurious retransmits in other cases.
For example, this session gets far more throughput
than the requested 80kbit :
$ netperf -H 127.0.0.2 -l 100 -- -q 10000
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.2 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
540000 262144 262144 104.00 2.66
With the fix :
$ netperf -H 127.0.0.2 -l 100 -- -q 10000
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.2 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
540000 262144 262144 104.00 0.12
EDT allows for better control of rtx timers, since TCP has
a better idea of the earliest departure time of each skb
in the rtx queue. We only have to eventually add to the
timer the difference of the EDT time with current time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send probes to all the unprobed fileservers in a fileserver list on all
addresses simultaneously in an attempt to find out the fastest route whilst
not getting stuck for 20s on any server or address that we don't get a
reply from.
This alleviates the problem whereby attempting to access a new server can
take a long time because the rotation algorithm ends up rotating through
all servers and addresses until it finds one that responds.
Signed-off-by: David Howells <dhowells@redhat.com>
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client. These implement upgraded services on
the same port as their AFS services.
YFS fileservers provide expanded capabilities over AFS.
Signed-off-by: David Howells <dhowells@redhat.com>
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.
This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious. It also requires
this data to be placed into the vnode cache key for fscache.
For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.
Signed-off-by: David Howells <dhowells@redhat.com>
afs_extract_data sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.
Instead:
(1) Put an iterator in the afs_call struct.
(2) Set the iterator for each marshalling stage to load data into the
appropriate places. A number of convenience functions are provided to
this end (eg. afs_extract_to_buf()).
This iterator is then passed to afs_extract_data().
(3) Use the new ITER_DISCARD iterator to discard any excess data provided
by FetchData.
Signed-off-by: David Howells <dhowells@redhat.com>
Include the site of detection of AFS protocol errors in trace lines to
better be able to determine what went wrong.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a new iterator, ITER_DISCARD, that can only be used in READ mode and
just discards any data copied to it.
This is useful in a network filesystem for discarding any unwanted data
sent by a server.
Signed-off-by: David Howells <dhowells@redhat.com>
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.
Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements. This makes it easier to add further
iterator types. Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.
Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself. Only the direction is required.
Signed-off-by: David Howells <dhowells@redhat.com>
Use accessor functions to access an iterator's type and direction. This
allows for the possibility of using some other method of determining the
type of iterator than if-chains with bitwise-AND conditions.
Signed-off-by: David Howells <dhowells@redhat.com>
Pull x86 vdso updates from Ingo Molnar:
"Two main changes:
- Cleanups, simplifications and CLOCK_TAI support (Thomas Gleixner)
- Improve code generation (Andy Lutomirski)"
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Rearrange do_hres() to improve code generation
x86/vdso: Document vgtod_ts better
x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
x66/vdso: Add CLOCK_TAI support
x86/vdso: Move cycle_last handling into the caller
x86/vdso: Simplify the invalid vclock case
x86/vdso: Replace the clockid switch case
x86/vdso: Collapse coarse functions
x86/vdso: Collapse high resolution functions
x86/vdso: Introduce and use vgtod_ts
x86/vdso: Use unsigned int consistently for vsyscall_gtod_data:: Seq
x86/vdso: Enforce 64bit clocksource
x86/time: Implement clocksource_arch_init()
clocksource: Provide clocksource_arch_init()
This reverts commit dd979b4df8.
This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an
internal TCP socket for the initial handshake with the remote peer.
Whenever the SMC connection can not be established this TCP socket is
used as a fallback. All socket operations on the SMC socket are then
forwarded to the TCP socket. In case of poll, the file->private_data
pointer references the SMC socket because the TCP socket has no file
assigned. This causes tcp_poll to wait on the wrong socket.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 pti updates from Ingo Molnar:
"The main changes:
- Make the IBPB barrier more strict and add STIBP support (Jiri
Kosina)
- Micro-optimize and clean up the entry code (Andy Lutomirski)
- ... plus misc other fixes"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Propagate information about RSB filling mitigation to sysfs
x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION
x86/pti/64: Remove the SYSCALL64 entry trampoline
x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space
x86/entry/64: Document idtentry
Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross)
- Enable PV spinlocks on Hyperv (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
We no longer need to worry about whether or not the entry is hashed in
order to figure out if the contents are valid. We only care whether or
not the refcount is non-zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull x86 mm updates from Ingo Molnar:
"Lots of changes in this cycle:
- Lots of CPA (change page attribute) optimizations and related
cleanups (Thomas Gleixner, Peter Zijstra)
- Make lazy TLB mode even lazier (Rik van Riel)
- Fault handler cleanups and improvements (Dave Hansen)
- kdump, vmcore: Enable kdumping encrypted memory with AMD SME
enabled (Lianbo Jiang)
- Clean up VM layout documentation (Baoquan He, Ingo Molnar)
- ... plus misc other fixes and enhancements"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry()
x86/mm: Kill stray kernel fault handling comment
x86/mm: Do not warn about PCI BIOS W+X mappings
resource: Clean it up a bit
resource: Fix find_next_iomem_res() iteration issue
resource: Include resource end in walk_*() interfaces
x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
x86/mm: Remove spurious fault pkey check
x86/mm/vsyscall: Consider vsyscall page part of user address space
x86/mm: Add vsyscall address helper
x86/mm: Fix exception table comments
x86/mm: Add clarifying comments for user addr space
x86/mm: Break out user address space handling
x86/mm: Break out kernel address space handling
x86/mm: Clarify hardware vs. software "error_code"
x86/mm/tlb: Make lazy TLB mode lazier
x86/mm/tlb: Add freed_tables element to flush_tlb_info
x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range
smp,cpumask: introduce on_each_cpu_cond_mask
smp: use __cpumask_set_cpu in on_each_cpu_cond
...
Pull x86 cpu updates from Ingo Molnar:
"The main changes in this cycle were:
- Add support for the "Dhyana" x86 CPUs by Hygon: these are licensed
based on the AMD Zen architecture, and are built and sold in China,
for domestic datacenter use. The code is pretty close to AMD
support, mostly with a few quirks and enumeration differences. (Pu
Wen)
- Enable CPUID support on Cyrix 6x86/6x86L processors"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/cpupower: Add Hygon Dhyana support
cpufreq: Add Hygon Dhyana support
ACPI: Add Hygon Dhyana support
x86/xen: Add Hygon Dhyana support to Xen
x86/kvm: Add Hygon Dhyana support to KVM
x86/mce: Add Hygon Dhyana support to the MCA infrastructure
x86/bugs: Add Hygon Dhyana to the respective mitigation machinery
x86/apic: Add Hygon Dhyana support
x86/pci, x86/amd_nb: Add Hygon Dhyana support to PCI and northbridge
x86/amd_nb: Check vendor in AMD-only functions
x86/alternative: Init ideal_nops for Hygon Dhyana
x86/events: Add Hygon Dhyana support to PMU infrastructure
x86/smpboot: Do not use BSP INIT delay and MWAIT to idle on Dhyana
x86/cpu/mtrr: Support TOP_MEM2 and get MTRR number
x86/cpu: Get cache info and setup cache cpumap for Hygon Dhyana
x86/cpu: Create Hygon Dhyana architecture support file
x86/CPU: Change query logic so CPUID is enabled before testing
x86/CPU: Use correct macros for Cyrix calls
Pull x86 apic updates from Ingo Molnar:
"Improve the spreading of managed IRQs at allocation time"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/matrix: Spread managed interrupts on allocation
irq/matrix: Split out the CPU selection code into a helper
The way our hardware is designed doesn't seem to let us use the
MI_RECORD_PERF_COUNT command without setting up a circular buffer.
In the case where the user didn't request OA reports to be available
through the i915 perf stream, we can set the OA buffer to the minimum
size to avoid consuming memory which won't be used by the driver.
v2: Simplify oa buffer size exponent selection (Chris)
Reuse vma size field (Lionel)
v3: Restrict size opening parameter to values supported by HW (Chris)
v4: Drop out of date comment (Matt)
Add debug message when buffer size is rejected (Matt)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181023100707.31738-5-lionel.g.landwerlin@intel.com
Pull scheduler updates from Ingo Molnar:
"The main changes are:
- Migrate CPU-intense 'misfit' tasks on asymmetric capacity systems,
to better utilize (much) faster 'big core' CPUs. (Morten Rasmussen,
Valentin Schneider)
- Topology handling improvements, in particular when CPU capacity
changes and related load-balancing fixes/improvements (Morten
Rasmussen)
- ... plus misc other improvements, fixes and updates"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
sched/completions/Documentation: Add recommendation for dynamic and ONSTACK completions
sched/completions/Documentation: Clean up the document some more
sched/completions/Documentation: Fix a couple of punctuation nits
cpu/SMT: State SMT is disabled even with nosmt and without "=force"
sched/core: Fix comment regarding nr_iowait_cpu() and get_iowait_load()
sched/fair: Remove setting task's se->runnable_weight during PELT update
sched/fair: Disable LB_BIAS by default
sched/pelt: Fix warning and clean up IRQ PELT config
sched/topology: Make local variables static
sched/debug: Use symbolic names for task state constants
sched/numa: Remove unused numa_stats::nr_running field
sched/numa: Remove unused code from update_numa_stats()
sched/debug: Explicitly cast sched_feat() to bool
sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains
sched/fair: Don't move tasks to lower capacity CPUs unless necessary
sched/fair: Set rq->rd->overload when misfit
sched/fair: Wrap rq->rd->overload accesses with READ/WRITE_ONCE()
sched/core: Change root_domain->overload type to int
sched/fair: Change 'prefer_sibling' type to bool
sched/fair: Kick nohz balance if rq->misfit_task_load
...
Pull locking and misc x86 updates from Ingo Molnar:
"Lots of changes in this cycle - in part because locking/core attracted
a number of related x86 low level work which was easier to handle in a
single tree:
- Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
McKenney, Andrea Parri)
- lockdep scalability improvements and micro-optimizations (Waiman
Long)
- rwsem improvements (Waiman Long)
- spinlock micro-optimization (Matthew Wilcox)
- qspinlocks: Provide a liveness guarantee (more fairness) on x86.
(Peter Zijlstra)
- Add support for relative references in jump tables on arm64, x86
and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)
- Be a lot less permissive on weird (kernel address) uaccess faults
on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
Horn)
- macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
Amit)
- ... and a handful of other smaller changes as well"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
locking/lockdep: Make global debug_locks* variables read-mostly
locking/lockdep: Fix debug_locks off performance problem
locking/pvqspinlock: Extend node size when pvqspinlock is configured
locking/qspinlock_stat: Count instances of nested lock slowpaths
locking/qspinlock, x86: Provide liveness guarantee
x86/asm: 'Simplify' GEN_*_RMWcc() macros
locking/qspinlock: Rework some comments
locking/qspinlock: Re-order code
locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
futex: Replace spin_is_locked() with lockdep
locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
x86/refcount: Work around GCC inlining bug
x86/objtool: Use asm macros to work around GCC inlining bugs
...
Pull EFI updates from Ingo Molnar:
"The main changes are:
- Add support for enlisting the help of the EFI firmware to create
memory reservations that persist across kexec.
- Add page fault handling to the runtime services support code on x86
so we can more gracefully recover from buggy EFI firmware.
- Fix command line handling on x86 for the boot path that omits the
stub's PE/COFF entry point.
- Other assorted fixes and updates"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: boot: Fix EFI stub alignment
efi/x86: Call efi_parse_options() from efi_main()
efi/x86: earlyprintk - Add 64bit efi fb address support
efi/x86: drop task_lock() from efi_switch_mm()
efi/x86: Handle page faults occurring while running EFI runtime services
efi: Make efi_rts_work accessible to efi page fault handler
efi/efi_test: add exporting ResetSystem runtime service
efi/libstub: arm: support building with clang
efi: add API to reserve memory persistently across kexec reboot
efi/arm: libstub: add a root memreserve config table
efi: honour memory reservations passed via a linux specific config table