Commit Graph

360735 Commits

Author SHA1 Message Date
Alex Elder
87f979d390 ceph: kill ceph_osdc_writepages() "nofail" parameter
There is only one caller of ceph_osdc_writepages(), and it always
passes the value true as its "nofail" argument.  Get rid of that
argument and replace its use in ceph_osdc_writepages() with the
constant value true.

This and a number of cleanup patches that follow resolve:
    http://tracker.ceph.com/issues/4126

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18 12:19:22 -06:00
Alex Elder
e7e319a9c5 libceph: improve packing in struct ceph_osd_req_op
The layout of struct ceph_osd_req_op leaves lots of holes.
Rearranging things a little for better field alignment
reduces the size by a third.

This resolves:
    http://tracker.ceph.com/issues/4163

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18 12:19:07 -06:00
Jiri Pirko
dca9ab9274 MAINTAINERS: Jiri Pirko email change
Change email for team driver maintainership.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 13:12:30 -05:00
Linus Torvalds
7c45512df9 mm: fix pageblock bitmap allocation
Commit c060f943d0 ("mm: use aligned zone start for pfn_to_bitidx
calculation") fixed out calculation of the index into the pageblock
bitmap when a !SPARSEMEM zome was not aligned to pageblock_nr_pages.

However, the _allocation_ of that bitmap had never taken this alignment
requirement into accout, so depending on the exact size and alignment of
the zone, the use of that index could then access past the allocation,
resulting in some very subtle memory corruption.

This was reported (and bisected) by Ingo Molnar: one of his random
config builds would hang with certain very specific kernel command line
options.

In the meantime, commit c060f943d0 has been marked for stable, so this
fix needs to be back-ported to the stable kernels that backported the
commit to use the right alignment.

Bisected-and-tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-18 09:58:02 -08:00
Joe Perches
4153577a8d tg3: Use different macros for pci_chip_rev_id accesses
Upper case macros for various chip attributes are slightly
difficult to read and are a bit out of characterto the other
tg3_<foo> attribute functions.

Convert:

GET_ASIC_REV(tp->pci_chip_rev_id)       -> tg3_asic_rev(tp)
GET_CHIP_REV(tp->pci_chip_rev_id)       -> tg3_chip_rev(tp)

Remove:
GET_METAL_REV(tp->pci_chip_rev_id)      -> tg3_metal_rev(tp) (unused)

Add:
tg3_chip_rev_id(tp) for tp->pci_chip_rev_id so access styles
are similar to tg3_asic_rev and tg3_chip_rev.

These macros are not converted to static inline functions
because gcc (tested with 4.7.2) is currently unable to
optimize the object code it produces the same way and code
is otherwise larger.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:45:53 -05:00
Joe Perches
717ff727fa tg3: Remove define and single use of GET_CHIP_REV_ID
It's the same value as tp->pci_chip_rev_id so use that
instead.  This makes all CHIPREV_ID_<foo> tests the same.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:45:52 -05:00
Hauke Mehrtens
885d299414 bgmac: fix unaligned accesses to network headers
Without this patch I get many unaligned access warnings per packet,
this patches fixes them all. This should improve performance on some
systems like mips.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:45:09 -05:00
stephen hemminger
9aac22deb1 ip: fix warning in xfrm4_mode_tunnel_input
Same problem as IPv6

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:42:48 -05:00
stephen hemminger
0d4bfa297c ipv6: fix warning in xfrm6_mode_tunnel_input
Should not use assignment in conditional:
 warning: suggest parentheses around assignment used as truth value [-Wparentheses]

Problem introduced by:
commit 14bbd6a565
Author: Pravin B Shelar <pshelar@nicira.com>
Date:   Thu Feb 14 09:44:49 2013 +0000

    net: Add skb_unclone() helper function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:42:47 -05:00
David S. Miller
2f219d5fb1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
This series contains updates to e1000, e1000e, igb, igbvf and ixgbe.
The e1000, e1000e, igb and igbvf are single patch changes and the
remaining 11 patches are all against ixgbe.

The e1000 patch is a comment cleanup to align e1000 with the code
commenting style for /drivers/net.  It also contains a few other white
space cleanups (i.e. fix lines over 80 char, remove unnecessary blank
lines and fix the use of tabs/spaces).

The e1000e patch from Koki (Fujitsu) adds a warning when link speed is
downgraded due to SmartSpeed.

The igb patch from Stefan (Red Hat) increases the timeout in the ethtool
offline self-test because some i350 adapters would sometimes fail the
self-test because link auto negotiation may take longer than the current
4 second timeout.

The igbvf patch from Alex is meant to address several race issues that
become possible because next_to_watch could possibly be set to a value
that shows that the descriptor is done when it is not.  In order to correct
that we instead make next_to_watch a pointer that is set to NULL during
cleanup, and set to the eop_desc after the descriptor rings have been written.

The remaining patches for ixgbe are a mix of fixes and added support as well
as some cleanup.  Most notably is the added support for displaying the
number of Tx/Rx channels via ethtool by Alex.  Also Aurélien adds the
ability for reading data from SFP+ modules over i2c for diagnostic
monitoring.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:35:42 -05:00
Li Zefan
f169007b27 cgroup: fail if monitored file and event_control are in different cgroup
If we pass fd of memory.usage_in_bytes of cgroup A to cgroup.event_control
of cgroup B, then we won't get memory usage notification from A but B!

What's worse, if A and B are in different mount hierarchy, we'll end up
accessing NULL pointer!

Disallow this kind of invalid usage.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-18 09:31:35 -08:00
Ying Xue
dec34fb0f5 net: fix a compile error when SOCK_REFCNT_DEBUG is enabled
When SOCK_REFCNT_DEBUG is enabled, below build error is met:

kernel/sysctl_binary.o: In function `sk_refcnt_debug_release':
include/net/sock.h:1025: multiple definition of `sk_refcnt_debug_release'
kernel/sysctl.o:include/net/sock.h:1025: first defined here
kernel/audit.o: In function `sk_refcnt_debug_release':
include/net/sock.h:1025: multiple definition of `sk_refcnt_debug_release'
kernel/sysctl.o:include/net/sock.h:1025: first defined here
make[1]: *** [kernel/built-in.o] Error 1
make: *** [kernel] Error 2

So we decide to make sk_refcnt_debug_release static to eliminate
the error.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:29:52 -05:00
Cong Wang
96b45cbd95 net: move ioctl functions into a separated file
They well deserve a separated unit.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:27:32 -05:00
Dave Young
31d1753663 net: ehea module param description fix
In ehea.h the minimal entries is 2^7 - 1:
#define EHEA_MIN_ENTRIES_QP  127

Thus change the module param description accordinglly

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:26:36 -05:00
Johannes Berg
8c6d59ee35 cfg80211: fix station change if TDLS isn't supported
Larry noticed (and bisected) that commit df881293c6
"cfg80211: Pass TDLS peer's QoS/HT/VHT information during set_station"
broke secure connections. This is is the case only for drivers that
don't support TDLS, where any kind of change, even just the change of
authorized flag that is required for normal operation, was rejected
now. To fix this, remove the checks. I have some patches that will add
proper verification for all the different cases later.

Cc: Jouni Malinen <j@w1.fi>
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-18 18:26:24 +01:00
Eric Dumazet
279e9f2ff7 ipv6: optimize inet6_hash_frag()
Use ipv6_addr_hash() and a single jhash invocation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:23:51 -05:00
David S. Miller
c8c5b28715 Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Paul Gortmaker says:

====================
Two relatively small cleanup patches here, plus a reimplementation
of the patch Neil had questions about[1] in the last development
cycle.

Tested on today's net-next, between 32 and 64 bit x86 machines using
the server/client in tipc-utils, as usual.

[1] http://patchwork.ozlabs.org/patch/204507/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18 12:22:17 -05:00
Li Zefan
810cbee4fa cgroup: fix cgroup_rmdir() vs close(eventfd) race
commit 205a872bd6 ("cgroup: fix lockdep
warning for event_control") solved a deadlock by introducing a new
bug.

Move cgrp->event_list to a temporary list doesn't mean you can traverse
this list locklessly, because at the same time cgroup_event_wake() can
be called and remove the event from the list. The result of this race
is disastrous.

We adopt the way how kvm irqfd code implements race-free event removal,
which is now described in the comments in cgroup_event_wake().

v3:
- call eventfd_signal() no matter it's eventfd close or cgroup removal
that removes the cgroup event.

Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-18 09:17:24 -08:00
Lukas Czerner
1231b3a1eb ext4: fix xattr block allocation/release with bigalloc
Currently when new xattr block is created or released we we would call
dquot_free_block() or dquot_alloc_block() respectively, among the else
decrementing or incrementing the number of blocks assigned to the
inode by one block.

This however does not work for bigalloc file system because we always
allocate/free the whole cluster so we have to count with that in
dquot_free_block() and dquot_alloc_block() as well.

Use the clusters-to-blocks conversion EXT4_C2B() when passing number of
blocks to the dquot_alloc/free functions to fix the problem.

The problem has been revealed by xfstests #117 (and possibly others).

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable@vger.kernel.org
2013-02-18 12:12:07 -05:00
Alex Williamson
84237a826b vfio-pci: Add support for VGA region access
PCI defines display class VGA regions at I/O port address 0x3b0, 0x3c0
and MMIO address 0xa0000.  As these are non-overlapping, we can ignore
the I/O port vs MMIO difference and expose them both in a single
region.  We make use of the VGA arbiter around each access to
configure chipset access as necessary.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-02-18 10:11:13 -07:00
Alex Williamson
2dd1194833 vfio-pci: Manage user power state transitions
We give the user access to change the power state of the device but
certain transitions result in an uninitialized state which the user
cannot resolve.  To fix this we need to mark the PowerState field of
the PMCSR register read-only and effect the requested change on behalf
of the user.  This has the added benefit that pdev->current_state
remains accurate while controlled by the user.

The primary example of this bug is a QEMU guest doing a reboot where
the device it put into D3 on shutdown and becomes unusable on the next
boot because the device did a soft reset on D3->D0 (NoSoftRst-).

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-02-18 10:10:33 -07:00
Li Zefan
63f43f55c9 cpuset: fix cpuset_print_task_mems_allowed() vs rename() race
rename() will change dentry->d_name. The result of this race can
be worse than seeing partially rewritten name, but we might access
a stale pointer because rename() will re-allocate memory to hold
a longer name.

It's safe in the protection of dentry->d_lock.

v2: check NULL dentry before acquiring dentry lock.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
2013-02-18 09:08:15 -08:00
Li Zefan
71b5707e11 cgroup: fix exit() vs rmdir() race
In cgroup_exit() put_css_set_taskexit() is called without any lock,
which might lead to accessing a freed cgroup:

thread1                           thread2
---------------------------------------------
exit()
  cgroup_exit()
    put_css_set_taskexit()
      atomic_dec(cgrp->count);
                                   rmdir();
      /* not safe !! */
      check_for_release(cgrp);

rcu_read_lock() can be used to make sure the cgroup is alive.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
2013-02-18 09:08:10 -08:00
Takashi Iwai
e6e0ee507f ALSA: hda - Fix the silent speaker output on Fujitsu S7020 laptop
In the recent update, Fujitsu S7020 laptop with ALC260 codec lost the
speaker output, no matter how the amps and the pins are set.  After a
long debugging session, we found out that the default codec init code
is harmful for this machine, and we have to reset it to
ALC_INIT_NONE.

Reported-and-tested-by: Jonathan Woithe <jwoithe@just42.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-02-18 17:04:20 +01:00
Steven Rostedt (Red Hat)
7328735cbf ktest: Remove indexes from warnings check
The index of a line where a warning is tested can be returned
differently on different versions of gcc (or same version compiled
differently). That is, a tab + space can give different results. This
causes the warning check to produce a false positive. Removing the
index from the check fixes this issue.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-02-18 09:35:49 -05:00
Thomas Pedersen
c39ac036ad mac80211: don't spam mesh probe response messages
If mesh plink debugging is enabled, this gets annoying in
a crowded environment, fast.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-18 15:31:24 +01:00
Thomas Pedersen
95e48addba mac80211: stringify mesh peering events
Convert mesh peering events into strings and make the
debug output a little easier to read. Also stop printing
the llid and plid since these don't change across peering
states and are random numbers anyway so they just amount
to noise.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-18 15:31:23 +01:00
Thomas Pedersen
52ac8c4887 mac80211: clean up mesh HT operation
ieee80211_ht_cap_ie_to_sta_ht_cap() will clean up the
ht_supported flag and station bandwidth field for us
if the peer beacon doesn't have an HT capability element
(is operating as non-HT).

Also, we don't really need a special station ch_width
member to track the station operating mode any more so use
sta.bandwidth instead.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-18 15:31:23 +01:00
Johannes Berg
572078be54 mac80211: fix harmless station flush warning
If an interface is set down while authenticating or
associating, there's a station entry that will be
removed by the flushing in do_stop() and that will
cause a warning. It's otherwise harmless, but avoid
the warning by calling ieee80211_mgd_stop() first.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-18 13:48:40 +01:00
Johannes Berg
6295ca184b cfg80211: add correct docbook entries
Update the 802.11 book to list the correct
data structures.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-18 13:48:40 +01:00
David Henningsson
d06ac14399 ALSA: hda - add quirks for mute LED on two HP machines
These two machines have no mute LED string in BIOS.

BugLink: https://bugs.launchpad.net/bugs/1128934
Tested-by: Tammy Yang <tammy.yang@canonical.com>
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-02-18 11:46:46 +01:00
Dmitry Torokhov
befde0226a HID: uhid: make creating devices work on 64/32 systems
Unfortunately UHID interface, as it was introduced, is broken with 32 bit
userspace running on 64 bit kernels as it uses a pointer in its userspace
facing API.

Fix it by checking if we are executing compat task and munge the request
appropriately.

Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 11:28:16 +01:00
David Herrmann
89bdd0c6f3 HID: wiimote: fix nunchuck button parser
The buttons of the Wii Remote Nunchuck extension are actually active low.
Fix the parser to forward the inverted values. The comment in the function
always said "0 == pressed" but the implementation was wrong from the
beginning.

Cc: stable@vger.kernel.org
Reported-by: Victor Quicksilver <victor.quicksilver@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 10:41:52 +01:00
Paul Gortmaker
df63447f1a DocBook: update EXPORT_SYMBOL entry to point at export.h
Previously we used to get EXPORT_SYMBOL and friends from
module.h but we moved away from that since module.h largely
includes the entire header space one way or another.

As most users just wanted the simple export related macros,
they were spun off into a separate new header -- export.h
Update the docs to reflect that.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 10:39:48 +01:00
Paul Gortmaker
449e3a72f7 Documentation: update top level 00-INDEX file with new additions
It seems there are about 80 new, but undocumented addtions at
the top level Documentation directory.  This fixes up the top
level 00-INDEX by adding new entries and deleting a couple orphans.
Some subdirs could probably still use a check/cleanup too though.

Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 10:38:54 +01:00
Ian Abbott
30b6b7d8d3 HID: blacklist Velleman data acquisition boards
These are simple data acquistion boards, not HID devices and are handled
by the vmk80xx comedi driver.  At least one of them (10cf:5500)
misidentifies itself as a HID in its USB interface descriptor.  Ignore
all these devices.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 10:35:46 +01:00
Mika Westerberg
0d1e4b024f HID: sensor-hub: don't limit the driver only to USB bus
We now have two transport mediums: USB and I2C, where sensor hubs can
exists. So instead of constraining the driver to only these two we let it
to match any HID bus as long as the group is HID_GROUP_SENSOR_HUB.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 10:26:59 +01:00
Mika Westerberg
9ef0c5ff0c HID: sensor-hub: get rid of unused sensor_hub_grabbed_usages[] table
This table is not used anywhere in the driver so kill it.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 10:24:37 +01:00
Mika Westerberg
fa79f5ec53 HID: extend autodetect to handle I2C sensors as well
Since the advent of HID over I2C protocol, it is possible to have sensor
hubs behind I2C bus as well. We can autodetect this in a same way than USB
sensor hubs.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 10:23:48 +01:00
Benjamin Tissoires
1b474fe82d HID: ntrig: use input_configured() callback to set the name
The use of input_configured() allows the ntrig driver to actually
change the name of the input and its bitmask before it is added to the
input subsystem. Thus, the logs are coherents and udev catch the real
bitmask when the device is added.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Signed-off-by: Rafi Rubin <rafi@seas.upenn.edu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-02-18 10:16:20 +01:00
Alexander Holler
20bf062c65 x86/memtest: Shorten time for tests
By just reversing the order memtest is using the test patterns,
an additional round to zero the memory is not necessary.

This might save up to a second or even more for setups which are
doing tests on every boot.

Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1361029097-8308-1-git-send-email-holler@ahsoftware.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-18 09:28:42 +01:00
Zheng Liu
74cd15cd02 ext4: reclaim extents from extent status tree
Although extent status is loaded on-demand, we also need to reclaim
extent from the tree when we are under a heavy memory pressure because
in some cases fragmented extent tree causes status tree costs too much
memory.

Here we maintain a lru list in super_block.  When the extent status of
an inode is accessed and changed, this inode will be move to the tail
of the list.  The inode will be dropped from this list when it is
cleared.  In the inode, a counter is added to count the number of
cached objects in extent status tree.  Here only written/unwritten/hole
extent is counted because delayed extent doesn't be reclaimed due to
fiemap, bigalloc and seek_data/hole need it.  The counter will be
increased as a new extent is allocated, and it will be decreased as a
extent is freed.

In this commit we use normal shrinker framework to reclaim memory from
the status tree.  ext4_es_reclaim_extents_count() traverses the lru list
to count the number of reclaimable extents.  ext4_es_shrink() tries to
reclaim written/unwritten/hole extents from extent status tree.  The
inode that has been shrunk is moved to the tail of lru list.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>
2013-02-18 00:32:55 -05:00
Zheng Liu
bdedbb7b8d ext4: adjust some functions for reclaiming extents from extent status tree
This commit changes some interfaces in extent status tree because we
need to use inode to count the cached objects in a extent status tree.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>
2013-02-18 00:32:02 -05:00
Zheng Liu
69eb33dc24 ext4: remove single extent cache
Single extent cache could be removed because we have extent status tree
as a extent cache, and it would be better.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>
2013-02-18 00:31:07 -05:00
Zheng Liu
d100eef244 ext4: lookup block mapping in extent status tree
After tracking all extent status, we already have a extent cache in
memory.  Every time we want to lookup a block mapping, we can first
try to lookup it in extent status tree to avoid a potential disk I/O.

A new function called ext4_es_lookup_extent is defined to finish this
work.  When we try to lookup a block mapping, we always call
ext4_map_blocks and/or ext4_da_map_blocks.  So in these functions we
first try to lookup a block mapping in extent status tree.

A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks
in order not to put a hole into extent status tree because this hole
will be converted to delayed extent in the tree immediately.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>
2013-02-18 00:29:59 -05:00
Zheng Liu
f7fec032aa ext4: track all extent status in extent status tree
By recording the phycisal block and status, extent status tree is able
to track the status of every extents.  When we call _map_blocks
functions to lookup an extent or create a new written/unwritten/delayed
extent, this extent will be inserted into extent status tree.

We don't load all extents from disk in alloc_inode() because it costs
too much memory, and if a file is opened and closed frequently it will
takes too much time to load all extent information.  So currently when
we create/lookup an extent, this extent will be inserted into extent
status tree.  Hence, the extent status tree may not comprehensively
contain all of the extents found in the file.

Here a condition we need to take care is that an extent might contains
unwritten and delayed status simultaneously because an extent is delayed
allocated and could be allocated by fallocate.  At this time we need to
keep delayed status because later we need to update delayed reservation
space using it.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>
2013-02-18 00:28:47 -05:00
Zheng Liu
a25a4e1a5d ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag
This commit lets ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag
because in later commit ext4_map_blocks needs to use this flag to
determine the extent status.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2013-02-18 00:28:04 -05:00
Zheng Liu
be401363ac ext4: rename and improbe ext4_es_find_extent()
This commit renames ext4_es_find_extent with ext4_es_find_delayed_extent
and improve this function.  First, we split input and output parameter.
Second, this function never return the first block of the next delayed
extent after 'es'.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan kara <jack@suse.cz>
2013-02-18 00:27:26 -05:00
Zheng Liu
fdc0212e86 ext4: add physical block and status member into extent status tree
This commit adds two members in extent_status structure to let it record
physical block and extent status.  Here es_pblk is used to record both
of them because physical block only has 48 bits.  So extent status could
be stashed into it so that we can save some memory.  Now written,
unwritten, delayed and hole are defined as status.

Due to new member is added into extent status tree, all interfaces need
to be adjusted.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2013-02-18 00:26:51 -05:00
Zheng Liu
06b0c88621 ext4: refine extent status tree
This commit refines the extent status tree code.

1) A prefix 'es_' is added to to the extent status tree structure
members.

2) Refactored es_remove_extent() so that __es_remove_extent() can be
used by es_insert_extent() to remove the old extent entry(-ies) before
inserting a new one.

3) Rename extent_status_end() to ext4_es_end()

4) ext4_es_can_be_merged() is define to check whether two extents can
be merged or not.

5) Update and clarified comments.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2013-02-18 00:26:51 -05:00