It seems like in the process of refactoring pwm_config() to utilize the
newly-introduced pwm_apply_state() API, some args/bounds checking was
dropped.
In particular, I noted that we are now allowing invalid period
selections, e.g.:
# echo 1 > /sys/class/pwm/pwmchip0/export
# cat /sys/class/pwm/pwmchip0/pwm1/period
100
# echo 101 > /sys/class/pwm/pwmchip0/pwm1/duty_cycle
[... driver may or may not reject the value, or trigger some logic bug ...]
It's better to see:
# echo 1 > /sys/class/pwm/pwmchip0/export
# cat /sys/class/pwm/pwmchip0/pwm1/period
100
# echo 101 > /sys/class/pwm/pwmchip0/pwm1/duty_cycle
-bash: echo: write error: Invalid argument
This patch reintroduces some bounds checks in both pwm_config() (for its
signed parameters; we don't want to convert negative values into large
unsigned values) and in pwm_apply_state() (which fix the above described
behavior, as well as other potential API misuses).
Fixes: 5ec803edcb ("pwm: Add core infrastructure to allow atomic updates")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit ef2bf4997f)
Change-Id: I7d7515a51b5c3c2a77ae9743b7ed69eedc11d4b4
Signed-off-by: David Wu <david.wu@rock-chips.com>
Add an ->apply() method to the pwm_ops struct to allow PWM drivers to
implement atomic updates. This method is preferred over the ->enable(),
->disable() and ->config() methods if available.
Add the pwm_apply_state() function to the PWM user API.
Note that the pwm_apply_state() does not guarantee the atomicity of the
update operation, it all depends on the availability and implementation
of the ->apply() method.
pwm_enable/disable/set_polarity/config() are now implemented as wrappers
around the pwm_apply_state() function.
pwm_adjust_config() is allowing smooth handover between the bootloader
and the kernel. This function tries to adapt the current PWM state to
the PWM arguments coming from a PWM lookup table or a DT definition
without changing the duty_cycle/period proportion.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
[thierry.reding@gmail.com: fix a couple of typos]
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 5ec803edcb)
Change-Id: Ib527f34ca3ce90ded3b48725d7ab9509de2053c9
Signed-off-by: David Wu <david.wu@rock-chips.com>
Add a ->get_state() function to the pwm_ops struct to let PWM drivers
initialize the PWM state attached to a PWM device.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 15fa8a43c1)
Change-Id: Ie37be2d34833cbb6cc4b4b4cb6af98ac721b953a
Signed-off-by: David Wu <david.wu@rock-chips.com>
Prepare the transition to PWM atomic update by moving the enabled and
disabled state into the pwm_state struct. This way we can easily update
the whole PWM state by copying the new state in the ->state field.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 09a7e4a3d9)
Change-Id: I38808d868c7e73f0ffd49cfb6c0b4b45fa911613
Signed-off-by: David Wu <david.wu@rock-chips.com>
The PWM state, represented by its period, duty_cycle and polarity is
currently directly stored in the PWM device. Declare a pwm_state
structure embedding those field so that we can later use this struct
to atomically update all the PWM parameters at once.
All pwm_get_xxx() helpers are now implemented as wrappers around
pwm_get_state().
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 43a276b003)
Change-Id: I44037440bcc9d9164d18f3bfa1c8ff3e593925c5
Signed-off-by: David Wu <david.wu@rock-chips.com>
Before the introduction of pwm_args, the core was resetting the PWM
period and polarity states to the reference values (those provided
through the DT, a PWM lookup table or hardcoded in the driver).
Now that all PWM users are correctly using pwm_args to configure their
PWM device, we can safely remove the pwm_apply_args() call in pwm_get()
and of_pwm_get().
We can also get rid of the pwm_set_period() call in pwm_apply_args(),
because PWM users are now directly using pargs->period instead of
pwm_get_period(). By doing that we avoid messing with the current PWM
period.
The only remaining bit in pwm_apply_args() is the initial polarity
setting, and it should go away when all PWM users have been patched to
use the atomic API (with this API the polarity will be set along with
other PWM arguments when configuring the PWM).
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit a8c3862551)
Change-Id: I5112f2d8a7b66e01184984376dcae84decf5ad28
Signed-off-by: David Wu <david.wu@rock-chips.com>
PWM devices are not protected against concurrent accesses. The lock in
struct pwm_device might let PWM users think it is, but it's actually
only protecting the enabled state.
Removing this lock should be fine as long as all PWM users are aware
that accesses to the PWM device have to be serialized, which seems to be
the case for all of them except the sysfs interface. Patch the sysfs
code by adding a lock to the pwm_export struct and making sure it's
taken for all relevant accesses to the exported PWM device.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 459a25afe9)
Change-Id: I30d97ebaf1db8b80c353da5ebebd0fc5d5c57bb0
Signed-off-by: David Wu <david.wu@rock-chips.com>
Since we will need the backlight_device_get_by_type API, we can use it
instead of the backlight_device_registered API whenever necessary so
remove the backlight_device_registered API.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 01c3664de6)
Change-Id: I12fdee74963c2cbc5b950019598ec218de1b7b5d
Signed-off-by: David Wu <david.wu@rock-chips.com>
It is useful to get the backlight device's pointer and use it to set
backlight in some cases(the following patch will make use of it) so add
the two APIs and export them.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit f6a4790a54)
Change-Id: Ia7d06bf600f86ae25939b830e49f5057f48f992d
Signed-off-by: David Wu <david.wu@rock-chips.com>
Currently the PWM core mixes the current PWM state with the per-platform
reference config (specified through the PWM lookup table, DT definition
or directly hardcoded in PWM drivers).
Create a struct pwm_args to store this reference configuration, so that
PWM users can differentiate between the current and reference
configurations.
Patch all places where pwm->args should be initialized. We keep the
pwm_set_polarity/period() calls until all PWM users are patched to use
pwm_args instead of pwm_get_period/polarity().
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
[thierry.reding@gmail.com: reword kerneldoc comments]
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit e39c0df1be)
Change-Id: Id9ec0898d92d7049813c32acbd909103e949c846
Signed-off-by: David Wu <david.wu@rock-chips.com>
Add helper function to set the state of active-discharge of
regulator using regmap. The HW regulator driver can directly
use this by providing the necessary information in the regulator
descriptor.
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 354794dacc)
Change-Id: I195290ac6ddfbadc7876a28b6df5149530954484
Signed-off-by: David Wu <david.wu@rock-chips.com>
Add support to enable/disable active discharge of regulator via
machine constraints. This configuration is done when setting
machine constraint during regulator register and if regulator
driver implemented the callback ops.
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 670666b9e0)
Change-Id: I5e22602d9a44b88482922ae0a726fd0d8f374e0a
Signed-off-by: David Wu <david.wu@rock-chips.com>
Make it possible to use the bulk API with optional supplies, by allowing
the consumer to marking supplies as optional in the regulator_bulk_data.
Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 3ff3f518a1)
Change-Id: I1bf21d36ca181932bf7c626d45fef49b50931ac9
Signed-off-by: David Wu <david.wu@rock-chips.com>
HDMI 2.0 introduces a new sampling mode called YCbCr 4:2:0.
According to the spec the EDID may contain two blocks that
signal this sampling mode:
- YCbCr 4:2:0 Video Data Block
- YCbCr 4:2:0 Video Capability Map Data Block
The video data block contains the list of vic's were
only YCbCr 4:2:0 sampling mode shall be used while the
video capability map data block contains a mask were
YCbCr 4:2:0 sampling mode may be used.
This RFC patch adds support for parsing these two new blocks
and introduces new flags to signal the drivers if the
mode is 4:2:0'only or 4:2:0'able.
The reason this is still a RFC is because there is no
reference in kernel for this new sampling mode (specially in
AVI infoframe part), so, I was hoping to hear some feedback
first.
Tested in a HDMI 2.0 compliance scenario.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Carlos Palminha <palminha@synopsys.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
(am from https://patchwork.kernel.org/patch/9495175)
Change-Id: I7c9e331b5bf5f1fbcefd4368bc4b82ff180eb91e
Signed-off-by: Zheng Yang <zhengyang@rock-chips.com>
Rather than using drm_match_cea_mode() to see if the EDID detailed
timings are supposed to represent one of the CEA/HDMI modes, add a
special version of that function that takes in an explicit clock
tolerance value (in kHz). When looking at the detailed timings specify
the tolerance as 5kHz due to the 10kHz clock resolution limit inherent
in detailed timings.
drm_match_cea_mode() uses the normal KHZ2PICOS() matching of clocks,
which only allows smaller errors for lower clocks (eg. for 25200 it
won't allow any error) and a bigger error for higher clocks (eg. for
297000 it actually matches 296913-297000). So it doesn't really match
what we want for the fixup. Using the explicit +-5kHz is much better
for this use case.
Not sure if we should change the normal mode matching to also use
something else besides KHZ2PICOS() since it allows a different
proportion of error depending on the clock. I believe VESA CVT
allows a maximum deviation of .5%, so using that for normal mode
matching might be a good idea?
Change-Id: I824ec50368ddf152c9daa747ba92aaba1ef50f4b
Cc: Adam Jackson <ajax@redhat.com>
Tested-by: nathan.d.ciobanu@linux.intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92217
Fixes: fa3a7340ea ("drm/edid: Fix up clock for CEA/HDMI modes specified via detailed timings")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 4c6bcf4454)
Signed-off-by: Zheng Yang <zhengyang@rock-chips.com>
Add support for the rk818 regulator. The regulator module consists
of 4 DCDCs, 9 LDOs, 1 switch and 1 BOOST converter which is used to
power OTG and HDMI5V.
The output voltages are configurable and are meant to supply power
to the main processor and other components.
Change-Id: I129a1f22c65684615e9ae792efaa880555f0235e
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
(cherry picked from commit 1137529353)
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
Adds parsing for HDMI 2.0 'HDMI Forum Vendor
Specific Data Block'. This block is present in
some HDMI 2.0 EDID's and gives information about
scrambling support, SCDC, 3D Views, and others.
Parsed parameters are stored in drm_connector
structure.
(am from: https://patchwork.kernel.org/patch/9273645)
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Change-Id: I5a1485b79a407fd27ac4754827de318175bb8f6a
Signed-off-by: Nickey Yang <nickey.yang@rock-chips.com>
SCDC is a mechanism defined in the HDMI 2.0 specification that allows
the source and sink devices to communicate.
This commit introduces helpers to access the SCDC and provides the
symbolic names for the various registers defined in the specification.
(am from: https://patchwork.kernel.org/patch/7258251/)
Change-Id: I378bc2b465a720ccfede35a93bce0d9371e78f78
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Nickey Yang <nickey.yang@rock-chips.com>
There are several SoCs (e.g. rk3228h and rk3328) that integrated
with Inno USB3 PHY, they can't toggle CP test pattern when do
USB3 compliance test by default.
This patch add a cp_test callback for USB3 controller to enable
the special USB3 PHY to toggle the CP test pattern.
Change-Id: I2d603202723a4c044d4231af10cfe2c60ec0e988
Signed-off-by: William Wu <wulf@rock-chips.com>
Some USB controllers (such as rk3328 SoC DWC3 controller with INNO
USB 3.0 PHY) don't support autosuspend well, when receive remote
wakeup signal from autosuspend, the Port Link State training failed,
the correct PLC is Resume->Recovery->U0, but when the issue happens,
the wrong PLC is Resume->Recovery->Inactive, cause resuming SS port
fail. This issue always occurs when connect with external USB 3.0 HUB.
This patch add a quirk to disable autosuspend function, and add new
'usb3_disable_autosuspend' member in xHCI platform data to support
set the quirk based on platform data.
Change-Id: Ice01d70178206e22658660361dd3a525046cbcf5
Signed-off-by: William Wu <wulf@rock-chips.com>
Some USB host controller seems to have problems with
autosuspend. For example, Rockchip rk3328 SoC USB 3.0
wouldn't handle remote wakeup correctly with external
hub after entered autosuspend, caused to resume SS
port fail.
This patch introduces a new quirk flag for hub that
should remain disabled for autosuspend.
Change-Id: I6d14222b2c5025583fea811a6afd6abd22f41cb9
Signed-off-by: William Wu <wulf@rock-chips.com>
[ Upstream commit 217e6fa24c ]
The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.
Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.
Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.
Fixes: 9ed988cd59 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f1712c7371 ]
Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()
The main problem seems that the sockets themselves are not RCU
protected.
If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.
Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.
[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0
Call Trace:
<IRQ>
[<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
[<ffffffff81d55771>] sk_filter+0x41/0x210
[<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
[<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
[<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
[<ffffffff81f07af9>] can_receive+0xd9/0x120
[<ffffffff81f07beb>] can_rcv+0xab/0x100
[<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
[<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
[<ffffffff81d37f67>] process_backlog+0x127/0x280
[<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
[<ffffffff810c88d4>] __do_softirq+0x184/0x440
[<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
<EOI>
[<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
[<ffffffff810c8bed>] do_softirq+0x1d/0x20
[<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
[<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
[<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
[<ffffffff810e3baf>] process_one_work+0x24f/0x670
[<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
[<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
[<ffffffff810ebafc>] kthread+0x12c/0x150
[<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70
Reported-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a PM domain is powered off before system suspend,
we hope do nothing in system runtime suspend noirq phase
and system runtime resume noirq phase.
Change-Id: Id72b1f92e10449c48006aced0d49612637402210
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
If a PM domain is powered off when the first device starts its system PM
prepare phase, genpd prevents any further attempts to power on the PM
domain during the following system PM phases. Not until the system PM
complete phase is finalized for all devices in the PM domain, genpd again
allows it to be powered on.
This behaviour needs to be changed, as a subsystem/driver for a device in
the same PM domain may still need to be able to serve requests in some of
the system PM phases. Accordingly, it may need to runtime resume its
device and thus also request the corresponding PM domain to be powered on.
To deal with these scenarios, let's make the device operational in the
system PM prepare phase by runtime resuming it, no matter if the PM domain
is powered on or off. Changing this also enables us to remove genpd's
suspend_power_off flag, as it's being used to track this condition.
Additionally, we must allow the PM domain to be powered on via runtime PM
during the system PM phases.
This change also requires a fix in the AMD ACP (Audio CoProcessor) drm
driver. It registers a genpd to model the ACP as a PM domain, but
unfortunately it's also abuses genpd's "internal" suspend_power_off flag
to deal with a corner case at system PM resume.
More precisely, the so called SMU block powers on the ACP at system PM
resume, unconditionally if it's being used or not. This may lead to that
genpd's internal status of the power state, may not correctly reflect the
power state of the HW after a system PM resume.
Because of changing the behaviour of genpd, by runtime resuming devices in
the prepare phase, the AMD ACP drm driver no longer have to deal with this
corner case. So let's just drop the related code in this driver.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Maruthi Bayyavarapu <maruthi.bayyavarapu@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 39dd0f234f)
Change-Id: I1c964ebd660c8c7a8547f2206c80c25b936e7196
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
commit 4d59b6ccf0 upstream.
Commit 513e3d2d11 ("cpumask: always use nr_cpu_ids in formatting and
parsing functions") converted both cpumask printing and parsing
functions to use nr_cpu_ids instead of nr_cpumask_bits. While this was
okay for the printing functions as it just picked one of the two output
formats that we were alternating between depending on a kernel config,
doing the same for parsing wasn't okay.
nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS. We can always use
nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can
be more efficient to use NR_CPUS when we can get away with it.
Converting the printing functions to nr_cpu_ids makes sense because it
affects how the masks get presented to userspace and doesn't break
anything; however, using nr_cpu_ids for parsing functions can
incorrectly leave the higher bits uninitialized while reading in these
masks from userland. As all testing and comparison functions use
nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks
can erroneously yield false negative results.
This made the taskstats interface incorrectly return -EINVAL even when
the inputs were correct.
Fix it by restoring the parse functions to use nr_cpumask_bits instead
of nr_cpu_ids.
Link: http://lkml.kernel.org/r/20170206182442.GB31078@htj.duckdns.org
Fixes: 513e3d2d11 ("cpumask: always use nr_cpu_ids in formatting and parsing functions")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Martin Steigerwald <martin.steigerwald@teamix.de>
Debugged-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a96dfddbcc upstream.
Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().
BUG: unable to handle kernel paging request at ffffea017a000000
IP: show_valid_zones+0x6f/0x160
This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB. [1] An example of such
systems is desribed below. 0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.
BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable
Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range. show_valid_zones() then proceeds with the valid range.
[1] 'Commit bdee237c03 ("x86: mm: Use 2GB memory block size on
large-memory x86-64 systems")'
Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 966d2b04e0 upstream.
percpu_ref_tryget() and percpu_ref_tryget_live() should return
"true" IFF they acquire a reference. But the return value from
atomic_long_inc_not_zero() is a long and may have high bits set,
e.g. PERCPU_COUNT_BIAS, and the return value of the tryget routines
is bool so the reference may actually be acquired but the routines
return "false" which results in a reference leak since the caller
assumes it does not need to do a corresponding percpu_ref_put().
This was seen when performing CPU hotplug during I/O, as hangs in
blk_mq_freeze_queue_wait where percpu_ref_kill (blk_mq_freeze_queue_start)
raced with percpu_ref_tryget (blk_mq_timeout_work).
Sample stack trace:
__switch_to+0x2c0/0x450
__schedule+0x2f8/0x970
schedule+0x48/0xc0
blk_mq_freeze_queue_wait+0x94/0x120
blk_mq_queue_reinit_work+0xb8/0x180
blk_mq_queue_reinit_prepare+0x84/0xa0
cpuhp_invoke_callback+0x17c/0x600
cpuhp_up_callbacks+0x58/0x150
_cpu_up+0xf0/0x1c0
do_cpu_up+0x120/0x150
cpu_subsys_online+0x64/0xe0
device_online+0xb4/0x120
online_store+0xb4/0xc0
dev_attr_store+0x68/0xa0
sysfs_kf_write+0x80/0xb0
kernfs_fop_write+0x17c/0x250
__vfs_write+0x6c/0x1e0
vfs_write+0xd0/0x270
SyS_write+0x6c/0x110
system_call+0x38/0xe0
Examination of the queue showed a single reference (no PERCPU_COUNT_BIAS,
and __PERCPU_REF_DEAD, __PERCPU_REF_ATOMIC set) and no requests.
However, conditions at the time of the race are count of PERCPU_COUNT_BIAS + 0
and __PERCPU_REF_DEAD and __PERCPU_REF_ATOMIC set.
The fix is to make the tryget routines use an actual boolean internally instead
of the atomic long result truncated to a int.
Fixes: e625305b39 percpu-refcount: make percpu_ref based on longs instead of ints
Link: https://bugzilla.kernel.org/show_bug.cgi?id=190751
Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Reviewed-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e625305b39 ("percpu-refcount: make percpu_ref based on longs instead of ints")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 003c941057 ]
Fix up a data alignment issue on sparc by swapping the order
of the cookie byte array field with the length field in
struct tcp_fastopen_cookie, and making it a proper union
to clean up the typecasting.
This addresses log complaints like these:
log_unaligned: 113 callbacks suppressed
Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>