mirror of
https://git.libssh.org/projects/libssh.git
synced 2026-02-06 10:27:22 +09:00
connector: Fix sftp aio read/write with ProxyJump
Addresses issue #319 The commit description explains: 1. Fix for sftp aio + read 2. Fix for sftp aio + write 1. Fix for sftp aio + read ------------------------- The reproducer provided in the issue description had a model as follows (with one jump host): fd_1---(socket_pair)---fd_2---(connector)----channel(fd_3)-----server Via debugging, it was noticed that the channel connected directly to the server stored a lot of unbuffered data (received from the server) that wasn't being written to fd_2 via the connector API. (Here on, channel refers to the channel(fd_3) in the diagram connected directly to the server) Consider the situation, where after a bit of progress in the transfer, the server has sent all the requested data (requested via outstanding requests) and all of that data is stored in channel->stdout_buffer. Say this data is 10,000 bytes. At this point, all the client (fd_1) is doing is waiting for all outstanding requests. (and processing thei responses) - POLLOUT event callback gets generated indicating that fd_2 is available for writing. - ssh_connector_fd_out_cb() gets called to handle the POLLOUT. - Assuming connector->in_available was true, 4096 (CHUNKSIZE) bytes get read from the channel. (really channel->stdout_buffer) leaving 10,000 - 4096 = 5904 bytes unread in the channel. - The read bytes are sent via fd_2 (so that fd_1 can recv them) - After this, the callback sets connector->in_available to 0 and connector->out_wontblock to 0. - Since out_wontblock has been set to 0 ssh_connector_reset_pollevents() (called after the callback returns) will consider POLLOUT events on the connector output. - (Based on assumption before) Since the client (fd_1) is eagerly awaiting responses and processing them, the received data gets processed quickly and fd_2 is available for sending/writing. - POLLOUT event gets generated for fd_2 indicating that its available for writing/sending to fd_1 - ssh_connector_fd_out_cb() gets called to handle the POLLOUT - Since connector->in_available is 0 (and ssh_connector_channel_data_cb() has not been trigerred in between as we have assumed before that all the data has already been received on the channel and is stored in the channel->stdout_buffer), ssh_connector_fd_out_cb() does nothing besides setting connector->out_wontblock to 1. - Since out_wontblock has been set to 1 ssh_connector_reset_pollevents() (called after the callback returns) will IGNORE POLLOUT events on the connector output. - So, at this point, the channel->buffer contains 5706 bytes and the fd_2 is available for writing/sending (out_wontblock is 1), but nothing happens and the transfer gets stalled/hanged. In my opinion, this hanging occurs because connector->in_available was incorrectly set to 0 despite the channel buffer having 5706 bytes in it. This commit changes that code to consider the data available to read on the channel (includes buffered data as well as polled data on channel's internal fd) and taking that into consideration to set in_available appropriately. (Instead of unconditionally setting it to 0 as the current code does) so that the next time POLLOUT gets received on fd_2 the ssh_connector_fd_out_cb() does read from the channel and write to fd_2 (as the connector->in_available flag would be set). 2. Fix for sftp aio + write ------------------------------------- On writing tests for sftp aio + proxyjump, it was encountered that file uploads were also hanging. Though I was not able to pin point the exact cause for this hanging, the nature of hanging was observed to be as follows: - sftp aio write + proxyjump blocks/hangs occasionally (not always) - It hangs at different points in the test - hang point 1: Sometimes it hangs after sending the first write request (i.e. the second write request call hangs and never returns, at this point we are not even waiting for response, just sending data). A lot of pending data to write to socket/fd was noticed at this hang point. - hang point 2: Sometimes it hangs while waiting for the second write request response. - It hangs at ssh_handle_packets_termination (i.e. this is the call that never returns), in context to hang point 1, this occurs due to trying to flush the channel during sftp_packet_write, and in context to hang point 2, this occurs due to trying to read an sftp response packet. - Not sure why, but more the verbose logging/printing I do, the lesser occasionally test hangs (e.g. 1 test in 6-7 test runs), maybe this could be a hint for a race condition / thread interaction related bug, but am not sure. Fix: On modifying the connector code to mark out_wontblock to 0 in case of output channel only when the channel's remote window is 0, the hanging no longer occured. Though, as mentioned before, I don't know the exact problem (i.e. case causing hanging) the fix addresses, but the fix is logical (if remote window is +ve data can still be written to channel and hence out_wontblock should not be reset to 0, it should be set to 1) and fixes the issue hence is added to this commit. Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com> Reviewed-by: Jakub Jelen <jjelen@redhat.com> Reviewed-by: Andreas Schneider <asn@cryptomilk.org>
This commit is contained in:
committed by
Jakub Jelen
parent
29dd7874cd
commit
dc39902006
@@ -305,6 +305,87 @@ static void ssh_connector_reset_pollevents(ssh_connector connector)
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* @internal
|
||||
*
|
||||
* @brief Update the connector's flags after a read-write io
|
||||
* operation
|
||||
*
|
||||
* This should be called after some data is successfully read from
|
||||
* connector's input and written to connector's output.
|
||||
*
|
||||
* @param[in, out] connector Connector for which the io operation occured.
|
||||
*
|
||||
* @warning This does not consider the case when the io indicated failure
|
||||
*
|
||||
* @warning This does not consider the case when the input indicated that
|
||||
* EOF was encountered.
|
||||
*/
|
||||
static void ssh_connector_update_flags_after_io(ssh_connector connector)
|
||||
{
|
||||
/*
|
||||
* With fds we can afford to mark:
|
||||
* - in_available as 0 after an fd read (even if more pending data can be
|
||||
* immediately read from the fd)
|
||||
*
|
||||
* - out_wontblock as 0 after an fd write (even if more data can
|
||||
* be written to the fd without blocking)
|
||||
*
|
||||
* since poll events set on the fd will get raised to indicate
|
||||
* possibility of read/write in case existing situation is apt
|
||||
* (i.e can read/write occur right now) or if situation becomes
|
||||
* apt in future (read data becomes available, write becomes
|
||||
* possible)
|
||||
*/
|
||||
|
||||
/*
|
||||
* On the other hand, with channels we need to be more careful
|
||||
* before claiming read/write not possible because channel callbacks
|
||||
* are called in limited scenarios.
|
||||
*
|
||||
* (e.g. connector callback to indicate read data available on input
|
||||
* channel is called only when new data is received on channel. It is
|
||||
* not called when we have some pending data in channel's buffers but
|
||||
* don't receive any new data on the channel)
|
||||
*
|
||||
* Hence, in case of channels, blindly setting flag associated with
|
||||
* read/write input/output to 0 after a read/write may not be a good
|
||||
* idea as the callback that sets it back to 1 again may not be ever
|
||||
* called again.
|
||||
*/
|
||||
|
||||
uint32_t window_size;
|
||||
|
||||
/* update in_available based on input source (fd or channel) */
|
||||
if (connector->in_fd != SSH_INVALID_SOCKET) {
|
||||
connector->in_available = 0;
|
||||
} else if (connector->in_channel != NULL) {
|
||||
if (ssh_channel_poll_timeout(connector->in_channel, 0, 0) > 0) {
|
||||
connector->in_available = 1;
|
||||
} else {
|
||||
connector->in_available = 0;
|
||||
}
|
||||
} else {
|
||||
/* connector input is invalid ! */
|
||||
return;
|
||||
}
|
||||
|
||||
/* update out_wontblock based on output source (fd or channel) */
|
||||
if (connector->out_fd != SSH_INVALID_SOCKET) {
|
||||
connector->out_wontblock = 0;
|
||||
} else if (connector->out_channel != NULL) {
|
||||
window_size = ssh_channel_window_size(connector->out_channel);
|
||||
if (window_size > 0) {
|
||||
connector->out_wontblock = 1;
|
||||
} else {
|
||||
connector->out_wontblock = 0;
|
||||
}
|
||||
} else {
|
||||
/* connector output is invalid ! */
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* @internal
|
||||
*
|
||||
@@ -390,8 +471,8 @@ static void ssh_connector_fd_in_cb(ssh_connector connector)
|
||||
ssh_set_error(connector->session, SSH_FATAL, "output socket or channel closed");
|
||||
return;
|
||||
}
|
||||
connector->out_wontblock = 0;
|
||||
connector->in_available = 0;
|
||||
|
||||
ssh_connector_update_flags_after_io(connector);
|
||||
} else {
|
||||
connector->in_available = 1;
|
||||
}
|
||||
@@ -444,8 +525,8 @@ ssh_connector_fd_out_cb(ssh_connector connector)
|
||||
"Output socket or channel closed");
|
||||
return;
|
||||
}
|
||||
connector->in_available = 0;
|
||||
connector->out_wontblock = 0;
|
||||
|
||||
ssh_connector_update_flags_after_io(connector);
|
||||
} else {
|
||||
connector->out_wontblock = 1;
|
||||
}
|
||||
@@ -566,11 +647,7 @@ static int ssh_connector_channel_data_cb(ssh_session session,
|
||||
return SSH_ERROR;
|
||||
}
|
||||
|
||||
connector->out_wontblock = 0;
|
||||
connector->in_available = 0;
|
||||
if ((unsigned int)w < len) {
|
||||
connector->in_available = 1;
|
||||
}
|
||||
ssh_connector_update_flags_after_io(connector);
|
||||
ssh_connector_reset_pollevents(connector);
|
||||
|
||||
return w;
|
||||
@@ -642,8 +719,8 @@ ssh_connector_channel_write_wontblock_cb(ssh_session session,
|
||||
|
||||
return 0;
|
||||
}
|
||||
connector->in_available = 0;
|
||||
connector->out_wontblock = 0;
|
||||
|
||||
ssh_connector_update_flags_after_io(connector);
|
||||
} else {
|
||||
connector->out_wontblock = 1;
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user