Addresses issue #319
The commit description explains:
1. Fix for sftp aio + read
2. Fix for sftp aio + write
1. Fix for sftp aio + read
-------------------------
The reproducer provided in the issue description had a model
as follows (with one jump host):
fd_1---(socket_pair)---fd_2---(connector)----channel(fd_3)-----server
Via debugging, it was noticed that the channel connected directly to
the server stored a lot of unbuffered data (received from the server)
that wasn't being written to fd_2 via the connector API.
(Here on, channel refers to the channel(fd_3) in the diagram connected
directly to the server)
Consider the situation, where after a bit of progress in the transfer,
the server has sent all the requested data (requested via outstanding
requests) and all of that data is stored in channel->stdout_buffer. Say
this data is 10,000 bytes.
At this point, all the client (fd_1) is doing is waiting for all
outstanding requests (and processing their responses).
- POLLOUT event callback gets generated indicating that fd_2 is
available for writing.
- ssh_connector_fd_out_cb() gets called to handle the POLLOUT.
- Assuming connector->in_available was true, 4096 (CHUNKSIZE) bytes
get read from the channel (really channel->stdout_buffer), leaving
10,000 - 4096 = 5904 bytes unread in the channel.
- The read bytes are sent via fd_2 (so that fd_1 can recv them)
- After this, the callback sets connector->in_available to 0 and
connector->out_wontblock to 0.
- Since out_wontblock has been set to 0 ssh_connector_reset_pollevents()
(called after the callback returns) will consider POLLOUT events on the
connector output.
- (Based on assumption before) Since the client (fd_1) is eagerly
awaiting responses and processing them, the received data gets
processed quickly and fd_2 is available for sending/writing.
- POLLOUT event gets generated for fd_2 indicating that it is available
for writing/sending to fd_1
- ssh_connector_fd_out_cb() gets called to handle the POLLOUT
- Since connector->in_available is 0 (and
ssh_connector_channel_data_cb() has not been triggered in between,
as we have assumed that all the data has already been received on the
channel and is stored in channel->stdout_buffer), ssh_connector_fd_out_cb()
does nothing besides setting connector->out_wontblock to 1.
- Since out_wontblock has been set to 1 ssh_connector_reset_pollevents()
(called after the callback returns) will IGNORE POLLOUT events on the
connector output.
- So, at this point, channel->stdout_buffer contains 5904 bytes and
fd_2 is available for writing/sending (out_wontblock is 1), but
nothing happens and the transfer stalls/hangs.
In my opinion, this hang occurs because connector->in_available was
incorrectly set to 0 despite the channel buffer having 5904 bytes in it.
This commit changes that code to take into account the data available to
read on the channel (buffered data as well as data polled on the channel's
internal fd) and to set in_available accordingly, instead of
unconditionally setting it to 0 as the current code does. The next time
POLLOUT is received on fd_2, ssh_connector_fd_out_cb() then reads from the
channel and writes to fd_2 (as the connector->in_available flag is set).
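The stall described above can be reproduced with a toy model. This is a
minimal sketch, not the libssh code: struct toy_connector, toy_fd_out_cb()
and toy_run() are hypothetical names, and the model assumes that all data
is already buffered and that fd_2 becomes writable again immediately after
every write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNKSIZE 4096

/* Toy stand-in for the connector state; not the libssh structure. */
struct toy_connector {
    size_t buffered;    /* bytes waiting in the channel's stdout buffer */
    size_t delivered;   /* bytes already written to fd_2 */
    bool in_available;  /* "channel has data to read" flag */
    bool out_wontblock; /* "fd_2 is known to be writable" flag */
};

/* Models one POLLOUT callback on fd_2. With fixed == false, in_available
 * is cleared unconditionally (old behaviour); with fixed == true it
 * reflects the data still buffered on the channel. */
static void toy_fd_out_cb(struct toy_connector *c, bool fixed)
{
    assert(c != NULL);
    if (!c->in_available) {
        /* Nothing flagged as readable: only remember fd_2 is writable. */
        c->out_wontblock = true;
        return;
    }
    size_t n = c->buffered < CHUNKSIZE ? c->buffered : CHUNKSIZE;
    c->buffered -= n;
    c->delivered += n;
    c->in_available = fixed ? (c->buffered > 0) : false;
    c->out_wontblock = false;
}

/* Delivers POLLOUT events until the connector stops listening for them
 * (out_wontblock set) and returns the bytes left stuck in the buffer. */
static size_t toy_run(size_t total, bool fixed)
{
    struct toy_connector c = { total, 0, true, false };
    while (!c.out_wontblock) {
        toy_fd_out_cb(&c, fixed);
    }
    return c.buffered;
}
```

In this model, toy_run(10000, false) leaves 5904 bytes stuck in the
buffer, while toy_run(10000, true) drains it completely.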
2. Fix for sftp aio + write
-------------------------------------
On writing tests for sftp aio + proxyjump, file uploads were found
to hang as well. Though I was not able to pinpoint the exact cause
of this hang, its nature was observed to be as follows:
- sftp aio write + proxyjump blocks/hangs occasionally (not always)
- It hangs at different points in the test
- hang point 1: Sometimes it hangs after sending the first write request
(i.e. the second write request call hangs and never returns, at this point
we are not even waiting for response, just sending data). A lot of pending
data to write to socket/fd was noticed at this hang point.
- hang point 2: Sometimes it hangs while waiting for the second write request
response.
- It hangs at ssh_handle_packets_termination (i.e. this is the
call that never returns). In the context of hang point 1, this occurs
while trying to flush the channel during sftp_packet_write; in the context
of hang point 2, it occurs while trying to read an sftp response packet.
- Not sure why, but the more verbose logging/printing I do, the less
often the test hangs (e.g. 1 test in 6-7 test runs). This could be a
hint of a race condition / thread interaction related bug, but I am
not sure.
Fix: On modifying the connector code to set out_wontblock to 0 for an
output channel only when the channel's remote window is 0, the hanging
no longer occurred.
Though, as mentioned before, I don't know the exact problem (i.e. the
case causing the hang) that the fix addresses, the fix is logical: if
the remote window is positive, data can still be written to the channel,
so out_wontblock should be set to 1 rather than reset to 0. Since it
also fixes the issue, it is added to this commit.
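The rule stated in the fix can be written down as a one-line decision.
This is only an illustrative sketch: toy_out_wontblock_after_write() is a
hypothetical helper, and how the connector actually obtains the remote
window is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decides the out_wontblock flag for an output
 * channel from the remote window that is left. */
static bool toy_out_wontblock_after_write(uint32_t remote_window)
{
    /* A positive window means more data can still be written to the
     * channel, so writing won't block; only a window of 0 clears it. */
    return remote_window > 0;
}
```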
Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
Reviewed-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Andreas Schneider <asn@cryptomilk.org>
The jump thread was touching the main session object, which is
really not guaranteed to be thread safe.
The moving of the proxyjump structure was quite inefficient,
as it involved moving the whole list to a new list and then removing
the first item. This can be done easily by popping the head and
moving the whole remaining list without any allocations.
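A minimal sketch of the pop-and-move idea, assuming a plain singly linked
list. struct toy_jump, struct toy_jump_list and toy_pop_and_move() are
hypothetical names used for illustration; the list type used by libssh
differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list of jump hosts. */
struct toy_jump {
    const char *hostname;
    struct toy_jump *next;
};

struct toy_jump_list {
    struct toy_jump *head;
};

/* Detaches and returns the first jump host, then hands the remaining
 * nodes from src to dst by moving the head pointer. No node is
 * allocated or copied. */
static struct toy_jump *toy_pop_and_move(struct toy_jump_list *src,
                                         struct toy_jump_list *dst)
{
    assert(src != NULL && dst != NULL);
    struct toy_jump *first = src->head;
    if (first == NULL) {
        dst->head = NULL;
        return NULL;
    }
    dst->head = first->next; /* the remaining list now belongs to dst */
    src->head = NULL;        /* src no longer owns any node */
    first->next = NULL;      /* the popped node stands alone */
    return first;
}
```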
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Andreas Schneider <asn@cryptomilk.org>
A proxyjump callback structure consists of three callbacks
as of this writing: before_connection, authenticate and
verify_knownhost. One or more of these callbacks can be
set as NULL by the user to indicate that libssh should use
the defaults.
The code checked for the presence of the callback structure but
not whether before_connection was available (non-NULL)
before dereferencing it.
This could lead to undefined behaviour if the user specifies
say authenticate and verify_knownhost for a jump host but not
before_connection.
This commit fixes the code by adding a check that before_connection
is non-NULL before trying to access it.
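The guard can be sketched as follows. The structure below is a
hypothetical mirror of a three-callback structure; the field signatures,
the userdata member and the "return 0 for the default" convention are
assumptions made for this example and do not reproduce the libssh types.

```c
#include <stddef.h>

/* Hypothetical callback structure; signatures are simplified. */
struct toy_jump_callbacks {
    void *userdata;
    int (*before_connection)(void *userdata);
    int (*verify_knownhost)(void *userdata);
    int (*authenticate)(void *userdata);
};

/* Example callback: counts how often it was invoked. */
static int toy_count_call(void *userdata)
{
    int *calls = userdata;
    (*calls)++;
    return 1;
}

/* Runs before_connection only when both the structure and the callback
 * are set; returns 0 (treated here as "use the default") otherwise. */
static int toy_run_before_connection(const struct toy_jump_callbacks *cb)
{
    if (cb != NULL && cb->before_connection != NULL) {
        return cb->before_connection(cb->userdata);
    }
    return 0;
}
```

A structure that sets only authenticate is then handled without touching
the NULL before_connection pointer.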
Signed-off-by: Eshan Kelkar <eshankelkar@galorithm.com>
Reviewed-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Andreas Schneider <asn@cryptomilk.org>
When the `known_hosts` file contained a matching valid entry followed by
an invalid entry, the first record was already allocated in
`ssh_known_hosts_read_entries()`, but not freed on error.
This could cause memory leaks in the client, but we do not
consider them security relevant, as the leaks do not add up and
successful exploitation is hard or impossible.
Originally reported by Kang Yang.
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Norbert Pocs <norbertpocs0@gmail.com>
Originally reported with this patch by Brian Carpenter from Deep Fork Cyber.
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Pavol Žáčik <pzacik@redhat.com>
Version 0.4.0 fixed the issues with multi-digit version numbers,
which we hit when releasing libssh ABI version 4_10 with the last
release.
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Pavol Žáčik <pzacik@redhat.com>
When we use an empty configuration file, some things go south in c10s;
for example, FIPS mode detection does not work anymore.
Providing a minimal configuration file avoids the issues of loading
the provider too early, while keeping FIPS mode activation working
and tests happy.
It also configures the pkcs11-provider to assume the token provides
FIPS approved crypto so the tests can work.
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Pavol Žáčik <pzacik@redhat.com>
Reviewed-by: Andreas Schneider <asn@cryptomilk.org>
The maximal length of a unix domain socket path is 108 characters. When
the build directory (and UID wrapper home directories) are too deep
in the filesystem, OpenSSH will fail to create the socket file,
which makes this test fail.
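A small sketch of how a test could check up front whether a socket path
fits. toy_socket_path_fits() is a hypothetical helper, not part of the
test suite; it relies on the size of sun_path in struct sockaddr_un,
which is platform dependent (108 bytes on Linux).

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Returns true when path, including its terminating NUL, fits in the
 * sun_path member of struct sockaddr_un on this platform. */
static bool toy_socket_path_fits(const char *path)
{
    struct sockaddr_un addr;
    return path != NULL && strlen(path) < sizeof(addr.sun_path);
}
```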
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Pavol Žáčik <pzacik@redhat.com>
Reviewed-by: Andreas Schneider <asn@cryptomilk.org>
Without explicitly setting the algorithms, they might be set by
some other configuration file, for example crypto policies pulled
from `/etc/libssh/libssh_server.config` during RPM build.
Also log the generated configuration file and change the other case
to use the standard logging mechanism instead of fprintf.
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Reviewed-by: Pavol Žáčik <pzacik@redhat.com>
Reviewed-by: Andreas Schneider <asn@cryptomilk.org>