SSH2 Connect Timeouts While Issuing Remote Commands

I’m running a script that issues remote commands over SSH to a few production boxes. RSA keys are in place for the user that connects to these boxes, and most connections succeed. Intermittently, though, a connection times out, roughly once in every 20 attempts.

I’m using SSH2 as the protocol, mainly because SSH2 is available on all of the machines I’m connecting to. This is what the debugging output looks like, up to the point where the connection hangs and then closes.

[mike@dev ~]$ ssh -v -o "Protocol=2" 10.60.###.### "w"
OpenSSH_4.3p2, OpenSSL 0.9.8b 04 May 2006
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to 10.60.###.### [10.60.###.###] port 22.
debug1: Connection established.
debug1: identity file /home/mike/.ssh/id_rsa type 1
debug1: identity file /home/mike/.ssh/id_dsa type -1
debug1: loaded 2 keys
debug1: Remote protocol version 1.99, remote software version OpenSSH_3.9p1
debug1: match: OpenSSH_3.9p1 pat OpenSSH_3.*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_4.3
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-cbc hmac-md5 none
debug1: kex: client->server aes128-cbc hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
Connection closed by 10.60.###.###

The connection hangs at this point and then closes.

On some occasions, the connection gets a little bit further, but again hangs.

...
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
Connection closed by 10.60.###.###

The server’s log ends something like this, in cases where the connection dies:

Jul 23 16:53:07 host02 sshd[4989]: debug1: Client protocol version 2.0; client software version OpenSSH_4.3
Jul 23 16:53:07 host02 sshd[4989]: debug1: match: OpenSSH_4.3 pat OpenSSH*
Jul 23 16:53:07 host02 sshd[4989]: debug1: Enabling compatibility mode for protocol 2.0
Jul 23 16:53:07 host02 sshd[4989]: debug1: Local version string SSH-1.99-OpenSSH_3.9p1
Jul 23 16:53:07 host02 sshd[4990]: debug1: permanently_set_uid: 74/74
Jul 23 16:53:07 host02 sshd[4990]: debug1: list_hostkey_types: ssh-rsa,ssh-dss
Jul 23 16:53:07 host02 sshd[4990]: debug1: SSH2_MSG_KEXINIT sent
Jul 23 16:53:07 host02 sshd[4990]: debug1: SSH2_MSG_KEXINIT received
Jul 23 16:53:07 host02 sshd[4990]: debug1: kex: client->server aes128-cbc hmac-md5 none
Jul 23 16:53:07 host02 sshd[4990]: debug1: kex: server->client aes128-cbc hmac-md5 none
Jul 23 16:53:40 host02 sshd[4990]: debug1: SSH2_MSG_KEX_DH_GEX_REQUEST received
Jul 23 16:53:40 host02 sshd[4990]: debug1: SSH2_MSG_KEX_DH_GEX_GROUP sent
Jul 23 16:53:40 host02 sshd[4990]: debug1: expecting SSH2_MSG_KEX_DH_GEX_INIT
Jul 23 16:53:40 host02 sshd[4990]: Connection closed by ::ffff:192.168.1.203 
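
One detail in the server log above is that the timestamps jump from 16:53:07 to 16:53:40 between the kex lines and the arrival of SSH2_MSG_KEX_DH_GEX_REQUEST. A minimal sketch for spotting such pauses in a longer log follows. The function name `largest_gap` is hypothetical, and it assumes every line starts with a 15-character syslog timestamp and that the log does not cross a year boundary.

```python
from datetime import datetime


def largest_gap(lines):
    """Return (seconds, line) for the longest pause that precedes a log line."""
    previous = None
    worst = (0.0, None)
    for line in lines:
        try:
            # Syslog timestamps carry no year, so a placeholder year is
            # prepended purely to make the string parseable.
            stamp = datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None:
            gap = (stamp - previous).total_seconds()
            if gap > worst[0]:
                worst = (gap, line)
        previous = stamp
    return worst


# Two consecutive lines taken from the server log above.
sample = [
    "Jul 23 16:53:07 host02 sshd[4990]: debug1: kex: server->client aes128-cbc hmac-md5 none",
    "Jul 23 16:53:40 host02 sshd[4990]: debug1: SSH2_MSG_KEX_DH_GEX_REQUEST received",
]
seconds, line = largest_gap(sample)
print(seconds)  # 33.0
```

A pause of this size in the middle of the key exchange suggests the packets were delayed somewhere between client and server, though the log alone does not say where.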

On the one server I’ve been testing against, I haven’t seen a single failed connection when I use SSH1 instead. But I’m wondering whether this can be fixed for SSH2, rather than having to open up SSH1 on all of the production machines.

Out of 100 connection attempts…

Using SSH1: 100 connections succeed.
Using SSH2: approx. 5 connections fail.
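
Counts like these can be gathered with a small loop. The following is only a sketch of one way to do it, not the script that produced the numbers above. The names `ssh_command` and `count_failures`, the `BatchMode=yes` option, and the 60-second timeout are assumptions added for illustration. It relies on ssh exiting with status 255 when the connection itself fails, and it treats a hung attempt as a failure as well.

```python
import subprocess


def ssh_command(host, remote_command, protocol="2"):
    """Build an ssh invocation like the one shown earlier (host is a placeholder)."""
    return ["ssh", "-o", "Protocol=" + protocol, "-o", "BatchMode=yes",
            host, remote_command]


def count_failures(command, attempts=100, timeout=60):
    """Run command repeatedly and count attempts that fail to connect or hang."""
    failures = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            # ssh returns the remote command's status, or 255 on its own errors.
            failed = result.returncode == 255
        except subprocess.TimeoutExpired:
            failed = True  # the attempt hung past the timeout
        if failed:
            failures += 1
    return failures


# Example (not run here; replace the placeholder host with a real one):
# print(count_failures(ssh_command("host.example", "w")))
```

Running the same loop once with `protocol="1"` and once with `protocol="2"` would give a comparison of the two protocols under the same conditions.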

Is there something (easy) I’m missing?

Answer

We finally worked this out a few weeks ago, and it’s worth following up. We run a VPN from our local office router to the firewall at our data center; both are Cisco devices. We updated the IOS on the router and the firewall, and that seems to have done the trick.

Thanks for all of the ideas/answers that were submitted.

Attribution
Source: Link, Question Author: Mike Brittain, Answer Author: Mike Brittain
