Troubleshooting low TCP transfer rate behind a router

I’m trying to debug a low TCP transfer rate on a host (“lowHost”).
As a reference, I’m comparing it against a second host (“highHost”).

I’m making two rate measurements:

  1. Downloading a large file from a web server with curl.
  2. Running iperf (TCP and UDP tests).

Compared to highHost, lowHost’s traffic passes through one extra network hop (“router”):

lowHost  ---(router)----->(fw)----> internet ----> webserver
highHost ---------------->(fw)----> internet ----> webserver

“router” is a Linux host that performs NAT (iptables) and routes traffic between a group of network interfaces.
I’m trying to understand whether this router is the cause of the low TCP rate.

My best guess is that the low rate is caused by latency spikes, which restrict the TCP congestion window (cwnd).

                           |  lowHost                     |  highHost
----------------------------------------------------------------------------------------
curl rate                  |  40 Mbps                     |  400 Mbps
curl Wireshark “RTT”       |  ~100 pkts with RTT > 100 ms |  ~5 pkts with RTT > 100 ms
curl rwnd                  |  3 MB                        |  3 MB
curl cwnd                  |  0.2 MB                      |  2.5 MB
----------------------------------------------------------------------------------------
iperf TCP rate             |  100 Mbps                    |  700 Mbps
iperf TCP Wireshark “RTT”  |  0 pkts with RTT > 100 ms    |  -
iperf TCP cwnd             |  0.5 MB                      |  -
----------------------------------------------------------------------------------------
iperf UDP rate             |  350 Mbps                    |  -
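One way to sanity-check the cwnd hypothesis against these numbers: a window-limited TCP sender can deliver at most about one window per round trip, so throughput ≤ window × 8 / RTT. The Python sketch below applies that bound. It assumes “MB” in the table means 10^6 bytes and uses the average ping RTTs from the statistics below as each connection’s base RTT; both are assumptions, and a cwnd value read off a Wireshark graph is only a snapshot.

```python
# Assumptions (not stated in the post): "MB" = 10**6 bytes, and the average
# ping RTT is a fair stand-in for each connection's base RTT.


def cwnd_rate_cap_mbps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound for a window-limited TCP flow: one full window per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


RTT_S = {"lowHost": 33.664e-3, "highHost": 32.855e-3}  # ping avg RTTs

# (label, host, window in bytes, measured rate in Mbps) from the table above
CASES = [
    ("lowHost curl cwnd", "lowHost", 0.2e6, 40),
    ("lowHost iperf cwnd", "lowHost", 0.5e6, 100),
    ("highHost curl cwnd", "highHost", 2.5e6, 400),
]

for label, host, window, measured in CASES:
    cap = cwnd_rate_cap_mbps(window, RTT_S[host])
    print(f"{label:18s}: cap ~{cap:6.1f} Mbps vs measured {measured} Mbps")

# The advertised receive window bounds the rate the same way.
print(f"lowHost 3 MB rwnd  : cap ~{cwnd_rate_cap_mbps(3e6, RTT_S['lowHost']):6.1f} Mbps")
```

For lowHost the caps (about 47.5 Mbps for curl and 119 Mbps for iperf) sit close to the measured 40 and 100 Mbps, which is consistent with a cwnd-limited flow, while the 3 MB rwnd alone would allow roughly 713 Mbps. highHost’s cap (about 609 Mbps) is well above its measured 400 Mbps, so something other than cwnd may be the bound there.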

ping stats:
    lowHost --> webserver
         79 packets transmitted, 79 received, 0% packet loss, time 78077ms
         rtt min/avg/max/mdev = 33.285/33.664/38.176/0.800 ms

    highHost --> webserver
         108 packets transmitted, 108 received, 0% packet loss, time 106949ms
         rtt min/avg/max/mdev = 32.684/32.855/39.706/0.689 ms
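The ping summaries can be checked mechanically against the 100 ms threshold used for the Wireshark counts. The Python sketch below assumes the iputils `rtt min/avg/max/mdev = a/b/c/d ms` format shown above (BSD/macOS ping prints `round-trip ... stddev` instead, which this regex would still match after the `=`). Neither host’s ping exceeded about 40 ms. Note that ping sent one probe per second (79 probes in about 78 s), so it samples far more sparsely than the per-segment Wireshark “RTT” and could miss short spikes.

```python
import re

# Summary lines copied from the ping output above.
PING_SUMMARIES = {
    "lowHost": "rtt min/avg/max/mdev = 33.285/33.664/38.176/0.800 ms",
    "highHost": "rtt min/avg/max/mdev = 32.684/32.855/39.706/0.689 ms",
}

# Matches the "= min/avg/max/mdev ms" part of a ping summary line.
SUMMARY_RE = re.compile(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms")

SPIKE_MS = 100.0  # same threshold as the Wireshark "RTT" rows in the table


def parse_ping_summary(line: str) -> dict:
    """Return {'min', 'avg', 'max', 'mdev'} in milliseconds."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not a ping rtt summary: {line!r}")
    return dict(zip(("min", "avg", "max", "mdev"), map(float, m.groups())))


for host, line in PING_SUMMARIES.items():
    stats = parse_ping_summary(line)
    spread = stats["max"] - stats["min"]
    print(f"{host}: min {stats['min']} ms, max {stats['max']} ms, "
          f"spread {spread:.1f} ms, any sample > {SPIKE_MS:.0f} ms: {stats['max'] > SPIKE_MS}")
```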

I used SystemTap to measure the duration of the kernel’s ip_forward() calls, to see whether forwarding is the source of the latency spikes.
Sampling over 60 s, the maximum ip_forward() duration was 8 ms, nowhere near enough to account for the >100 ms values in the Wireshark “RTT”.

If it’s not the router, what else could it be?
If it is the router, and it’s not latency, what else could it be?
Are there socket statistics (e.g. from /bin/ss) that could point me to the issue?
Why is the iperf TCP rate roughly double the curl rate?
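On the /bin/ss question: `ss -ti` prints per-connection TCP internals, including `cwnd` (in segments), `mss`, `ssthresh`, `rtt` as srtt/rttvar in ms, and `retrans` as current/total. Running something like `ss -tin dst <webserver-ip>` on each host during a transfer and comparing these fields is one way to tell whether cwnd is being cut back after retransmissions or is simply staying small. The Python sketch below parses one such info line. `SAMPLE` is a made-up line in ss’s format, not output from either host, `live_info_lines` is a hypothetical helper, and the exact set of fields depends on the kernel and iproute2 version.

```python
import re
import subprocess

# key:value tokens such as "cwnd:138", "rtt:33.5/0.8", "retrans:0/12".
KV_RE = re.compile(r"\b([a-z_]+):([\d./,]+)")

# Hypothetical `ss -ti` info line, for illustration only (not real output).
SAMPLE = ("\t cubic wscale:7,7 rto:236 rtt:33.5/0.8 ato:40 mss:1448 pmtu:1500 "
          "rcvmss:536 advmss:1448 cwnd:138 ssthresh:120 send 47.7Mbps retrans:0/12")


def parse_ss_info(line: str) -> dict:
    """Map each key:value token of an ss info line to its raw string value."""
    return dict(KV_RE.findall(line))


def summarize(info: dict) -> dict:
    """Pull out the fields most relevant to a cwnd-limited transfer."""
    mss = int(info.get("mss", "0"))
    cwnd_segments = int(info.get("cwnd", "0"))
    srtt_ms, _, rttvar_ms = info.get("rtt", "0/0").partition("/")
    _, _, retrans_total = info.get("retrans", "0/0").partition("/")
    return {
        "cwnd_bytes": cwnd_segments * mss,  # ss reports cwnd in segments
        "srtt_ms": float(srtt_ms),
        "rttvar_ms": float(rttvar_ms or 0),
        "retrans_total": int(retrans_total or 0),
    }


def live_info_lines(dst: str) -> list:
    """Hypothetical helper: info lines for TCP sockets to `dst` (needs iproute2's ss)."""
    out = subprocess.run(["ss", "-tin", "dst", dst],
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if "cwnd:" in line]


print(summarize(parse_ss_info(SAMPLE)))
```

In the made-up sample, 138 segments × 1448-byte MSS is about 0.2 MB of cwnd. A growing `retrans` total on lowHost but not on highHost would point toward loss rather than pure delay as what keeps cwnd small.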

[Wireshark throughput / RTT / rwnd / cwnd graphs: images not available]


Attribution
Source: Link, Question Author: Tomer, Answer Author: Community
