Dropped packets, on receive only, Server 2008 only, and network speed is 100Mb/s

I have a really strange one.

I have packet loss with excessive ‘TCP Dup ACK’ and ‘TCP Fast Retransmission’ errors when I download files (and only when I download) from two different Windows 2008 servers. Upload speed is fine.

This ONLY occurs if the client computer (Win7) is connected at 100Mb/s. At 1Gb/s, there are no errors and I get full speed. If I set the client NIC to 100Mb/s, I get a lot of ‘TCP Dup ACK’ errors and the download speed drops to around 2-5MB/s. Upload speed is 10MB/s or above.

This only happens to the Windows 2008 Server boxes (Dell, but different hardware). This problem does not occur if I transmit between the Win7 clients and the Linux servers.

It’s like Server 2008 is unable to scale the TCP window properly, overloads the switch or something, then pauses traffic for a bit.

Parts of the network run at 100Mb/s due to older equipment, so this is really causing a problem in some buildings.

I have uploaded a pcap file from the client here.
https://dl.dropboxusercontent.com/u/24907255/slow.pcap.gz

It shows a 50MB file being written to the server, then read back from the server with the errors.

Thanks for any help. I am stumped.


11/28/13 More Information.

I shut down the entire network so that only one client and one server are on the network. No change in the problem.

If I set every interface (server, client, and Cisco 2960 switch) to 100Mb/s full duplex, the problem goes away. If I set the server and switch interfaces to auto or 1Gb/s, the problem is back.

If I bypass the switch with a Netgear 10/100 switch and set both client and server to auto, I have no problems.

I did discover this. In the normal setup, with the server-to-switch link at 1Gb/s, if I plug the Netgear 10/100 switch in between the client and the Cisco switch, my speed problem gets even worse. Speeds go from 5-7MB/s to 2-3MB/s, and yes, I have tried both fixed and auto network speeds.
This would explain why some of the buildings that have a two-switch hop between them and the main Cisco switch have more of a speed problem.

On to pinging. With everything at 1Gb/s, I can ping with a full-size ICMP payload,
ping -l 65500, and it works. With the client at 100Mb/s, the largest size I can ping is 17752. Any more and it fails, to the Windows servers only; no problem on the Linux boxes.
With the Netgear 10/100 between the server and client, no problems pinging at 65500.
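
One way to read the 17752 figure is in terms of IP fragmentation. The sketch below assumes a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header with no options, and an 8-byte ICMP echo header; none of these are stated in the post, and fragment_count is a hypothetical helper name used only for illustration.

```python
import math

IPV4_HEADER_BYTES = 20   # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8    # ICMP echo request/reply header


def fragment_count(ping_payload_bytes: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one echo request of this size."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    data_per_fragment = (mtu - IPV4_HEADER_BYTES) // 8 * 8
    total_data = ping_payload_bytes + ICMP_HEADER_BYTES
    return math.ceil(total_data / data_per_fragment)


for size in (17752, 17753, 65500):
    print(size, fragment_count(size))
```

Under those assumptions, 17752 bytes is exactly the largest ping that fits in 12 full-size fragments, so the failure begins when a 13th back-to-back frame is needed. That would be consistent with a burst sent at 1Gb/s overflowing a buffer somewhere on the way to the 100Mb/s port, although it does not prove it.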


Update 3

I swapped in a PowerConnect 2748 switch. Same problem with the server at 1Gb/s and the client at 100Mb/s. I can ping over 17752 now, though. Strange. So I don’t think it’s the Cisco switch.


Update 4.
I am trying to get some hard numbers by using iperf.
All systems are connected to the same switch, with the client set to 100Mb/s and running the command iperf.exe -c -u -b 10m, so sending to the server.
One server is 2008 with no load on it right now; the other is Ubuntu with a load average of 0.20.

At 10m

  • Linux: jitter 0.022 ms, packet loss 0/8505
  • Server 2008: jitter 1.859 ms, packet loss 68/8505

Pushing it to 100m

  • Linux: jitter 0.445 ms, packet loss 0/26634
  • Server 2008: jitter 0.542 ms, packet loss 94/26596

Now for stats sending TO the client at 10m

  • Linux: jitter 0.271 ms, packet loss 0/8500 (0%), 1 datagram received out-of-order
  • Server 2008: jitter 0.063 ms, packet loss 20/8505 (0.24%)

Pushing it to 100m

  • Linux: jitter 0.230 ms, packet loss 4083/85443 (4.8%), 1 datagram received out-of-order, 95.7Mb/s
  • Server 2008: jitter 0.237 ms, packet loss 28174/81718 (47%), 51.1Mb/s

So Server 2008 is poor in general, but you can see the huge packet loss (47%) when the connection is pushed to the client's 100Mb/s limit.
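
To compare the runs on the same footing, the lost/sent counts can be turned into percentages. This is only a small sketch using the counts as posted above; loss_percent and the run labels are hypothetical names chosen for illustration.

```python
def loss_percent(lost: int, sent: int) -> float:
    """Packet loss as a percentage of datagrams sent."""
    return 100.0 * lost / sent


# (lost, sent) pairs copied from the iperf results in this post.
runs = {
    "to server, 10m, Linux": (0, 8505),
    "to server, 10m, Server 2008": (68, 8505),
    "to server, 100m, Linux": (0, 26634),
    "to server, 100m, Server 2008": (94, 26596),
    "to client, 10m, Linux": (0, 8500),
    "to client, 10m, Server 2008": (20, 8505),
    "to client, 100m, Linux": (4083, 85443),
    "to client, 100m, Server 2008": (28174, 81718),
}

for label, (lost, sent) in runs.items():
    print(f"{label}: {loss_percent(lost, sent):.2f}%")
```

With the counts as posted, the last Server 2008 run works out to roughly 34.5% rather than the 47% quoted, so one of those figures may have been mistyped. Either way, the loss is far higher than in any other run.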


Update 5.

When I tested with the PowerConnect 2748 switch, I used different Cat5 cables between the server and switch and between the client and switch. This should rule out cabling or switch issues.

I have two Windows 2008 Servers in this environment, installed at different times and on different hardware. The only thing they share is a Broadcom-branded NIC, but the chipsets are different. Both experience the same problem, but I am doing my main testing on one so that if something goes wrong, the other will still work.

The one server has a built-in BCM5709C with two ports, and an add-on card (PCI Express, I think) with the same BCM5709C chipset and two ports. I have tried all of them and the problem still exists. So this should rule out any hardware problems.


Update 6 12/3/13
I installed the Intel NIC. No change. I played around with the CTCP settings and no change there. I even turned off SMB2 and no difference.

I did some more testing at 100Mb/s.
Copying a 3GB ISO image TO the server, drag and drop, averages out at 10MB/s.
Copying the same 3GB ISO image FROM the server averages out at 6.3MB/s.

With all network interfaces set to auto and running at 1Gb/s:
Copying the ISO TO the server averages 101MB/s.
Copying the ISO FROM the server averages 57MB/s.

So read speeds from the server are only a little over half the write speeds (about 63% at 100Mb/s and 56% at 1Gb/s).
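
For a rough sense of how far each copy is from line rate, the sketch below converts the link speed to MB/s and divides. It assumes 1 MB = 8 Mb with decimal units and ignores Ethernet, IP, TCP, and SMB overhead, so real-world ceilings are somewhat lower; the function names are hypothetical.

```python
def line_rate_mbytes(link_mbits: float) -> float:
    """Raw link capacity in MB/s, ignoring all protocol overhead."""
    return link_mbits / 8


def utilisation(measured_mbytes: float, link_mbits: float) -> float:
    """Fraction of the raw link capacity that a measured copy speed uses."""
    return measured_mbytes / line_rate_mbytes(link_mbits)


# (link speed in Mb/s, direction, measured MB/s) taken from the copy tests above.
copies = [
    (100, "TO server", 10.0),
    (100, "FROM server", 6.3),
    (1000, "TO server", 101.0),
    (1000, "FROM server", 57.0),
]

for link, direction, speed in copies:
    print(f"{link}Mb/s {direction}: {utilisation(speed, link):.0%} of raw line rate")
```

By this estimate, writes reach about 80% of the raw link rate at both speeds, while reads reach only about 50% at 100Mb/s and 46% at 1Gb/s.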

Answer

This sounds like a speed/duplex mismatch causing collisions and retransmits. Misconfiguration between the server and the other side could cause this. Another reason for the mismatch could be failing autonegotiation.

Make sure both ends of the connection are configured identically regarding speed and duplex.
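
As a rough illustration of that check, the sketch below compares the configured settings of the two ends of a link. PortConfig and check_link are hypothetical names, not part of any vendor tool, and the settings would have to be read from the NIC driver and the switch configuration by hand. The half-duplex fallback rule reflects commonly described behaviour on 10/100 links when only one end autonegotiates; treat it as an assumption for any particular hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortConfig:
    autoneg: bool                      # True if the port is set to auto
    speed_mbps: Optional[int] = None   # forced speed, used when autoneg is False
    duplex: Optional[str] = None       # "full" or "half", used when autoneg is False


def check_link(a: PortConfig, b: PortConfig) -> str:
    """Describe whether two link partners are configured consistently."""
    if a.autoneg and b.autoneg:
        return "ok: both ends autonegotiate"
    if not a.autoneg and not b.autoneg:
        if (a.speed_mbps, a.duplex) == (b.speed_mbps, b.duplex):
            return "ok: both ends forced to the same speed and duplex"
        return "mismatch: forced settings differ"
    # Exactly one end is forced; the other end cannot negotiate duplex with it.
    forced = b if a.autoneg else a
    if forced.duplex == "full":
        return "mismatch: the autonegotiating end will likely fall back to half duplex"
    return "risky: one end forced, one end auto"


server = PortConfig(autoneg=False, speed_mbps=100, duplex="full")
switch_port = PortConfig(autoneg=True)
print(check_link(server, switch_port))
```

The same comparison applies to every hop: server to switch, switch to switch, and switch to client.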

Attribution
Source: Link, Question Author: Porch, Answer Author: Teun Vink