I have an issue with private network traffic not being masqueraded in very specific circumstances.
The network is a group of VMware guests using the 10.1.0.0/18 network. The problematic host is 10.1.4.20 (netmask 255.255.192.0), and the only gateway it is configured to use is 10.1.63.254. The gateway server, 37.59.245.59, should masquerade all outbound traffic and forward it through 37.59.245.62, but for some reason 10.1.4.20 occasionally ends up with 37.59.245.62 in its routing cache:

    ip -s route show cache 199.16.156.40
    199.16.156.40 from 10.1.4.20 via 37.59.245.62 dev eth0
        cache  used 149 age 17sec ipid 0x9e49
    199.16.156.40 via 37.59.245.62 dev eth0  src 10.1.4.20
        cache  used 119 age 11sec ipid 0x9e49

    ip route flush cache 199.16.156.40
    ping api.twitter.com
    PING api.twitter.com (199.16.156.40) 56(84) bytes of data.
    64 bytes from 199.16.156.40: icmp_req=1 ttl=247 time=93.4 ms

    ip -s route show cache 199.16.156.40
    199.16.156.40 from 10.1.4.20 via 10.1.63.254 dev eth0
        cache  age 3sec
    199.16.156.40 via 10.1.63.254 dev eth0  src 10.1.4.20
        cache  used 2 age 2sec
The question is, why am I seeing a public IP address in my routing cache on a private network?
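To spot these entries without eyeballing the cache every time, a small filter can compare each cached next hop against the configured gateway. This is a minimal sketch, assuming the cache text has already been captured (for example from `ip -s route show cache`); the names `SAMPLE`, `ROUTE_RE` and `unexpected_gateways` are illustrative, not part of any existing tool.

```python
import re

# Sample copied from the first `ip -s route show cache 199.16.156.40` output above.
SAMPLE = """\
199.16.156.40 from 10.1.4.20 via 37.59.245.62 dev eth0
    cache  used 149 age 17sec ipid 0x9e49
199.16.156.40 via 37.59.245.62 dev eth0  src 10.1.4.20
    cache  used 119 age 11sec ipid 0x9e49
"""

# Matches "<dest> [from <src>] via <gateway> dev <iface>" at the start of a line.
# Indented "cache ..." detail lines start with whitespace, so they never match.
ROUTE_RE = re.compile(r"^(\S+)(?: from \S+)? via (\S+) dev (\S+)", re.MULTILINE)


def unexpected_gateways(cache_text, expected_gw):
    """Return (destination, gateway, device) for every cached route whose
    next hop differs from the gateway the host is configured to use."""
    return [
        (dest, gw, dev)
        for dest, gw, dev in ROUTE_RE.findall(cache_text)
        if gw != expected_gw
    ]


if __name__ == "__main__":
    for dest, gw, dev in unexpected_gateways(SAMPLE, "10.1.63.254"):
        print(f"{dest}: cached via {gw} on {dev}, expected 10.1.63.254")
```

Run against the post-flush output (entries `via 10.1.63.254`), the same call returns an empty list.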
Network information for the app server (without lo):

    ip a
    eth0      Link encap:Ethernet  HWaddr 00:50:56:a4:48:20
              inet addr:10.1.4.20  Bcast:10.1.63.255  Mask:255.255.192.0
              inet6 addr: fe80::250:56ff:fea4:4820/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:1523222895 errors:0 dropped:407 overruns:0 frame:0
              TX packets:1444207934 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1524116772058 (1.5 TB)  TX bytes:565691877505 (565.6 GB)
Network information for the VPN gateway (again without lo):

    eth0      Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
              inet addr:37.59.245.59  Bcast:37.59.245.63  Mask:255.255.255.192
              inet6 addr: fe80::250:56ff:fea4:56e9/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:7030472688 errors:0 dropped:1802 overruns:0 frame:0
              TX packets:6959026084 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:7777330931859 (7.7 TB)  TX bytes:7482143729162 (7.4 TB)

    eth0:0    Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
              inet addr:10.1.63.254  Bcast:10.1.63.255  Mask:255.255.192.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

    eth0:1    Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
              inet addr:10.1.127.254  Bcast:10.1.127.255  Mask:255.255.192.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

    tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
              inet addr:10.8.1.1  P-t-P:10.8.1.2  Mask:255.255.255.255
              UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
              RX packets:477047415 errors:0 dropped:0 overruns:0 frame:0
              TX packets:833650386 errors:0 dropped:101834 overruns:0 carrier:0
              collisions:0 txqueuelen:100
              RX bytes:89948688258 (89.9 GB)  TX bytes:1050533566879 (1.0 TB)
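One detail worth making explicit in the gateway output is that eth0:0 and eth0:1 are IP aliases, not separate NICs: they share eth0's hardware address, so the public and both private gateway addresses live on one Ethernet segment. The sketch below groups inet addresses by HWaddr from ifconfig-style text; it assumes the classic net-tools output format shown above, and `SAMPLE` and `addresses_by_mac` are illustrative names only.

```python
import re
from collections import defaultdict

# Condensed from the gateway's ifconfig output above (header and inet lines only).
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
          inet addr:37.59.245.59  Bcast:37.59.245.63  Mask:255.255.255.192
eth0:0    Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
          inet addr:10.1.63.254  Bcast:10.1.63.255  Mask:255.255.192.0
eth0:1    Link encap:Ethernet  HWaddr 00:50:56:a4:56:e9
          inet addr:10.1.127.254  Bcast:10.1.127.255  Mask:255.255.192.0
"""

# Interface header line: "<name>  Link encap:<type>  HWaddr <mac>".
HEADER_RE = re.compile(r"^(\S+)\s+Link encap:\S+\s+HWaddr (\S+)")
# IPv4 line; "inet6 addr:" does not match because of the "6".
INET_RE = re.compile(r"inet addr:(\S+)")


def addresses_by_mac(text):
    """Map each hardware address to the IPv4 addresses configured on it."""
    groups = defaultdict(list)
    mac = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            mac = header.group(2)  # following inet lines belong to this interface
            continue
        inet = INET_RE.search(line)
        if inet and mac is not None:
            groups[mac].append(inet.group(1))
    return dict(groups)


if __name__ == "__main__":
    print(addresses_by_mac(SAMPLE))
    # {'00:50:56:a4:56:e9': ['37.59.245.59', '10.1.63.254', '10.1.127.254']}
```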
eth0 leads to the outside world, and tun0 to an OpenVPN network of VMs on which the app server sits.
ip r for the VPN gateway:

    default via 37.59.245.62 dev eth0  metric 100
    10.1.0.0/18 dev eth0  proto kernel  scope link  src 10.1.63.254
    10.1.64.0/18 dev eth0  proto kernel  scope link  src 10.1.127.254
    10.8.1.0/24 via 10.8.1.2 dev tun0
    10.8.1.2 dev tun0  proto kernel  scope link  src 10.8.1.1
    10.9.0.0/28 via 10.8.1.2 dev tun0
    37.59.245.0/26 dev eth0  proto kernel  scope link  src 37.59.245.59
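To see which entry of the gateway's table handles a given packet, a longest-prefix-match lookup over those routes is enough. This is a simplified model, assuming no policy routing (ip rule) and ignoring metrics; `GATEWAY_ROUTES` and `lookup` are hypothetical names, and the table is transcribed by hand from the `ip r` output above.

```python
import ipaddress

# Gateway routing table from `ip r` above, as (prefix, next hop, device).
# A next hop of None means the prefix is directly connected (scope link).
GATEWAY_ROUTES = [
    ("0.0.0.0/0", "37.59.245.62", "eth0"),
    ("10.1.0.0/18", None, "eth0"),
    ("10.1.64.0/18", None, "eth0"),
    ("10.8.1.0/24", "10.8.1.2", "tun0"),
    ("10.8.1.2/32", None, "tun0"),
    ("10.9.0.0/28", "10.8.1.2", "tun0"),
    ("37.59.245.0/26", None, "eth0"),
]


def lookup(routes, dest):
    """Longest-prefix match: return (prefix, next_hop, device) for dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), hop, dev)
        for prefix, hop, dev in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    # The most specific (longest) matching prefix wins.
    net, hop, dev = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), hop, dev


if __name__ == "__main__":
    print(lookup(GATEWAY_ROUTES, "199.16.156.40"))  # ('0.0.0.0/0', '37.59.245.62', 'eth0')
    print(lookup(GATEWAY_ROUTES, "10.1.4.20"))      # ('10.1.0.0/18', None, 'eth0')
```

Both the app server and the Internet destination resolve to eth0, so traffic the gateway forwards for 10.1.4.20 leaves through the same interface it arrived on.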
ip r on the app server:

    default via 10.1.63.254 dev eth0  metric 100
    10.1.0.0/18 dev eth0  proto kernel  scope link  src 10.1.4.20
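From the app server's side, the surprise can be stated precisely: its only connected prefix is 10.1.0.0/18 (10.1.0.0 to 10.1.63.255), which contains the configured gateway 10.1.63.254 but not 37.59.245.62. A cached route via 37.59.245.62 is therefore not something this host's own routing table can produce. A minimal check with Python's standard `ipaddress` module; `ON_LINK` and `is_on_link` are illustrative names:

```python
import ipaddress

# The app server's only directly connected prefix, taken from its `ip r` output.
ON_LINK = ipaddress.ip_network("10.1.0.0/18")


def is_on_link(addr):
    """True if addr lies inside the app server's connected prefix."""
    return ipaddress.ip_address(addr) in ON_LINK


if __name__ == "__main__":
    for gw in ("10.1.63.254", "37.59.245.62"):
        print(gw, "on-link:", is_on_link(gw))
    # 10.1.63.254 on-link: True
    # 37.59.245.62 on-link: False
```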
Firewall rules:

    Chain PREROUTING (policy ACCEPT 380M packets, 400G bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain INPUT (policy ACCEPT 127M packets, 9401M bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain OUTPUT (policy ACCEPT 1876K packets, 137M bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain POSTROUTING (policy ACCEPT 223M packets, 389G bytes)
     pkts bytes target     prot opt in     out     source               destination
      32M 1921M MASQUERADE  all  --  *      eth0    10.1.0.0/17          0.0.0.0/0
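The single MASQUERADE rule matches sources in 10.1.0.0/17, i.e. 10.1.0.0 to 10.1.127.255. The sketch below, using only the standard `ipaddress` module, confirms that this covers the app server and both /18 subnets behind eth0:0 and eth0:1, while the tun0 address 10.8.1.2 falls outside it. `MASQ_SOURCE`, `SUBNETS` and `masqueraded` are illustrative names, and the check only models the source match, not the `-o eth0` condition.

```python
import ipaddress

# Source match of the MASQUERADE rule in the POSTROUTING chain above.
MASQ_SOURCE = ipaddress.ip_network("10.1.0.0/17")

# Private subnets the gateway serves on eth0:0 and eth0:1.
SUBNETS = [ipaddress.ip_network("10.1.0.0/18"), ipaddress.ip_network("10.1.64.0/18")]


def masqueraded(src):
    """True if a packet from src matches the rule's source prefix."""
    return ipaddress.ip_address(src) in MASQ_SOURCE


if __name__ == "__main__":
    print("10.1.4.20 matches:", masqueraded("10.1.4.20"))  # True
    for net in SUBNETS:
        print(net, "covered:", net.subnet_of(MASQ_SOURCE))  # True for both
    print("10.8.1.2 matches:", masqueraded("10.8.1.2"))  # False
```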
Answer
Unfortunately, most of what you’re seeing is due to routing issues between external routers. Those routers obtain and update their routing information dynamically so they can route traffic around problem areas; when those routes change often (normally because of availability problems), it is called route flapping. That instability is being reflected down to you, whereas end users normally don’t see any of this.
You could try disabling your route cache, as explained here (note the caveats; it doesn’t seem to offer much upside), but I think you’d be better off talking to your local network admin(s), since it seems to be their routing that is really unstable.
I am, of course, assuming that you aren’t the one responsible for network administration.
Attribution
Source: Link, Question Author: greg0ire, Answer Author: NickW