routing table madness
Lonni J Friedman
netllama at gmail.com
Mon Nov 5 14:25:54 PST 2012
For any of you who have strong routing table fu, I'd appreciate some
assistance. I've got a bunch of servers, each with multiple (3) NICs and
associated network interfaces. I'm tripping over a bizarre routing
problem, where traffic that should use the default route isn't using it,
and is failing as a result. Here's what my routing table looks like:
#########
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.31.96.1      0.0.0.0         UG    0      0        0 em3
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 em1
10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em3
10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em4
#########
10.31.96.1 is my default route that all traffic should be using (that
em# stuff is a Fedora thing, you can safely mentally substitute 'eth'
everywhere that you see 'em' if it makes it easier to follow). Here's
ifconfig output:
#########
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.100  netmask 255.0.0.0  broadcast 10.255.255.255
        inet6 fe80::b6b5:2fff:fe5b:9e7c  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7c  txqueuelen 1000  (Ethernet)
        RX packets 283922868  bytes 44297545348 (41.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 538064680  bytes 108980632740 (101.4 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfeb60000-feb80000

em3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.31.97.100  netmask 255.255.252.0  broadcast 10.31.99.255
        inet6 fe80::b6b5:2fff:fe5b:9e7e  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7e  txqueuelen 1000  (Ethernet)
        RX packets 3733210  bytes 1042607750 (994.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1401537  bytes 114335537 (109.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfea60000-fea80000

em4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.31.96.61  netmask 255.255.252.0  broadcast 10.31.99.255
        inet6 fe80::b6b5:2fff:fe5b:9e7f  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7f  txqueuelen 1000  (Ethernet)
        RX packets 2416588  bytes 196633917 (187.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 205038  bytes 19363499 (18.4 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfeae0000-feb00000
#########
em1/10.0.0.100 goes to a switch that is attached *only* to servers in
the same rack. It's used only for the servers in that rack to
communicate amongst themselves. em3 & em4 are both on the same
subnet. The only difference between them is that em3 is not always up
(it's associated with a floating IP address based on which server is
currently in the 'master' role). Basically, all traffic should be
going out through em3 unless it's destined for something else on the
local 10.0.0.0/8 subnet, in which case it should go out over em1.
However, that's not what is happening. Traffic to 10.31.96.x,
10.31.97.x, and 10.31.99.x hosts is going through em3, but stuff
destined for 10.31.45.x is trying to go through em1, and timing out
because there's no way to route that traffic effectively.
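As far as I can tell, this is the kernel's longest-prefix matching at work: a 10.31.45.x destination matches the 10.0.0.0/8 route on em1 (a /8 is more specific than the 0.0.0.0/0 default), while 10.31.96-99.x destinations also match the /22 on em3, which wins. Here's a quick sketch of that selection logic using the routes from the table above (stdlib ipaddress only; a toy model, not the kernel's actual FIB lookup):

```python
# Toy model of longest-prefix-match route selection, using the
# routes from the `route -n` output above.
import ipaddress

# (destination network, interface) pairs from the routing table
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "em3"),     # default via 10.31.96.1
    (ipaddress.ip_network("10.0.0.0/8"), "em1"),
    (ipaddress.ip_network("10.31.96.0/22"), "em3"),
    (ipaddress.ip_network("10.31.96.0/22"), "em4"),
]

def pick_route(dst):
    """Return the interface of the most specific route containing dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, iface) for net, iface in routes if addr in net]
    # longest prefix (largest prefixlen) wins, like the kernel's lookup
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(pick_route("10.31.97.50"))   # /22 beats /8 -> em3
print(pick_route("10.31.45.106"))  # only /8 and default match -> em1
```

So the broad 10.0.0.0/8 route on em1 swallows everything in 10.x that isn't covered by a more specific route.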
This is also illustrated with the following command:
# tcptraceroute cuda-linux
traceroute to cuda-linux (10.31.45.106), 30 hops max, 60 byte packets
1 cuda-fs1a-internal (10.0.0.100) 3006.650 ms !H 3006.624 ms !H
3006.619 ms !H
Yet when run from a system on the same network as the box above, with
only a single network interface, it works:
# tcptraceroute cuda-linux
traceroute to cuda-linux (10.31.45.106), 30 hops max, 40 byte packets
1 10.31.96.2 (10.31.96.2) 0.345 ms 0.403 ms 0.474 ms
2 cuda-linux (10.31.45.106) 0.209 ms 0.208 ms 0.201 ms
I thought that I could fix this by adding a route to 10.31.45.1 for
em3, but that fails:
#########
# route add default gw 10.31.45.1 em3
SIOCADDRT: Network is unreachable
#########
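My guess at why that errors out: the kernel only accepts a gateway that lies inside a subnet directly connected on the named interface, and 10.31.45.1 isn't inside em3's 10.31.96.0/22. A rough model of that sanity check (again stdlib ipaddress, subnets from the ifconfig output above; this is my assumption about the check, not the kernel's actual code):

```python
# Rough model of the "Network is unreachable" check: a gateway is only
# accepted if it sits inside a subnet directly connected on the given
# interface. Connected subnets are taken from the ifconfig output above.
import ipaddress

# (directly connected network, interface)
connected = [
    (ipaddress.ip_network("10.0.0.0/8"), "em1"),
    (ipaddress.ip_network("10.31.96.0/22"), "em3"),
    (ipaddress.ip_network("10.31.96.0/22"), "em4"),
]

def gateway_ok(gw, iface):
    """True if gw lies in a subnet directly connected on iface."""
    addr = ipaddress.ip_address(gw)
    return any(addr in net for net, i in connected if i == iface)

print(gateway_ok("10.31.96.1", "em3"))  # True: inside em3's 10.31.96.0/22
print(gateway_ok("10.31.45.1", "em3"))  # False: so SIOCADDRT refuses it
```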
I'm lost at this point on what else to try. Help?
thanks.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama at gmail.com
LlamaLand https://netllama.linux-sxs.org