routing table madness

Lonni J Friedman netllama at gmail.com
Mon Nov 5 14:25:54 PST 2012


For any of you who have strong routing table fu, I'd appreciate some
assistance.  I've got a bunch of servers with multiple (3) NICs and
associated network interfaces.  I'm tripping over a bizarre routing
problem, where traffic that should use the default route is not using
it, and is failing as a result.  Here's what my routing table looks like:
#########
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.31.96.1      0.0.0.0         UG    0      0        0 em3
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 em1
10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em3
10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em4
#########

10.31.96.1 is my default gateway, which all traffic should be using
(the em# naming is a Fedora thing; you can safely substitute 'eth'
everywhere you see 'em' if that makes it easier to follow). Here's the
ifconfig output:
#########
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.100  netmask 255.0.0.0  broadcast 10.255.255.255
        inet6 fe80::b6b5:2fff:fe5b:9e7c  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7c  txqueuelen 1000  (Ethernet)
        RX packets 283922868  bytes 44297545348 (41.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 538064680  bytes 108980632740 (101.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfeb60000-feb80000

em3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.31.97.100  netmask 255.255.252.0  broadcast 10.31.99.255
        inet6 fe80::b6b5:2fff:fe5b:9e7e  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7e  txqueuelen 1000  (Ethernet)
        RX packets 3733210  bytes 1042607750 (994.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1401537  bytes 114335537 (109.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfea60000-fea80000

em4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.31.96.61  netmask 255.255.252.0  broadcast 10.31.99.255
        inet6 fe80::b6b5:2fff:fe5b:9e7f  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7f  txqueuelen 1000  (Ethernet)
        RX packets 2416588  bytes 196633917 (187.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 205038  bytes 19363499 (18.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfeae0000-feb00000
#########

em1/10.0.0.100 goes to a switch that is attached *only* to servers in
the same rack.  It's used only for the servers in that rack to
communicate amongst themselves.  em3 & em4 both connect to the same
subnet.  The only difference between them is that em3 is not always up
(it's associated with a floating IP address that follows whichever
server is currently in the 'master' role).  Basically, all traffic
should be going out through em3 unless it's destined for something else
on the local 10.0.0.0/8 subnet, in which case it should go out over
em1.  However, that's not what is happening.  Traffic to 10.31.96.x,
10.31.97.x, and 10.31.99.x hosts is going through em3, but traffic
destined for 10.31.45.x is trying to go through em1, and timing out
because there's no way to route that traffic effectively.
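To sanity-check my own understanding, here's a quick Python sketch of how
(I believe) longest-prefix matching would treat this table.  This is my
own illustration, not anything derived from kernel code, but if it's
right, then 10.31.45.106 falls inside the 10.0.0.0/8 route on em1, which
is more specific than the default and so wins the lookup:

```python
# Minimal sketch of longest-prefix-match route selection, using the
# routing table from `route -n` above.  0.0.0.0/0 is the default route.
import ipaddress

ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "em3"),     # default via 10.31.96.1
    (ipaddress.ip_network("10.0.0.0/8"), "em1"),
    (ipaddress.ip_network("10.31.96.0/22"), "em3"),
    (ipaddress.ip_network("10.31.96.0/22"), "em4"),
]

def lookup(dest: str) -> str:
    """Return the interface of the most specific (longest-prefix) match."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, iface) for net, iface in ROUTES if addr in net]
    # The kernel prefers the longest prefix; ties here fall to the
    # first-listed entry, which is a simplification.
    net, iface = max(matches, key=lambda m: m[0].prefixlen)
    return iface

print(lookup("10.31.97.50"))   # em3: 10.31.96.0/22 beats 10.0.0.0/8
print(lookup("10.31.45.106"))  # em1: only 10.0.0.0/8 matches, beating the default
print(lookup("8.8.8.8"))       # em3: falls through to the default route
```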

This is also illustrated with the following command:
# tcptraceroute cuda-linux
traceroute to cuda-linux (10.31.45.106), 30 hops max, 60 byte packets
 1  cuda-fs1a-internal (10.0.0.100)  3006.650 ms !H  3006.624 ms !H
3006.619 ms !H

Yet when run from a system on the same network as the box above, with
only a single network interface, it works:
# tcptraceroute cuda-linux
traceroute to cuda-linux (10.31.45.106), 30 hops max, 40 byte packets
 1  10.31.96.2 (10.31.96.2)  0.345 ms  0.403 ms  0.474 ms
 2  cuda-linux (10.31.45.106)  0.209 ms  0.208 ms  0.201 ms

I thought that I could fix this by adding a route to 10.31.45.1 for
em3, but that fails:
#########
# route add default gw 10.31.45.1 em3
SIOCADDRT: Network is unreachable
#########
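My reading of that SIOCADDRT error (and it may be wrong) is that the
kernel only accepts a gateway that is directly reachable on one of the
interface's connected subnets, and 10.31.45.1 isn't inside em3's
10.31.96.0/22.  A sketch of that check, again just my own illustration:

```python
# Sketch of the on-link check I believe is behind "Network is
# unreachable": a gateway must lie inside the interface's connected
# subnet before a route through it can be installed.
import ipaddress

# Directly connected subnets, taken from the ifconfig output above.
ON_LINK = {
    "em1": ipaddress.ip_network("10.0.0.0/8"),
    "em3": ipaddress.ip_network("10.31.96.0/22"),
    "em4": ipaddress.ip_network("10.31.96.0/22"),
}

def gateway_ok(gw: str, iface: str) -> bool:
    """True if the gateway address is on-link for the given interface."""
    return ipaddress.ip_address(gw) in ON_LINK[iface]

print(gateway_ok("10.31.45.1", "em3"))  # False: outside 10.31.96.0/22
print(gateway_ok("10.31.96.1", "em3"))  # True: the existing gateway is on-link
```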

I'm lost at this point on what else to try.  Help?

thanks.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama at gmail.com
LlamaLand                       https://netllama.linux-sxs.org

