routing table madness

Lonni J Friedman netllama at gmail.com
Mon Nov 5 18:08:56 PST 2012


On Mon, Nov 5, 2012 at 6:04 PM, David A. Bandel <david.bandel at gmail.com> wrote:
> On Mon, Nov 5, 2012 at 5:25 PM, Lonni J Friedman <netllama at gmail.com> wrote:
>> For any of you who have strong routing table fu, I'd appreciate some
>> assistance.  I've got a bunch of servers with multiple (3) NICs and
>> associated network interfaces.  I'm tripping over a bizarre routing
>> problem, where traffic that should use the default route is not, and
>> is failing as a result.  Here's what my routing table looks like:
>> #########
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 0.0.0.0         10.31.96.1      0.0.0.0         UG    0      0        0 em3
>> 10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 em1
>> 10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em3
>> 10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em4
>> #########
>>
>> 10.31.96.1 is my default route that all traffic should be using (that
>> em# stuff is a Fedora thing, you can safely mentally substitute 'eth'
>> everywhere that you see 'em' if it makes it easier to follow). Here's
>
> First, stop using ifconfig and route, and start using ip (from the
> iproute2 package).  With it, you can do policy routing.
>
> ip addr
> ip route
> ip rule
> ip link
>
> Second, routes are evaluated from most to least specific; the default will
> always be evaluated last.  Your problem is that your default gateway
> (10.31.96.1) is also part of the more specific routes to em3 and em4.
> Further, the first match here will also always win, so em4 will almost
> always be used.
>
> The only way to make the above mess work properly is to bridge em3 and
> em4 (or give em3 and em4 distinct networks, i.e., a /24 each instead of
> the /22 that both are currently part of).  Also, you cannot have any IP
> in the range 10.31.96.0/22 on em1.
>
> bridge fix:
> brctl addbr br0
> brctl addif br0 em3
> brctl addif br0 em4
> ip link set br0 up
> ip addr add 10.31.96.2/22 dev br0
>
>> ifconfig output:
>> #########
>> em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>         inet 10.0.0.100  netmask 255.0.0.0  broadcast 10.255.255.255
>>         inet6 fe80::b6b5:2fff:fe5b:9e7c  prefixlen 64  scopeid 0x20<link>
>>         ether b4:b5:2f:5b:9e:7c  txqueuelen 1000  (Ethernet)
>>         RX packets 283922868  bytes 44297545348 (41.2 GiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 538064680  bytes 108980632740 (101.4 GiB)
>>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>         device memory 0xfeb60000-feb80000
>>
>> em3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>         inet 10.31.97.100  netmask 255.255.252.0  broadcast 10.31.99.255
>>         inet6 fe80::b6b5:2fff:fe5b:9e7e  prefixlen 64  scopeid 0x20<link>
>>         ether b4:b5:2f:5b:9e:7e  txqueuelen 1000  (Ethernet)
>>         RX packets 3733210  bytes 1042607750 (994.3 MiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 1401537  bytes 114335537 (109.0 MiB)
>>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>         device memory 0xfea60000-fea80000
>>
>> em4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>         inet 10.31.96.61  netmask 255.255.252.0  broadcast 10.31.99.255
>>         inet6 fe80::b6b5:2fff:fe5b:9e7f  prefixlen 64  scopeid 0x20<link>
>>         ether b4:b5:2f:5b:9e:7f  txqueuelen 1000  (Ethernet)
>>         RX packets 2416588  bytes 196633917 (187.5 MiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 205038  bytes 19363499 (18.4 MiB)
>>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>         device memory 0xfeae0000-feb00000
>> #########
>>
>> em1/10.0.0.100 goes to a switch that is attached *only* to servers in
>> the same rack.  It's used only for the servers in that rack to
>> communicate amongst themselves.  em3 & em4 both route to the same
>> subnet.  The only difference between them is that em3 is not always up
>> (it's associated with a floating IP address based on which server is
>> currently in the 'master' role).  Basically all traffic should be
>> going out through em3 unless it's destined for something else on the
>> local 10.0.0.1/8 subnet, in which case it should go out over em1.
>> However, that's not what is happening.  10.31.96.1/16, 10.31.97.1/16,
>> and 10.31.99.1/16 traffic is going through em3, but stuff destined for
>> 10.31.45.1/16 is trying to go through em1, and timing out because
>> there's no way to route that traffic effectively.
>
> 10.31.45.1 _should_ be going out em1 according to your table.  If you
> want it going out a different interface, then your routing table is
> hosed.
>
> Obviously you're confused, but the router is doing what it's programmed to do.
>
>>
>> This is also illustrated with the following command:
>> # tcptraceroute cuda-linux
>> traceroute to cuda-linux (10.31.45.106), 30 hops max, 60 byte packets
>>  1  cuda-fs1a-internal (10.0.0.100)  3006.650 ms !H  3006.624 ms !H  3006.619 ms !H
>>
>> Yet when run from a system on the same network as the box above, with
>> only a single network interface, it works:
>> # tcptraceroute cuda-linux
>> traceroute to cuda-linux (10.31.45.106), 30 hops max, 40 byte packets
>>  1  10.31.96.2 (10.31.96.2)  0.345 ms  0.403 ms  0.474 ms
>>  2  cuda-linux (10.31.45.106)  0.209 ms  0.208 ms  0.201 ms
>>
>> I thought that I could fix this by adding a route to 10.31.45.1 for
>> em3, but that fails:
>> #########
>> # route add default gw 10.31.45.1 em3
>> SIOCADDRT: Network is unreachable
>> #########
>>
>> I'm lost at this point on what else to try.  Help?
>
> I could use a little more detail, but this is easily fixable.  Tell me
> what needs to route where and I'll send you a list of commands.

Thanks!  The *only* thing that should ever be routed over em1 is
traffic for 10.0.0.101.  Everything else should go out over em3.  Let
me know if you need any other specifics, and I'll provide them.
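
To make the requirement above concrete, here is a small Python model of
longest-prefix route selection. It is an illustrative sketch, not how the
kernel is implemented. The `current` table is transcribed from the route
output quoted earlier. The `proposed` table is an assumption: it swaps the
10.0.0.0/8 entry on em1 for a host route to 10.0.0.101, which is one way to
express "only 10.0.0.101 goes over em1". The helper names `pick_route` and
`make_table` are hypothetical, and ties between equal-length prefixes are
broken by list order purely for the sake of the model.

```python
import ipaddress


def make_table(rows):
    """Turn (CIDR string, interface) pairs into (network, interface) pairs."""
    return [(ipaddress.ip_network(cidr), iface) for cidr, iface in rows]


def pick_route(table, dest):
    """Return the (network, iface) entry with the longest matching prefix.

    The default route 0.0.0.0/0 matches every address, so it only wins
    when nothing more specific matches.
    """
    addr = ipaddress.ip_address(dest)
    matches = [(net, iface) for net, iface in table if addr in net]
    # max() keeps the first maximal entry, so list order breaks ties here.
    # That tie-break is a modelling shortcut, not a claim about the kernel.
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Transcribed from the routing table quoted in the thread.
current = make_table([
    ("0.0.0.0/0", "em3"),
    ("10.0.0.0/8", "em1"),
    ("10.31.96.0/22", "em3"),
    ("10.31.96.0/22", "em4"),
])

# Assumption: replace the /8 on em1 with a host route to 10.0.0.101.
proposed = make_table([
    ("0.0.0.0/0", "em3"),
    ("10.0.0.101/32", "em1"),
    ("10.31.96.0/22", "em3"),
    ("10.31.96.0/22", "em4"),
])

for name, table in (("current", current), ("proposed", proposed)):
    for dest in ("10.31.45.106", "10.0.0.101"):
        net, iface = pick_route(table, dest)
        print(f"{name}: {dest} -> {iface} via {net}")
```

In this model 10.31.45.106 matches the 10.0.0.0/8 entry (em1) in the current
table and falls through to the default route (em3) in the proposed one, while
10.0.0.101 stays on em1 in both. On a live box the equivalent change would
also have to deal with the /8 netmask configured on em1, since the kernel
derives the connected route from it.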


More information about the Linux-users mailing list