[Click] kernel crashes with linux click module on 2.4 and 2.6 and freebsd

Adam GREENHALGH A.Greenhalgh at cs.ucl.ac.uk
Thu May 12 06:11:32 EDT 2005


Eddie and co,

We (those cc'd on this email) are currently experiencing problems with
the Linux kernel module: the kernel oopses when we attempt to
forward packets through it. We are able to reproduce this consistently
with a single ping packet. These problems are observable with both the
2.4 and 2.6 versions of the kernel patch, but we only refer to the 2.4
version in this email.

Our initial setup is as follows:

H1 ----------------------------- C ------------------------- H2
192.168.3.101       192.168.3.100  192.168.4.100          192.168.4.101

All IPs have a netmask of 255.255.255.0. H1 and H2 are running from
Gentoo's LiveCD and have static routes to each other through C. C
(the Click router) is currently running a vanilla Linux 2.4.26 kernel
patched for Click (though we've tried this same setup with FreeBSD
4.11 patched for Click as well). The Click router currently has two
Intel Pro/100s (exact model is detailed below) and no other NICs, and
we've disabled native routing (though we have run several tests with
this turned on as well). We're using the CVS version of the Click
software and we received no errors during its installation (we had
previously tried an "older" version from January to see if that
would help).
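As a quick sanity check of the addressing above, the following minimal Python sketch confirms that H1 and H2 sit on different /24 networks, so traffic between them has to transit C. The labels "C-left" and "C-right" and the helper names are hypothetical, chosen only for this illustration.

```python
import ipaddress

# Addresses from the diagram; every interface uses netmask 255.255.255.0.
# "C-left" and "C-right" are illustrative labels for C's two interfaces.
HOSTS = {
    "H1": "192.168.3.101",
    "C-left": "192.168.3.100",
    "C-right": "192.168.4.100",
    "H2": "192.168.4.101",
}


def subnet_of(addr, mask="255.255.255.0"):
    """Return the network that addr belongs to under the given netmask."""
    return ipaddress.ip_interface(f"{addr}/{mask}").network


def same_subnet(a, b):
    """True if the two named interfaces share a network."""
    return subnet_of(HOSTS[a]) == subnet_of(HOSTS[b])


if __name__ == "__main__":
    for name, addr in HOSTS.items():
        print(f"{name:8s} {addr:15s} {subnet_of(addr)}")
    print("H1 and H2 directly connected?", same_subnet("H1", "H2"))
```

Each host shares a subnet only with the adjacent interface of C, which is why the static routes through C are needed.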

The test we tend to use is having H1 ping H2 and vice versa (though we
have tried to ssh from one to the other as well). We derived the router
configuration from the Perl script conf/make-ip-conf.pl provided with
the Click sources. Running the configuration at user level (discarding
any packets to ToHost) presents no problems. Running it in kernel mode
(click-install) with polling seems to work at first; however, if we
uninstall and reinstall the same configuration, the Linux kernel
produces the oops below (though the machine doesn't reset) and the
router forwards no packets (could this be a problem with
click-uninstall?). Rebooting the machine and installing the same
configuration seems to fix this problem.

click: starting router thread pid 1224 (d94a55a0)
Unable to handle kernel paging request at virtual address 7272617f
printing eip:
da8910b9
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<da8910b9>]    Not tainted
EFLAGS: 00010202
eax: 7272617f   ebx: d8e3a800   ecx: d8bc9a60   edx: 00000024
esi: d8bc9ea4   edi: d945c220   ebp: d8bcbdec   esp: d8bcbdb4
ds: 0018   es: 0018   ss: 0018
Process kclick (pid: 1224, stackpage=d8bcb000)
Stack: 00000028 d945c680 00000020 00000202 d8bcbe0c d8e3a83c c147f9e8 00000024
       6405a8c0 d8e3a800 6405a8c0 d945c220 d8bc9be0 d945c220 d8bcbe0c da8a2b6a
       d8e3a800 d945c220 d945c220 d945c2c0 00000000 d945c220 d8bcbe2c da86695e
Call Trace:    [<da8a2b6a>] [<da86695e>] [<da86695e>] [<da86695e>] [<da86695e>]
  [<da86695e>] [<da8adca3>] [<da86695e>] [<da86695e>] [<da86695e>] [<da8c45b0>]
  [<da8e2f67>] [<da927080>] [<da8bc07b>] [<da87b723>] [<da8e6fa1>] [<da8df347>]
  [<c010582e>] [<da8df2c0>]

Code: 89 08 8b 45 ec 89 88 44 04 00 00 c7 41 20 00 00 00 00 f0 ff
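Some fields of the oops above can be decoded mechanically. The sketch below assumes the standard i386 page-fault error code layout (bit 0: page present, bit 1: write access, bit 2: user mode); the helper names are hypothetical. Under that assumption, "Oops: 0002" reads as a kernel-mode write to a non-present page. The fault address equals the value in eax, and the first Code bytes (89 08) encode a store of ecx through eax, so the write appears to go through a corrupted pointer. The second helper shows how a stack word would read if it held an IPv4 address stored in network byte order on a little-endian machine; any match with a configured address may be a coincidence.

```python
import ipaddress
import struct


def decode_page_fault_code(code):
    """Split an i386 page-fault error code into its three low bits."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not mapped
        "write_access": bool(code & 0x2),  # 1 means the access was a write
        "user_mode": bool(code & 0x4),     # 0 means the fault came from kernel mode
    }


def word_as_ipv4(word):
    """Read a 32-bit little-endian word as the four bytes of an IPv4 address."""
    return str(ipaddress.IPv4Address(struct.pack("<I", word)))


if __name__ == "__main__":
    fault_address = 0x7272617F  # "virtual address" line of the oops
    eax = 0x7272617F            # eax from the register dump

    print(decode_page_fault_code(0x0002))
    print("fault address equals eax:", fault_address == eax)
    print("stack word 6405a8c0 as IPv4:", word_as_ipv4(0x6405A8C0))
```

Resolving the call-trace addresses to symbols (for example with ksymoops against the running kernel and the loaded Click module) would be the natural next step, since the raw addresses alone do not identify the failing element.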

We've had even worse problems trying this out in FreeBSD, as well as
when using cards other than Intel (mostly 3com/Netgear). Further, we
have tried using two entirely different machines as the click routers.
The current setup described above seems to be the most stable. 

Let me know if sending the actual configuration file would help.

We decided to extend the testing further and used the following network setup:

H1 ------ R1 ------- R2 ----- H4

H1 (host used to ping):
IP address: 192.168.3.102/24
OS: Linux 2.4.26
Proc: Pentium II
NIC: eth0 ADMtek Comet rev 17

R1 (click router):
IP addresses: 192.168.3.100/24, 192.168.5.100/24
OS: Linux (vanilla kernel) 2.4.26 (with click patch)
Click version: 1.5pre (from CVS repository)
Proc: Pentium II
NICs: Intel Corporation 82557/8/9 rev 0c (Ethernet Pro 100) x 2

R2 (click router):
IP addresses: 192.168.5.101/24, 192.168.4.100/24
OS: Linux (vanilla kernel) 2.4.26 (with click patch)
Click version: 1.5pre (from CVS repository)
Proc: Pentium III
NICs: Intel Corporation 82557/8/9 rev 0c (Ethernet Pro 100) x 2

H4 (host used to ping):
IP address: 192.168.4.102/24
OS: Linux 2.4.26
Proc: AMD Athlon
NIC: 3com

H1 has a static route to H4 through R1 and H4 has a static route to H1
through R2. The configuration for the routers is derived from
conf/make-ip-conf.pl with the following settings:

R1:
----------------------------------------------------------------
my $ifs = [ [ "eth0", 1, "192.168.3.100", "255.255.255.0", "00:0E:0C:64:28:13" ],
            [ "eth1", 1, "192.168.5.100", "255.255.255.0", "00:0E:0C:64:28:C3" ], ];

my $srts = [ [ "192.168.4.102", "255.255.255.255", "192.168.5.101", "eth1" ],
             [ "0.0.0.0", "0.0.0.0", "192.168.3.100", "eth0" ] ];

my $local_host = "ToHost";
my $handle_pings = 1;
----------------------------------------------------------------
R2:
----------------------------------------------------------------
my $ifs = [ [ "eth0", 1, "192.168.5.101", "255.255.255.0", "00:0E:0C:64:2E:0A" ],
            [ "eth1", 1, "192.168.4.100", "255.255.255.0", "00:0E:0C:64:2E:09" ], ];

my $srts = [ [ "192.168.3.102", "255.255.255.255", "192.168.5.100", "eth0" ],
             [ "0.0.0.0", "0.0.0.0", "192.168.5.101", "eth0" ] ];

my $local_host = "ToHost";
my $handle_pings = 1;
----------------------------------------------------------------
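To make the intended forwarding behaviour of the two $srts tables concrete, here is a minimal longest-prefix-match sketch in Python. It models only the static routes listed above; any interface routes that make-ip-conf.pl may generate on its own are not modelled, and the function and variable names are hypothetical.

```python
import ipaddress

# Static routes copied from the two $srts tables:
# (destination, netmask, gateway, output device)
ROUTES_3_5 = [  # router holding 192.168.3.100 and 192.168.5.100
    ("192.168.4.102", "255.255.255.255", "192.168.5.101", "eth1"),
    ("0.0.0.0", "0.0.0.0", "192.168.3.100", "eth0"),
]
ROUTES_5_4 = [  # router holding 192.168.5.101 and 192.168.4.100
    ("192.168.3.102", "255.255.255.255", "192.168.5.100", "eth0"),
    ("0.0.0.0", "0.0.0.0", "192.168.5.101", "eth0"),
]


def prefix_length(mask):
    """Count the one bits in a dotted-quad netmask."""
    return bin(int(ipaddress.IPv4Address(mask))).count("1")


def lookup(routes, dst):
    """Return (gateway, device) of the most specific matching route, or None."""
    dst_ip = ipaddress.IPv4Address(dst)
    best = None
    for net, mask, gateway, device in routes:
        plen = prefix_length(mask)
        network = ipaddress.ip_network(f"{net}/{plen}")
        if dst_ip in network and (best is None or plen > best[0]):
            best = (plen, gateway, device)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    print("echo request to H4:", lookup(ROUTES_3_5, "192.168.4.102"))
    print("echo reply to H1:  ", lookup(ROUTES_5_4, "192.168.3.102"))
```

In this model the host routes send traffic for H4 and H1 across the 192.168.5.0/24 link between the routers, while everything else falls through to each router's default entry.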

The test we ran is to ping H1 from H4. This action sometimes causes R1
to crash, sometimes R2, and sometimes both. We have run this same
experiment in user-level by changing $local_host to "Print(toh) ->
Discard" and $ifs's second parameter to 0 to disable polling in the
configurations above (and of course using click instead of
click-install) and it works without problems.

We've also run tcpdump on H4 when trying to ping H1, and tcpdump sees 3
packets: an ARP broadcast for R2's address, a reply from R2, and the
ping request itself.

Eddie, having chatted to Mark Handley, there is no problem with
setting you up with an account on the testbed machines at UCL if that
would help with debugging. Any pointers as to how to go about fixing
this would be gratefully received.

Many thanks

Adam


