[Click] kernel crashes with linux click module on 2.4 and 2.6 and freebsd

Eddie Kohler kohler at cs.ucla.edu
Thu May 12 12:09:12 EDT 2005


Hi,

I want to fix this!@@@$(*&^!@^&(*$!

But the first thing that would help would be a 'ksymoops'.  Use the 
'-m' option on click-install to get a symbol dump.  Then 'ksymoops' has 
a -m option of its own that can make use of this dump.  If the ksymoops 
doesn't help me debug I'll get an account from you.
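
To illustrate what the symbol lookup does with such a dump, here is a minimal Python sketch (not ksymoops itself). It assumes the symbol dump uses System.map-style lines of "address type name"; that layout is an assumption, not something confirmed for click-install's -m output. The symbol names and start addresses in SAMPLE_MAP are hypothetical; only the two trace addresses come from the oops quoted below.

```python
import bisect
import re


def parse_symbol_map(text):
    """Parse 'ADDRESS TYPE NAME' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # first field is not a hex address
        symbols.append((address, fields[2]))
    symbols.sort()
    return symbols


def trace_addresses(oops_text):
    """Return every address written as [<hex>] in the oops text, in order."""
    return [int(m, 16) for m in re.findall(r"\[<([0-9a-fA-F]+)>\]", oops_text)]


def resolve(address, symbols):
    """Name the nearest symbol at or below address as 'name+0xoffset'.

    Returns None if the address lies below every known symbol. Without
    symbol sizes, a large offset may mean the address belongs elsewhere.
    """
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return "%s+0x%x" % (name, address - start)


# Hypothetical map entries: names and start addresses are made up.
SAMPLE_MAP = """\
da866900 T hypothetical_push
da8a2b00 T hypothetical_simple_action
"""

# Two addresses taken from the Call Trace in the quoted oops.
SAMPLE_OOPS = "Call Trace:    [<da8a2b6a>] [<da86695e>]\n"

symbols = parse_symbol_map(SAMPLE_MAP)
for address in trace_addresses(SAMPLE_OOPS):
    print("%08x %s" % (address, resolve(address, symbols)))
```

A real decode should still go through ksymoops, which also disassembles the Code: bytes; the sketch only shows why the trace is unreadable without the module's symbol dump.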

Eddie


On May 12, 2005, at 3:11 AM, Adam GREENHALGH wrote:

> Eddie and co,
>
> We (those cc'd in the email) are currently experiencing problems with
> the linux kernel module in that it core dumps when we attempt to
> forward packets through it. We are able to consistently reproduce this
> with a single ping packet. These problems are observable on both the
> 2.4 and 2.6 versions of the kernel patch, but we only refer to the 2.4
> version in this email.
>
> Our initial setup is as follows:
>
> H1 ----------------------------- C ------------------------- H2
> 192.168.3.101       192.168.3.100  192.168.4.100          192.168.4.101
>
> All ips have a netmask of 255.255.255.0. H1 and H2 are running from
> gentoo's livecd and have static routes to each other through C and C
> (the click router) is currently running linux vanilla kernel 2.4.26
> patched for click (though we've tried this same setup with FreeBSD
> 4.11 patched for click as well). The Click router currently has two
> Intel pro100s (exact model is detailed below) and no other NICs, and
> we've disabled native routing (though we have run several tests with
> this turned on as well). We're using the CVS version of the click
> software and we received no errors during its installation (we had
> tried previously to use an "older" version from January to see if that
> would help)
>
> The test we tend to use is having H1 ping H2 and vice versa (though we
> have tried to ssh from one to the other as well). We derived the router
> configuration from the perl script /conf/make-ip-conf.pl provided with
> the click sources. Running the configuration in user level (discarding
> any packets to ToHost) presents no problems. Running it in kernel mode
> (click-install) with polling seems to work at first; however, if we
> uninstall and reinstall the same configuration, the Linux kernel dumps
> (though it doesn't reset) and the router forwards no packets (could
> this be a problem with click-uninstall?). Rebooting the machine and
> installing the same configuration seems to fix this problem.
>
> click: starting router thread pid 1224 (d94a55a0)
> Unable to handle kernel paging request at virtual address 7272617f
> printing eip:
> da8910b9
> *pde = 00000000
> Oops: 0002
> CPU:    0
> EIP:    0010:[<da8910b9>]    Not tainted
> EFLAGS: 00010202
> eax: 7272617f   ebx: d8e3a800   ecx: d8bc9a60   edx: 00000024
> esi: d8bc9ea4   edi: d945c220   ebp: d8bcbdec   esp: d8bcbdb4
> ds: 0018   es: 0018   ss: 0018
> Process kclick (pid: 1224, stackpage=d8bcb000)
> Stack: 00000028 d945c680 00000020 00000202 d8bcbe0c d8e3a83c c147f9e8 00000024
>       6405a8c0 d8e3a800 6405a8c0 d945c220 d8bc9be0 d945c220 d8bcbe0c da8a2b6a
>       d8e3a800 d945c220 d945c220 d945c2c0 00000000 d945c220 d8bcbe2c da86695e
> Call Trace:    [<da8a2b6a>] [<da86695e>] [<da86695e>] [<da86695e>] [<da86695e>]
>  [<da86695e>] [<da8adca3>] [<da86695e>] [<da86695e>] [<da86695e>] [<da8c45b0>]
>  [<da8e2f67>] [<da927080>] [<da8bc07b>] [<da87b723>] [<da8e6fa1>] [<da8df347>]
>  [<c010582e>] [<da8df2c0>]
>
> Code: 89 08 8b 45 ec 89 88 44 04 00 00 c7 41 20 00 00 00 00 f0 ff
>
> We've had even worse problems trying this out in FreeBSD, as well as
> when using cards other than Intel (mostly 3com/Netgear). Further, we
> have tried using two entirely different machines as the click routers.
> The current setup described above seems to be the most stable.
>
> Let me know if sending the actual configuration file would help.
>
> We decided to extend the testing further and used the following
> network setup:
>
> H1 ------ R1 ------- R2 ----- H4
>
> H1 (host used to ping):
> IP address: 192.168.3.102/24
> OS: Linux 2.4.26
> Proc: Pentium II
> NIC: eth0 ADMtek Comet rev 17
>
> R1 (click router):
> IP addresses: 192.168.3.100/24, 192.168.5.100/24
> OS: Linux (vanilla kernel) 2.4.26 (with click patch)
> Click version: 1.5pre (from CVS repository)
> Proc: Pentium II
> NICs: Intel Corporation 82557/8/9 rev 0c (Ethernet Pro 100) x 2
>
> R2 (click router):
> IP addresses: 192.168.5.101/24, 192.168.4.100/24
> OS: Linux (vanilla kernel) 2.4.26 (with click patch)
> Click version: 1.5pre (from CVS repository)
> Proc: Pentium III
> NICs: Intel Corporation 82557/8/9 rev 0c (Ethernet Pro 100) x 2
>
> H4 (host used to ping):
> IP address: 192.168.4.102/24
> OS: Linux 2.4.26
> Proc: AMD Athlon
> NIC: 3com
>
> H1 has a static route to H4 through R1 and H4 has a static route to H1
> through R2. The configuration for the routers is derived from
> conf/make-ip-conf.pl with the following settings:
>
> R1:
> ----------------------------------------------------------------
> my $ifs = [ [ "eth0", 1, "192.168.3.100", "255.255.255.0", "00:0E:0C:64:28:13" ],
>             [ "eth1", 1, "192.168.5.100", "255.255.255.0", "00:0E:0C:64:28:C3" ] ];
>
> my $srts = [ [ "192.168.4.102", "255.255.255.255", "192.168.5.101", "eth1" ],
>              [ "0.0.0.0", "0.0.0.0", "192.168.3.100", "eth0" ] ];
>
> my $local_host = "ToHost";
> my $handle_pings = 1;
> ----------------------------------------------------------------
> R2:
> ----------------------------------------------------------------
> my $ifs = [ [ "eth0", 1, "192.168.5.101", "255.255.255.0", "00:0E:0C:64:2E:0A" ],
>             [ "eth1", 1, "192.168.4.100", "255.255.255.0", "00:0E:0C:64:2E:09" ] ];
>
> my $srts = [ [ "192.168.3.102", "255.255.255.255", "192.168.5.100", "eth0" ],
>              [ "0.0.0.0", "0.0.0.0", "192.168.5.101", "eth0" ] ];
>
> my $local_host = "ToHost";
> my $handle_pings = 1;
> ----------------------------------------------------------------
>
> The test we ran is to ping H1 from H4. This action sometimes causes R1
> to crash, sometimes R2, and sometimes both. We have run this same
> experiment in user-level by changing $local_host to "Print(toh) ->
> Discard" and $ifs's second parameter to 0 to disable polling in the
> configurations above (and of course using click instead of
> click-install) and it works without problems.
>
> We've also run tcpdump on H4 when trying to ping H1, and tcpdump sees 3
> packets: an ARP broadcast for R2's address, a reply from R2, and the
> ping request itself.
>
> Eddie, having chatted to Mark Handley, there is no problem with
> setting you up an account on the testbed machines at UCL if that
> would help with debugging. Any pointers as to how to go about fixing
> this are gratefully received.
>
> Many thanks
>
> Adam
>
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click


