[Click] Click-git: Kernel crash w/ Queue element overflow?

Eddie Kohler kohler at cs.ucla.edu
Wed Feb 10 12:44:04 EST 2010


Nuutti,

Thanks very much for these dumps and this config.  Pretty informative.

Here are some debugging suggestions.

(0) This distinctly looks like memory corruption, possibly within ToDevice.  I 
will look at Queue itself as well, but it seems an unlikely source of 
problems, since your Click was not configured with --enable-multithread.

(1) Perhaps the problem is with EtherSwitch, whose internal hash table may be 
causing problems in SMP settings.  Can you try again, replacing the 
EtherSwitch element with a Hub element?  This will do the same job, but 
without a table.  My expectation is this will also fail.

(2) To narrow down the problem, we can try very simple ToDevice and Queue 
configs.  This would involve:

- ia32
- either patch or fixincludes
- SMP kernel
- The following configs:

InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
-> ToDevice(eth0);

-*- OR

InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
-> Queue
-> ToDevice(eth0);

-*- OR

InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
-> ToDevice(eth0);
InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
-> ToDevice(eth1);

-*- OR

InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
-> Queue
-> ToDevice(eth0);
InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
-> Queue
-> ToDevice(eth1);


------
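
As one concrete possibility for the placeholder data, here is a sketch of the 
second config with a 60-byte frame filled in.  The addresses and EtherType are 
assumptions chosen for illustration, not values from your setup: a broadcast 
destination, a made-up locally administered source address, the local 
experimental EtherType 0x88b5, and 46 bytes of zero payload.  Any well-formed 
frame should serve equally well.

```
// Hypothetical test frame, 60 bytes total:
//   bytes 0-5    destination ff:ff:ff:ff:ff:ff (broadcast)
//   bytes 6-11   source 02:00:00:00:00:01 (made up, locally administered)
//   bytes 12-13  EtherType 0x88b5 (local experimental)
//   bytes 14-59  46 bytes of zero payload
InfiniteSource(DATA \<ff ff ff ff ff ff  02 00 00 00 00 01  88 b5
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00>)
-> Queue
-> ToDevice(eth0);
```

The same DATA argument can be dropped into the other three configs unchanged.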

These configs test ToDevice with and without Queues, and with and without 
accessing two devices.
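
For suggestion (1), the swap might look roughly like the sketch below.  This 
is not your etherswitch.click, which is not reproduced here, so the two-port 
layout, the device names eth0 and eth1, and the default-sized Queues are all 
assumptions.  Keep whatever your config already has around the switch and 
change only the element class.

```
// Assumed two-port layout; only the class of `sw` differs from an
// EtherSwitch version of the same config.
sw :: Hub;    // was: sw :: EtherSwitch;

FromDevice(eth0) -> [0]sw;
FromDevice(eth1) -> [1]sw;

sw[0] -> Queue -> ToDevice(eth0);
sw[1] -> Queue -> ToDevice(eth1);
```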

We'll look in parallel, but I'm interested in what you see.

Eddie




Nuutti Varis wrote:
> Hey, 
> 
> While trying to run throughput measurements with Click in a kernel, running a simple EtherSwitch configuration (attached as etherswitch.click) in a topology of:
> 
> EndHostA::ethI0 <==> ethI0::EtherSwitch1::ethI1 <==> ethI1::EtherSwitch2::ethI0 <==> ethI0::EndHostB
> 192.168.2.1 ---------------------------------------------------------------------------> 192.168.2.2
> FastUDPSrc w/ 64 B packets, 300 kpps
> 
> I stumbled upon a kernel crash, seemingly when the Queue elements started dropping packets due to overflow. I tried this with two different kernel versions (2.6.31.12 and 2.6.24.7), using either the manual 2.6.24.7 patch or --enable-fixincludes. Interestingly, the kernel crash does not happen when I disable SMP in the kernel. Additionally, normal Linux bridging does not crash the kernel on overflows. Partial/full crash dumps from various days of testing are attached.
> 
> Configuration stuff of the EtherSwitch{1,2}:
> - The architecture of each dump is indicated in its filename, either amd64 or ia32
> - MTU of ethI1 is 1540 (tried with 1500 as well, no difference)
> - Click is configured with --enable-linuxmodule --enable-userlevel --enable-etherswitch [--enable-fixincludes]
> - Kernel does not have any preemption enabled.
> - Both e1000e poll-patched and vanilla cause the problem
> - e1000e versions 0.4.1.7 and 1.0.2-k2 (comes with 2.6.31.12) cause the problem
> 
> --
> Nuutti Varis (nvaris at cc.hut.fi)
> PhD Student, Aalto University School of Science and Technology
> Department of Communications and Networking

