[Click] Click-git: Kernel crash w/ Queue element overflow?

Nuutti Varis nvaris at cc.hut.fi
Wed Feb 24 08:30:26 EST 2010


Hey, 

FYI, commit http://read.cs.ucla.edu/gitweb?p=click;a=commit;h=f287e014f89a85276bb39c29d96b08600e2d1a49 probably fixed the issue, at least on an SMP kernel with Click built without --enable-multithread. The setup has now been running for the better part of an hour without crashing, whereas before the fix the crashes (with e1000e -NAPI) came within seconds.
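
To re-check a fix like this without the full EtherSwitch setup, the last of the test configs Eddie suggests in the quoted message below (Queue plus two devices) could be filled in roughly as follows. This is only a sketch: the MAC addresses, the EtherType 0x88B5 (IEEE local experimental) and the zero padding are placeholder values, not taken from the actual test traffic, and eth0/eth1 are simply the device names used in the quoted configs.

```
// Hypothetical filled-in version of the "Queue + two devices" test.
// The 60-byte frame is a placeholder, not data from this thread:
// 6 bytes destination MAC, 6 bytes source MAC (both locally
// administered), EtherType 88 b5, then 46 bytes of zero padding.
InfiniteSource(DATA \<
    02 00 00 00 00 02   02 00 00 00 00 01   88 b5
    00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00  00 00 00 00 00 00
>)
    -> Queue
    -> ToDevice(eth0);

// Same placeholder frame, sent out of the second device.
InfiniteSource(DATA \<
    02 00 00 00 00 02   02 00 00 00 00 01   88 b5
    00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00  00 00 00 00 00 00
>)
    -> Queue
    -> ToDevice(eth1);
```

Since InfiniteSource pushes packets as fast as it can, both Queues should overflow almost immediately, which is the condition under which the original crashes appeared. Each Queue's drops handler can be read while the test runs to confirm that packets are being dropped.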

On Feb 10, 2010, at 10:36 PM, Eddie Kohler wrote:

> Hi Nuutti,
> 
> There is a small chance this commit may fix your issue:
> 
> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=01c8f4e084036338e83a6bff7a8e74dc49caa014
> 
> If it does not, I think we need more input from you to narrow it down...
> 
> Thanks so much,
> Eddie
> 
> 
> Eddie Kohler wrote:
>> Nuutti,
>> Thanks very much for these dumps and this config.  Pretty informative.
>> Here are some debugging suggestions.
>> (0) This distinctly looks like memory corruption, possibly within ToDevice.  I will look at Queue itself, as well, but this seems like an unlikely source of problems, since your Click is not installed with --enable-multithread.
>> (1) Perhaps the problem is with EtherSwitch, whose internal hash table may be causing problems in SMP settings.  Can you try again, replacing the EtherSwitch element with a Hub element?  This will do the same job, but without a table.  My expectation is this will also fail.
>> (2) To narrow down the problem, we can try very simple ToDevice and Queue configs.  This would involve:
>> - ia32
>> - either patch or fixincludes
>> - SMP kernel
>> - The following configs:
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> ToDevice(eth0);
>> -*- OR
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> Queue
>> -> ToDevice(eth0);
>> -*- OR
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> ToDevice(eth0);
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> ToDevice(eth1);
>> -*- OR
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> Queue
>> -> ToDevice(eth0);
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> Queue
>> -> ToDevice(eth1);
>> ------
>> These configs test ToDevice with and without Queues, and with and without accessing two devices.
>> We'll look in parallel, but I'm interested in what you see.
>> Eddie
>> Nuutti Varis wrote:
>>> Hey, 
>>> While running throughput measurements with Click in the kernel, using a simple EtherSwitch configuration (attached as etherswitch.click) in the following topology:
>>> 
>>> EndHostA::ethI0 <==> ethI0::EtherSwitch1::ethI1 <==> ethI1::EtherSwitch2::ethI0 <==> ethI0::EndHostB
>>> 192.168.2.1 ---------------------------------------------------------------------------> 192.168.2.2
>>> FastUDPSrc w/ 64B packet, 300kpp/s
>>> 
>>> I stumbled upon a kernel crash, seemingly triggered when the Queue elements started dropping packets due to overflow. I tried this with two different kernel versions (2.6.31.12 and 2.6.24.7), using either the manual 2.6.24.7 kernel patch or --enable-fixincludes. Interestingly, the kernel crash does not happen when I disable SMP in the kernel. Additionally, normal Linux bridging does not crash the kernel on overflows. Partial and full crash dumps from various days of testing are attached.
>>> 
>>> Configuration stuff of the EtherSwitch{1,2}:
>>> - The architecture of each dump is indicated in its filename, either amd64 or ia32
>>> - MTU of ethI1 is 1540 (tried with 1500 as well, no difference)
>>> - Click is configured with --enable-linuxmodule --enable-userlevel --enable-etherswitch [--enable-fixincludes]
>>> - The kernel does not have any preemption enabled.
>>> - Both the poll-patched and the vanilla e1000e driver cause the problem
>>> - e1000e versions 0.4.1.7 and 1.0.2-k2 (comes with 2.6.31.12) cause the problem
>>> 
>>> --
>>> Nuutti Varis (nvaris at cc.hut.fi)
>>> PhD Student, Aalto University School of Science and Technology
>>> Department of Communications and Networking
>>> 
>>> _______________________________________________
>>> click mailing list
>>> click at amsterdam.lcs.mit.edu
>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click

--
Nuutti Varis (nvaris at cc.hut.fi)
PhD Student, Aalto University School of Science and Technology
Department of Communications and Networking