[Click] Click-git: Kernel crash w/ Queue element overflow?

Eddie Kohler kohler at cs.ucla.edu
Wed Feb 24 11:24:12 EST 2010


Wow!  OK, excellent; glad that commit got in.  So are you still seeing crashes 
at all, then?  What's the current status?

Eddie


Nuutti Varis wrote:
> Hey, 
> 
> FYI, commit http://read.cs.ucla.edu/gitweb?p=click;a=commit;h=f287e014f89a85276bb39c29d96b08600e2d1a49 probably fixed the issue, at least on an SMP kernel without --enable-multithread. The setup has been running fine for the better part of an hour without crashing, whereas before the crashes (with e1000e -NAPI) came within seconds.
> 
> On Feb 10, 2010, at 10:36 PM, Eddie Kohler wrote:
> 
>> Hi Nuutti,
>>
>> There is a small chance this commit may fix your issue:
>>
>> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=01c8f4e084036338e83a6bff7a8e74dc49caa014
>>
>> If it does not, I think we need more input from you to narrow it down...
>>
>> Thanks so much,
>> Eddie
>>
>>
>> Eddie Kohler wrote:
>>> Nuutti,
>>> Thanks very much for these dumps and this config.  Pretty informative.
>>> Here are some debugging suggestions.
>>> (0) This distinctly looks like memory corruption, possibly within ToDevice.  I will look at Queue itself, as well, but this seems like an unlikely source of problems, since your Click is not installed with --enable-multithread.
>>> (1) Perhaps the problem is with EtherSwitch, whose internal hash table may be causing problems in SMP settings.  Can you try again, replacing the EtherSwitch element with a Hub element?  This will do the same job, but without a table.  My expectation is this will also fail.
>>> (2) To narrow down the problem, we can try very simple ToDevice and Queue configs.  This would involve:
>>> - ia32
>>> - either patch or fixincludes
>>> - SMP kernel
>>> - The following configs:
>>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>>> -> ToDevice(eth0);
>>> -*- OR
>>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>>> -> Queue
>>> -> ToDevice(eth0);
>>> -*- OR
>>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>>> -> ToDevice(eth0);
>>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>>> -> ToDevice(eth1);
>>> -*- OR
>>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>>> -> Queue
>>> -> ToDevice(eth0);
>>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>>> -> Queue
>>> -> ToDevice(eth1);
>>> ------
>>> These configs test ToDevice with and without Queues, and with and without accessing two devices.
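>>> As a minimal sketch of the second variant (Queue plus one device), assuming a placeholder frame: the broadcast destination, the made-up source MAC 00:00:00:00:00:01, the local experimental EtherType 0x88B5, and the all-zero payload are illustrative choices only, not values from this thread.
>>>
>>> ```click
>>> // Hypothetical 60-byte frame: 6 B dst MAC, 6 B src MAC, 2 B EtherType, 46 B zero payload.
>>> InfiniteSource(DATA \<ff ff ff ff ff ff  00 00 00 00 00 01  88 b5  00 00 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 00 00  00 00 00 00 00 00>)
>>> -> Queue
>>> -> ToDevice(eth0);
>>> ```
>>>
>>> The other three variants should follow by dropping the Queue line and/or repeating the block with eth1.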
>>> We'll look in parallel, but I'm interested in what you see.
>>> Eddie
>>>
>>> Nuutti Varis wrote:
>>>> Hey, 
>>>> I was running throughput measurements with Click in the kernel, using a simple EtherSwitch configuration (attached as etherswitch.click) in this topology:
>>>>
>>>> EndHostA::ethI0 <==> ethI0::EtherSwitch1::ethI1 <==> ethI1::EtherSwitch2::ethI0 <==> ethI0::EndHostB
>>>> 192.168.2.1 ---------------------------------------------------------------------------> 192.168.2.2
>>>> FastUDPSrc w/ 64 B packets, 300 kpps
>>>>
>>>> I stumbled upon a kernel crash, seemingly when the Queue elements started dropping packets due to overflow. I tried this with two different kernel versions (2.6.31.12 and 2.6.24.7), using either the manual 2.6.24.7 patch or --enable-fixincludes. Interestingly, the crash does not happen when I disable SMP in the kernel. Additionally, normal Linux bridging does not crash the kernel on overflow. Partial and full crash dumps from various days of testing are attached.
>>>>
>>>> Configuration details for EtherSwitch{1,2}:
>>>> - The architecture of each dump is indicated in its filename, either amd64 or ia32
>>>> - MTU of ethI1 is 1540 (tried with 1500 as well, no difference)
>>>> - Click is configured with --enable-linuxmodule --enable-userlevel --enable-etherswitch [--enable-fixincludes]
>>>> - Kernel does not have any preemption enabled.
>>>> - Both e1000e poll-patched and vanilla cause the problem
>>>> - e1000e versions 0.4.1.7 and 1.0.2-k2 (comes with 2.6.31.12) cause the problem
>>>>
>>>> --
>>>> Nuutti Varis (nvaris at cc.hut.fi)
>>>> PhD Student, Aalto University School of Science and Technology
>>>> Department of Communications and Networking
>>>>
>>>> _______________________________________________
>>>> click mailing list
>>>> click at amsterdam.lcs.mit.edu
>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> 
> --
> Nuutti Varis (nvaris at cc.hut.fi)
> PhD Student, Aalto University School of Science and Technology
> Department of Communications and Networking