[Click] Core performance checkins

Eddie Kohler kohler at cs.ucla.edu
Fri Feb 11 01:46:43 EST 2011


Bobby,

THANKS!!!  Totally right; thanks for the fix.  I found anohter bug as well, 
and fixed it.

Awesome!!  Much appreciated.

Eddie


On 02/10/2011 11:40 AM, Bobby Longpocket wrote:
> I think the problem is just that _any_pending is never getting set, so RouterThread::active() returns false even if there are tasks on the pending list.
>
> I don't run click the normal way, so I can't easily reproduce the issue, but try making the following change:
>
> In RouterThread::active(), replace both occurrences of _any_pending with _pending_head.
>
> diff --git a/include/click/routerthread.hh b/include/click/routerthread.hh
> index a405e1c..504706d 100644
> --- a/include/click/routerthread.hh
> +++ b/include/click/routerthread.hh
> @@ -231,9 +231,9 @@ inline bool
>   RouterThread::active() const
>   {
>   #if HAVE_TASK_HEAP
> -    return _task_heap.size() != 0 || _any_pending;
> +    return _task_heap.size() != 0 || _pending_head;
>   #else
> -    return ((const Task *)_next != this) || _any_pending;
> +    return ((const Task *)_next != this) || _pending_head;
>   #endif
>   }
>
>
> --- On Thu, 2/10/11, Eddie Kohler<kohler at cs.ucla.edu>  wrote:
>
>> From: Eddie Kohler<kohler at cs.ucla.edu>
>> Subject: Re: [Click] Core performance checkins
>> To: "Beyers Cronje"<bcronje at gmail.com>
>> Cc: "Click"<click at pdos.csail.mit.edu>
>> Date: Thursday, February 10, 2011, 6:15 AM
>> I'm very sorry about this
>> regression... Unfortunately due to travel it's going
>> to hard to look at this until the weekend.  I would
>> rather not yet revert
>> coreperformance.  Can you handle this situation?
>>
>> Eddie
>>
>>
>> On 02/09/2011 05:22 AM, Beyers Cronje wrote:
>>> Hi Eddie,
>>>
>>> Some info that I'm sure will help debugging. I
>> configured click
>>> with  --enable-schedule-debugging=extra and also
>> enabled NOTIFIERQUEUE_DEBUG
>>>
>>> It seems Unqueue gets stuck in the pending list. See
>> the output below:
>>>
>>> read q.length
>>> 1000
>>>
>>> read q.notifier_state
>>> notifier on
>>> task 0x19387b0 [uq :: Unqueue] scheduled
>>>
>>> read uq.scheduled
>>> true /* but pending */
>>>
>>> read uq.notifier
>>> empty.0/1:1*
>>>
>>>
>>> Unqueue stays in this scheduled but pending state for
>> an undetermined period
>>> of time, up to minutes some times.
>>>
>>> Any idea where I can start on fixing this bug?
>>>
>>> Beyers
>>>
>>>
>>> On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje<bcronje at gmail.com
>>> <mailto:bcronje at gmail.com>>
>> wrote:
>>>
>>>       Update - Strange, after typing
>> my previous email I checked again, and all
>>>       of a sudden Unqueue was
>> pulling packets again. Not sure if me breaking
>>>       into Click with gdb
>> kick-started, it again :) or if it's a intermittent issue.
>>>
>>>
>>>       On Wed, Feb 9, 2011 at 2:41
>> AM, Beyers Cronje<bcronje at gmail.com
>>>       <mailto:bcronje at gmail.com>>
>> wrote:
>>>
>>>           Hi Eddie,
>>>
>>>           Since running
>> this merge I've been experiencing issues with usermode
>>>           multithreading.
>> I'm using commit 9419098acbdc20837e37f3033c40661809431f8d
>>>           I do believe the
>> issues are related to the changes of this merged, as
>>>           I was running
>> the same config on pre-coreperformance merge code
>>>           without any
>> issues.
>>>
>>>           *Simplified
>> Config used:*
>>>
>>>
>>     fd::FromDAG(/dev/dag0) ->
>> cl1::Classifier(12/0800) ->  MarkIPHeader(14)
>>>           ->
>> ipc1::IPClassifier(udp port 1646);
>>>           sfp::SFP();
>>>
>>     q::ThreadSafeQueue;
>>>           log::Logger(sfp,
>> 1800);
>>>
>>>           ipc1 ->
>> RadAccounting ->  q;
>>>           q ->
>> uq::Unqueue ->  aupcc::Counter ->  sfp;
>>>
>>>
>>     StaticThreadSched(fd 0, uq 1, log 2);
>>>
>>>           *Problem
>> Description:*
>>>           The 3 threads in
>> the config: FromDAG and Unqueue do what their names
>>>           indicate, and
>> Logger has a timer that schedules its task once every
>>>           30min.
>> Everything runs fine initially and then somewhere along the
>>>           line Unqueue
>> stops pulling packets from the queue which leads to a
>>>           constant queue
>> overflow. When this happens I can see FromDAG is still
>>>           working as
>> q.drops increases constantly, and Logger also fires away
>>>           every 30min.
>> Note push rate from FromDAG is quite high, but
>>>           what
>> RadAccounting pushes into queue is very low ~ 100pps, which
>> means
>>>           queue is most of
>> the time empty and Unqueue is not scheduled.
>>>
>>>           Unfortunately I
>> didn't configure debug scheduling, which would've
>>>           helped. I did
>> notice Unqueue.scheduled always returned true, even
>>>           though it never
>> actually ran. Not sure if the following will help, gdb
>>>           shows the
>> following:
>>>
>>>           (gdb) info
>> threads
>>>              3 Thread
>> 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in poll
>> ()
>>>           from
>> /lib64/libc.so.6
>>>              2 Thread
>> 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in
>> select
>>>           () from
>> /lib64/libc.so.6
>>>           * 1 Thread
>> 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet
>>>           (this=0x1afc570,
>> erf_record=0x7fa423258bf0, rlen=<value optimized
>>>           out>) at
>> ../elements/local/fromdag.cc:193
>>>           (gdb) thread 2
>>>           [Switching to
>> thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0
>>>
>>     0x000000392cedb0b3 in poll () from
>> /lib64/libc.so.6
>>>           (gdb) bt
>>>           #0
>> 0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>>>           #1
>> 0x000000000058bc36 in Master::run_selects_poll
>> (this=0x1afacc0,
>>>
>>     thread=0x1afb8b0, more_tasks=false) at
>> ../lib/master.cc:862
>>>           #2
>> 0x000000000058c4ec in Master::run_selects (this=0x1afacc0,
>>>
>>     thread=0x1afb8b0) at
>> ../lib/master.cc:1050
>>>           #3
>> 0x000000000057ddf1 in run_os (this=0x1afb8b0) at
>>>
>>     ../lib/routerthread.cc:447
>>>           #4
>> RouterThread::driver (this=0x1afb8b0) at
>> ../lib/routerthread.cc:568
>>>           #5
>> 0x0000000000556dc9 in thread_driver (user_data=<value
>> optimized
>>>           out>) at
>> click.cc:414
>>>           #6
>> 0x000000392d206d5b in start_thread () from
>> /lib64/libpthread.so.0
>>>           #7
>> 0x000000392cee4aad in clone () from /lib64/libc.so.6
>>>           (gdb) thread 3
>>>           [Switching to
>> thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0
>>>
>>     0x000000392cedcee3 in select () from
>> /lib64/libc.so.6
>>>           (gdb) bt
>>>           #0
>> 0x000000392cedcee3 in select () from /lib64/libc.so.6
>>>           #1
>> 0x000000000058c4bf in Master::run_selects (this=0x1afacc0,
>>>
>>     thread=0x1afb7e0) at
>> ../lib/master.cc:1015
>>>           #2
>> 0x000000000057ddf1 in run_os (this=0x1afb7e0) at
>>>
>>     ../lib/routerthread.cc:447
>>>           #3
>> RouterThread::driver (this=0x1afb7e0) at
>> ../lib/routerthread.cc:568
>>>           #4
>> 0x0000000000556dc9 in thread_driver (user_data=<value
>> optimized
>>>           out>) at
>> click.cc:414
>>>           #5
>> 0x000000392d206d5b in start_thread () from
>> /lib64/libpthread.so.0
>>>           #6
>> 0x000000392cee4aad in clone () from /lib64/libc.so.6
>>>           (gdb) thread 1
>>>           [Switching to
>> thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0
>>>
>>     0x000000392d20ebfd in nanosleep () from
>> /lib64/libpthread.so.0
>>>           (gdb) bt
>>>           #0
>> FromDAG::process_packet (this=0x1afc570,
>>>
>>     erf_record=0x7fa423258bf0, rlen=<value
>> optimized out>) at
>>>
>>     ../elements/local/fromdag.cc:193
>>>           #1
>> 0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at
>>>
>>     ../elements/local/fromdag.cc:150
>>>           #2
>> 0x000000000057dbe6 in fire (this=0x1afb710) at
>>>
>>     ../include/click/task.hh:612
>>>           #3
>> run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
>>>           #4
>> RouterThread::driver (this=0x1afb710) at
>> ../lib/routerthread.cc:600
>>>           #5
>> 0x0000000000558513 in main (argc=<value optimized
>> out>,
>>>           argv=<value
>> optimized out>) at click.cc:639
>>>
>>>           One thing to
>> note, for various reasons I'm doing something very
>>>           unclick-like
>> with FromDAG where I allow it to block up to 10ms. For my
>>>           specific
>> requirements this is not a problem, but just in case it
>> might
>>>           affect the way
>> the new task handling operates it's worth noting.
>>>
>>>           Beyers
>>>
>>>           On Sun, Feb 6,
>> 2011 at 1:23 AM, Eddie Kohler<kohler at cs.ucla.edu
>>>           <mailto:kohler at cs.ucla.edu>>
>> wrote:
>>>
>>>               Hi
>> all,
>>>
>>>
>>     This is just a note to say that I've
>> merged the "coreperformance"
>>>
>>     branch
>>>
>>     with master.  There are several
>> changes that may speed up particularly
>>>
>>     simple configurations, and that (more
>> importantly) may make it
>>>
>>     easier to
>>>
>>     experiment with different multithreading
>> setups.  (For instance I
>>>
>>     believe switching a task from one thread
>> to another is faster
>>>
>>     now.)  Let
>>>               me
>> know if you experience any problems
>>>
>>>
>>     Eddie
>>>
>>     _______________________________________________
>>>
>>     click mailing list
>>>               click at amsterdam.lcs.mit.edu
>> <mailto:click at amsterdam.lcs.mit.edu>
>>>               https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>>
>>>
>>>
>>>
>> _______________________________________________
>> click mailing list
>> click at amsterdam.lcs.mit.edu
>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>
>
>
>


More information about the click mailing list