[Click] Core performance checkins

Beyers Cronje bcronje at gmail.com
Sat Feb 12 18:41:25 EST 2011


Nice, it's been running for a day or so without any signs of the issue I was
experiencing before. Well done.

Thanks Bobby and Eddie, much appreciated.

On Fri, Feb 11, 2011 at 8:46 AM, Eddie Kohler <kohler at cs.ucla.edu> wrote:

> Bobby,
>
> THANKS!!!  Totally right; thanks for the fix.  I found another bug as well,
> and fixed it.
>
> Awesome!!  Much appreciated.
>
> Eddie
>
>
>
> On 02/10/2011 11:40 AM, Bobby Longpocket wrote:
>
>> I think the problem is just that _any_pending is never getting set, so
>> RouterThread::active() returns false even if there are tasks on the pending
>> list.
>>
>> I don't run click the normal way, so I can't easily reproduce the issue,
>> but try making the following change:
>>
>> In RouterThread::active(), replace both occurrences of _any_pending with
>> _pending_head.
>>
>> diff --git a/include/click/routerthread.hh b/include/click/routerthread.hh
>> index a405e1c..504706d 100644
>> --- a/include/click/routerthread.hh
>> +++ b/include/click/routerthread.hh
>> @@ -231,9 +231,9 @@ inline bool
>>  RouterThread::active() const
>>  {
>>  #if HAVE_TASK_HEAP
>> -    return _task_heap.size() != 0 || _any_pending;
>> +    return _task_heap.size() != 0 || _pending_head;
>>  #else
>> -    return ((const Task *)_next != this) || _any_pending;
>> +    return ((const Task *)_next != this) || _pending_head;
>>  #endif
>>  }
>>
>>
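>> To see why the head-pointer test is more robust, here is a minimal
>> standalone sketch (simplified hypothetical types, not Click's actual
>> classes): a separately maintained "any pending" flag goes stale whenever
>> an enqueue path forgets to update it, while testing the list head itself
>> cannot go stale.
>>
>> #include <cassert>
>>
>> struct PendingTask {
>>     PendingTask *next = nullptr;
>> };
>>
>> struct PendingList {
>>     PendingTask *head = nullptr;   // plays the role of _pending_head
>>     bool any_pending = false;      // plays the role of _any_pending
>>
>>     // Buggy enqueue path: links the task in but forgets the flag.
>>     void enqueue(PendingTask *t) {
>>         t->next = head;
>>         head = t;
>>         // any_pending = true;     // <-- the update that never happens
>>     }
>>
>>     bool active_via_flag() const { return any_pending; }
>>     bool active_via_head() const { return head != nullptr; }
>> };
>>
>> int main() {
>>     PendingList pl;
>>     PendingTask t;
>>     pl.enqueue(&t);
>>     assert(!pl.active_via_flag());  // stale flag: thread looks idle
>>     assert(pl.active_via_head());   // head test: work is pending
>>     return 0;
>> }
>>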
>> --- On Thu, 2/10/11, Eddie Kohler<kohler at cs.ucla.edu>  wrote:
>>
>>> From: Eddie Kohler<kohler at cs.ucla.edu>
>>> Subject: Re: [Click] Core performance checkins
>>> To: "Beyers Cronje"<bcronje at gmail.com>
>>> Cc: "Click"<click at pdos.csail.mit.edu>
>>> Date: Thursday, February 10, 2011, 6:15 AM
>>> I'm very sorry about this regression... Unfortunately due to travel it's
>>> going to be hard to look at this until the weekend.  I would rather not
>>> yet revert coreperformance.  Can you handle this situation?
>>>
>>> Eddie
>>>
>>>
>>> On 02/09/2011 05:22 AM, Beyers Cronje wrote:
>>>
>>>> Hi Eddie,
>>>>
>>>> Some info that I'm sure will help debugging. I configured click with
>>>> --enable-schedule-debugging=extra and also enabled NOTIFIERQUEUE_DEBUG.
>>>>
>>>> It seems Unqueue gets stuck in the pending list. See the output below:
>>>>
>>>> read q.length
>>>> 1000
>>>>
>>>> read q.notifier_state
>>>> notifier on
>>>> task 0x19387b0 [uq :: Unqueue] scheduled
>>>>
>>>> read uq.scheduled
>>>> true /* but pending */
>>>>
>>>> read uq.notifier
>>>> empty.0/1:1*
>>>>
>>>>
>>>> Unqueue stays in this scheduled-but-pending state for an undetermined
>>>> period of time, sometimes up to minutes.
>>>>
>>>> Any idea where I can start on fixing this bug?
>>>>
>>>> Beyers
>>>>
>>>>
>>>> On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje<bcronje at gmail.com
>>>> <mailto:bcronje at gmail.com>> wrote:
>>>>
>>>>      Update - Strange, after typing my previous email I checked again,
>>>>      and all of a sudden Unqueue was pulling packets again. Not sure if
>>>>      me breaking into Click with gdb kick-started it again :) or if it's
>>>>      an intermittent issue.
>>>>
>>>>      On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje<bcronje at gmail.com
>>>>      <mailto:bcronje at gmail.com>> wrote:
>>>>
>>>>          Hi Eddie,
>>>>
>>>>          Since running this merge I've been experiencing issues with
>>>>          usermode multithreading. I'm using commit
>>>>          9419098acbdc20837e37f3033c40661809431f8d. I do believe the
>>>>          issues are related to the changes of this merge, as I was
>>>>          running the same config on pre-coreperformance merge code
>>>>          without any issues.
>>>>
>>>>          *Simplified config used:*
>>>>
>>>>          fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800)
>>>>              -> MarkIPHeader(14) -> ipc1::IPClassifier(udp port 1646);
>>>>          sfp::SFP();
>>>>          q::ThreadSafeQueue;
>>>>          log::Logger(sfp, 1800);
>>>>
>>>>          ipc1 -> RadAccounting -> q;
>>>>          q -> uq::Unqueue -> aupcc::Counter -> sfp;
>>>>
>>>>          StaticThreadSched(fd 0, uq 1, log 2);
>>>>
>>>>          *Problem description:*
>>>>          Of the 3 threads in the config, FromDAG and Unqueue do what
>>>>          their names indicate, and Logger has a timer that schedules its
>>>>          task once every 30min. Everything runs fine initially, and then
>>>>          somewhere along the line Unqueue stops pulling packets from the
>>>>          queue, which leads to a constant queue overflow. When this
>>>>          happens I can see FromDAG is still working, as q.drops increases
>>>>          constantly, and Logger also fires away every 30min. Note the
>>>>          push rate from FromDAG is quite high, but what RadAccounting
>>>>          pushes into the queue is very low, ~100pps, which means the
>>>>          queue is empty most of the time and Unqueue is not scheduled.
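>>>>
>>>>          To frame what "not scheduled" means here: the queue/consumer
>>>>          contract amounts to something like the toy standalone sketch
>>>>          below (std::condition_variable stands in for Click's Notifier
>>>>          machinery, and all names are made up for illustration).
>>>>
>>>>          #include <condition_variable>
>>>>          #include <mutex>
>>>>          #include <queue>
>>>>
>>>>          struct NotifyingQueue {
>>>>              std::queue<int> q;
>>>>              std::mutex m;
>>>>              std::condition_variable consumer_wakeup;
>>>>
>>>>              void push(int pkt) {
>>>>                  std::lock_guard<std::mutex> lock(m);
>>>>                  bool was_empty = q.empty();
>>>>                  q.push(pkt);
>>>>                  if (was_empty)                     // empty -> non-empty
>>>>                      consumer_wakeup.notify_one();  // edge: "notifier on"
>>>>              }
>>>>
>>>>              int pull() {
>>>>                  std::unique_lock<std::mutex> lock(m);
>>>>                  // The consumer sleeps while the queue is empty, like
>>>>                  // Unqueue being unscheduled. If the wakeup above were
>>>>                  // ever lost, the consumer could sleep forever while
>>>>                  // pushes pile up: q.drops climbing, Unqueue never runs.
>>>>                  consumer_wakeup.wait(lock, [this] { return !q.empty(); });
>>>>                  int pkt = q.front();
>>>>                  q.pop();
>>>>                  return pkt;
>>>>              }
>>>>          };
>>>>
>>>>          int main() {
>>>>              NotifyingQueue nq;
>>>>              nq.push(1);
>>>>              return nq.pull() == 1 ? 0 : 1;
>>>>          }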
>>>>
>>>>          Unfortunately I didn't configure debug scheduling, which
>>>>          would've helped. I did notice Unqueue.scheduled always returned
>>>>          true, even though it never actually ran. Not sure if the
>>>>          following will help; gdb shows the following:
>>>>
>>>>          (gdb) info threads
>>>>            3 Thread 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in
>>>>               poll () from /lib64/libc.so.6
>>>>            2 Thread 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in
>>>>               select () from /lib64/libc.so.6
>>>>          * 1 Thread 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet
>>>>               (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value
>>>>               optimized out>) at ../elements/local/fromdag.cc:193
>>>>          (gdb) thread 2
>>>>          [Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0
>>>>          0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>>>>          (gdb) bt
>>>>          #0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>>>>          #1  0x000000000058bc36 in Master::run_selects_poll
>>>>              (this=0x1afacc0, thread=0x1afb8b0, more_tasks=false) at
>>>>              ../lib/master.cc:862
>>>>          #2  0x000000000058c4ec in Master::run_selects (this=0x1afacc0,
>>>>              thread=0x1afb8b0) at ../lib/master.cc:1050
>>>>          #3  0x000000000057ddf1 in run_os (this=0x1afb8b0) at
>>>>              ../lib/routerthread.cc:447
>>>>          #4  RouterThread::driver (this=0x1afb8b0) at
>>>>              ../lib/routerthread.cc:568
>>>>          #5  0x0000000000556dc9 in thread_driver (user_data=<value
>>>>              optimized out>) at click.cc:414
>>>>          #6  0x000000392d206d5b in start_thread () from
>>>>              /lib64/libpthread.so.0
>>>>          #7  0x000000392cee4aad in clone () from /lib64/libc.so.6
>>>>          (gdb) thread 3
>>>>          [Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0
>>>>          0x000000392cedcee3 in select () from /lib64/libc.so.6
>>>>          (gdb) bt
>>>>          #0  0x000000392cedcee3 in select () from /lib64/libc.so.6
>>>>          #1  0x000000000058c4bf in Master::run_selects (this=0x1afacc0,
>>>>              thread=0x1afb7e0) at ../lib/master.cc:1015
>>>>          #2  0x000000000057ddf1 in run_os (this=0x1afb7e0) at
>>>>              ../lib/routerthread.cc:447
>>>>          #3  RouterThread::driver (this=0x1afb7e0) at
>>>>              ../lib/routerthread.cc:568
>>>>          #4  0x0000000000556dc9 in thread_driver (user_data=<value
>>>>              optimized out>) at click.cc:414
>>>>          #5  0x000000392d206d5b in start_thread () from
>>>>              /lib64/libpthread.so.0
>>>>          #6  0x000000392cee4aad in clone () from /lib64/libc.so.6
>>>>          (gdb) thread 1
>>>>          [Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0
>>>>          0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
>>>>          (gdb) bt
>>>>          #0  FromDAG::process_packet (this=0x1afc570,
>>>>              erf_record=0x7fa423258bf0, rlen=<value optimized out>) at
>>>>              ../elements/local/fromdag.cc:193
>>>>          #1  0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at
>>>>              ../elements/local/fromdag.cc:150
>>>>          #2  0x000000000057dbe6 in fire (this=0x1afb710) at
>>>>              ../include/click/task.hh:612
>>>>          #3  run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
>>>>          #4  RouterThread::driver (this=0x1afb710) at
>>>>              ../lib/routerthread.cc:600
>>>>          #5  0x0000000000558513 in main (argc=<value optimized out>,
>>>>              argv=<value optimized out>) at click.cc:639
>>>>
>>>>          One thing to note: for various reasons I'm doing something very
>>>>          unclick-like with FromDAG, where I allow it to block for up to
>>>>          10ms. For my specific requirements this is not a problem, but
>>>>          it's worth noting in case it might affect the way the new task
>>>>          handling operates.
>>>>
>>>>          Beyers
>>>>
>>>>          On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler<kohler at cs.ucla.edu
>>>>          <mailto:kohler at cs.ucla.edu>> wrote:
>>>>
>>>>              Hi all,
>>>>
>>>>              This is just a note to say that I've merged the
>>>>              "coreperformance" branch with master.  There are several
>>>>              changes that may speed up particularly simple
>>>>              configurations, and that (more importantly) may make it
>>>>              easier to experiment with different multithreading setups.
>>>>              (For instance I believe switching a task from one thread to
>>>>              another is faster now.)  Let me know if you experience any
>>>>              problems.
>>>>
>>>>              Eddie

