[Click] Core performance checkins

Eddie Kohler kohler at cs.ucla.edu
Sat Feb 12 22:05:21 EST 2011


Hooray!  I'm very glad it was something simple (in fact embarrassingly so).

Cliff is still welcome to teach a concurrency class any time :) :) :)

E


On 2/12/11 3:41 PM, Beyers Cronje wrote:
> Nice!  It's been running for a day or so without any sign of the issue I
> was experiencing before.  Well done.
>
> Thanks Bobby and Eddie, much appreciated.
>
> On Fri, Feb 11, 2011 at 8:46 AM, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
>     Bobby,
>
>     THANKS!!!  Totally right; thanks for the fix.  I found another bug as
>     well, and fixed it.
>
>     Awesome!!  Much appreciated.
>
>     Eddie
>
>
>
>     On 02/10/2011 11:40 AM, Bobby Longpocket wrote:
>
>         I think the problem is just that _any_pending is never getting set, so
>         RouterThread::active() returns false even if there are tasks on the
>         pending list.
>
>         I don't run click the normal way, so I can't easily reproduce the
>         issue, but try making the following change:
>
>         In RouterThread::active(), replace both occurrences of _any_pending
>         with _pending_head.
>
>         diff --git a/include/click/routerthread.hh b/include/click/routerthread.hh
>         index a405e1c..504706d 100644
>         --- a/include/click/routerthread.hh
>         +++ b/include/click/routerthread.hh
>         @@ -231,9 +231,9 @@ inline bool
>           RouterThread::active() const
>           {
>           #if HAVE_TASK_HEAP
>         -    return _task_heap.size() != 0 || _any_pending;
>         +    return _task_heap.size() != 0 || _pending_head;
>           #else
>         -    return ((const Task *)_next != this) || _any_pending;
>         +    return ((const Task *)_next != this) || _pending_head;
>           #endif
>           }
>
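>         For what it's worth, here's a toy model of the bug (simplified,
>         not Click's actual code): tasks rescheduled from outside land on
>         a pending list headed by _pending_head, and active() is what the
>         thread checks before sleeping in select()/poll().  A flag that
>         nothing sets misses pending tasks; the head pointer cannot:
>
>             #include <cstdio>
>
>             struct Task { Task *pending_next = nullptr; };
>
>             struct Thread {
>                 Task *_pending_head = nullptr; // singly linked pending list
>                 bool _any_pending = false;     // flag that nothing ever sets
>
>                 void add_pending(Task *t) {
>                     t->pending_next = _pending_head;
>                     _pending_head = t;
>                     // bug: _any_pending is never set to true here
>                 }
>                 bool active_buggy() const { return _any_pending; }
>                 bool active_fixed() const { return _pending_head != nullptr; }
>             };
>
>             int main() {
>                 Thread th;
>                 Task t;
>                 th.add_pending(&t);
>                 // The buggy test reports "idle" despite the pending task,
>                 // so the thread would sleep instead of running it.
>                 std::printf("buggy=%d fixed=%d\n",
>                             th.active_buggy(), th.active_fixed());
>                 return 0;
>             }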
>
>         --- On Thu, 2/10/11, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
>             From: Eddie Kohler <kohler at cs.ucla.edu>
>             Subject: Re: [Click] Core performance checkins
>             To: "Beyers Cronje" <bcronje at gmail.com>
>             Cc: "Click" <click at pdos.csail.mit.edu>
>             Date: Thursday, February 10, 2011, 6:15 AM
>             I'm very sorry about this regression... Unfortunately, due to
>             travel it's going to be hard to look at this until the weekend.
>             I would rather not yet revert coreperformance.  Can you handle
>             this situation?
>
>             Eddie
>
>
>             On 02/09/2011 05:22 AM, Beyers Cronje wrote:
>
>                 Hi Eddie,
>
>                 Some info that I'm sure will help debugging: I configured
>                 click with --enable-schedule-debugging=extra and also
>                 enabled NOTIFIERQUEUE_DEBUG.
>
>                 It seems Unqueue gets stuck in the pending list. See the
>                 output below:
>
>                 read q.length
>                 1000
>
>                 read q.notifier_state
>                 notifier on
>                 task 0x19387b0 [uq :: Unqueue] scheduled
>
>                 read uq.scheduled
>                 true /* but pending */
>
>                 read uq.notifier
>                 empty.0/1:1*
>
>                 Unqueue stays in this scheduled-but-pending state for an
>                 undetermined period of time, sometimes up to minutes.
>
>                 Any idea where I can start on fixing this bug?
>
>                 Beyers
>
>
>                 On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje
>                 <bcronje at gmail.com> wrote:
>
>
>                       Update - Strange: after typing my previous email I
>                       checked again, and all of a sudden Unqueue was pulling
>                       packets again. Not sure if my breaking into Click with
>                       gdb kick-started it again :) or if it's an
>                       intermittent issue.
>
>
>
>                       On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje
>                       <bcronje at gmail.com> wrote:
>
>
>                           Hi Eddie,
>
>                           Since running this merge I've been experiencing
>                           issues with usermode multithreading. I'm using
>                           commit 9419098acbdc20837e37f3033c40661809431f8d.
>                           I do believe the issues are related to the changes
>                           in this merge, as I was running the same config on
>                           pre-coreperformance merge code without any issues.
>
>
>                           *Simplified config used:*
>
>                           fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800)
>                               -> MarkIPHeader(14)
>                               -> ipc1::IPClassifier(udp port 1646);
>                           sfp::SFP();
>                           q::ThreadSafeQueue;
>                           log::Logger(sfp, 1800);
>
>                           ipc1 -> RadAccounting -> q;
>                           q -> uq::Unqueue -> aupcc::Counter -> sfp;
>
>                           StaticThreadSched(fd 0, uq 1, log 2);
>
>
>                           *Problem description:*
>
>                           Of the 3 threads in the config, FromDAG and Unqueue
>                           do what their names indicate, and Logger has a
>                           timer that schedules its task once every 30min.
>                           Everything runs fine initially, and then somewhere
>                           along the line Unqueue stops pulling packets from
>                           the queue, which leads to a constant queue
>                           overflow. When this happens I can see FromDAG is
>                           still working, as q.drops increases constantly, and
>                           Logger also fires away every 30min. Note the push
>                           rate from FromDAG is quite high, but what
>                           RadAccounting pushes into the queue is very low,
>                           ~100pps, which means the queue is empty most of the
>                           time and Unqueue is not scheduled.
>
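>                           (For context, a simplified sketch rather than
>                           Click's actual code of the handshake involved: a
>                           pull task like Unqueue unschedules itself when the
>                           queue runs dry and relies on the queue's empty
>                           notifier to reschedule it on the empty->nonempty
>                           transition. In the multithreaded case that
>                           reschedule comes from another thread, which is why
>                           the task can end up on the pending list discussed
>                           above.)
>
>                               #include <cstdio>
>                               #include <deque>
>
>                               struct Puller {
>                                   bool scheduled = false;
>                                   void reschedule() { scheduled = true; }
>                               };
>
>                               struct Queue {
>                                   std::deque<int> pkts;
>                                   Puller *listener = nullptr;
>                                   void push(int pkt) {
>                                       bool was_empty = pkts.empty();
>                                       pkts.push_back(pkt);
>                                       if (was_empty && listener)
>                                           listener->reschedule(); // wake the sleeping puller
>                                   }
>                                   bool pull(int &pkt) {
>                                       if (pkts.empty()) return false;
>                                       pkt = pkts.front(); pkts.pop_front();
>                                       return true;
>                                   }
>                               };
>
>                               int main() {
>                                   Queue q; Puller uq; q.listener = &uq;
>                                   q.push(1);  // empty->nonempty: uq gets rescheduled
>                                   std::printf("scheduled: %d\n", uq.scheduled);
>                                   int pkt;
>                                   while (q.pull(pkt))
>                                       ; // process pkt
>                                   uq.scheduled = false; // ran dry: sleep until notified again
>                                   return 0;
>                               }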
>
>                           Unfortunately I didn't configure debug scheduling,
>                           which would've helped. I did notice Unqueue.scheduled
>                           always returned true, even though it never actually
>                           ran. Not sure if the following will help, but gdb
>                           shows the following:
>
>
>                           (gdb) info threads
>                             3 Thread 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>                             2 Thread 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in select () from /lib64/libc.so.6
>                           * 1 Thread 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at ../elements/local/fromdag.cc:193
>
>                           (gdb) thread 2
>                           [Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]
>                           #0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>                           (gdb) bt
>                           #0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>                           #1  0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0, thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862
>                           #2  0x000000000058c4ec in Master::run_selects (this=0x1afacc0, thread=0x1afb8b0) at ../lib/master.cc:1050
>                           #3  0x000000000057ddf1 in run_os (this=0x1afb8b0) at ../lib/routerthread.cc:447
>                           #4  RouterThread::driver (this=0x1afb8b0) at ../lib/routerthread.cc:568
>                           #5  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at click.cc:414
>                           #6  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
>                           #7  0x000000392cee4aad in clone () from /lib64/libc.so.6
>
>                           (gdb) thread 3
>                           [Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]
>                           #0  0x000000392cedcee3 in select () from /lib64/libc.so.6
>                           (gdb) bt
>                           #0  0x000000392cedcee3 in select () from /lib64/libc.so.6
>                           #1  0x000000000058c4bf in Master::run_selects (this=0x1afacc0, thread=0x1afb7e0) at ../lib/master.cc:1015
>                           #2  0x000000000057ddf1 in run_os (this=0x1afb7e0) at ../lib/routerthread.cc:447
>                           #3  RouterThread::driver (this=0x1afb7e0) at ../lib/routerthread.cc:568
>                           #4  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at click.cc:414
>                           #5  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
>                           #6  0x000000392cee4aad in clone () from /lib64/libc.so.6
>
>                           (gdb) thread 1
>                           [Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]
>                           #0  0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
>                           (gdb) bt
>                           #0  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at ../elements/local/fromdag.cc:193
>                           #1  0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at ../elements/local/fromdag.cc:150
>                           #2  0x000000000057dbe6 in fire (this=0x1afb710) at ../include/click/task.hh:612
>                           #3  run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
>                           #4  RouterThread::driver (this=0x1afb710) at ../lib/routerthread.cc:600
>                           #5  0x0000000000558513 in main (argc=<value optimized out>, argv=<value optimized out>) at click.cc:639
>
>
>                           One thing to note: for various reasons I'm doing
>                           something very unclick-like with FromDAG, where I
>                           allow it to block for up to 10ms. For my specific
>                           requirements this is not a problem, but it's worth
>                           noting just in case it affects the way the new task
>                           handling operates. (A sketch of what I mean is
>                           below.)
>
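>                           (A hypothetical sketch only: FromDAG is our local
>                           element, so the names here are illustrative, not
>                           its real code. The point is a task body that may
>                           sit in a bounded ~10ms wait for the capture card
>                           instead of returning immediately.)
>
>                               #include <chrono>
>                               #include <thread>
>
>                               // Illustrative only: poll the hardware for up
>                               // to ~10ms before giving the thread back to
>                               // the scheduler.
>                               bool blocking_run_task(bool (*try_read_record)()) {
>                                   auto deadline = std::chrono::steady_clock::now()
>                                       + std::chrono::milliseconds(10);
>                                   while (std::chrono::steady_clock::now() < deadline) {
>                                       if (try_read_record())
>                                           return true;   // got a record
>                                       std::this_thread::yield();
>                                   }
>                                   return false;          // timed out; yield the thread
>                               }
>
>                               static bool no_record() { return false; }
>
>                               int main() { return blocking_run_task(no_record) ? 1 : 0; }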
>
>                           Beyers
>
>                           On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler
>                           <kohler at cs.ucla.edu> wrote:
>
>
>                               Hi all,
>
>                               This is just a note to say that I've merged the
>                               "coreperformance" branch with master.  There are
>                               several changes that may speed up particularly
>                               simple configurations, and that (more
>                               importantly) may make it easier to experiment
>                               with different multithreading setups.  (For
>                               instance, I believe switching a task from one
>                               thread to another is faster now.)  Let me know
>                               if you experience any problems.
>
>                               Eddie
>
>
>             _______________________________________________
>             click mailing list
>             click at amsterdam.lcs.mit.edu
>             https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>
>
>
>
>

