[Click] Core performance checkins

Bobby Longpocket bobbylongpocket at yahoo.com
Thu Feb 10 14:40:32 EST 2011


I think the problem is just that _any_pending is never getting set, so RouterThread::active() returns false even if there are tasks on the pending list.

I don't run Click the normal way, so I can't easily reproduce the issue, but try making the following change:

In RouterThread::active(), replace both occurrences of _any_pending with _pending_head.

diff --git a/include/click/routerthread.hh b/include/click/routerthread.hh
index a405e1c..504706d 100644
--- a/include/click/routerthread.hh
+++ b/include/click/routerthread.hh
@@ -231,9 +231,9 @@ inline bool
 RouterThread::active() const
 {
 #if HAVE_TASK_HEAP
-    return _task_heap.size() != 0 || _any_pending;
+    return _task_heap.size() != 0 || _pending_head;
 #else
-    return ((const Task *)_next != this) || _any_pending;
+    return ((const Task *)_next != this) || _pending_head;
 #endif
 }
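
To make the suspected failure mode concrete, here is a minimal, self-contained C++ sketch. It is not Click's code: ToyTask, ToyThread, add_pending() and the two active_*() methods are hypothetical stand-ins, and the sketch simply assumes (as guessed above) that a task can be linked onto the pending list without the _any_pending-style flag ever being set.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy types for illustration only; not Click's real classes.
struct ToyTask {
    ToyTask *next_pending = nullptr;
};

struct ToyThread {
    std::size_t scheduled_count = 0;  // stands in for _task_heap.size()
    ToyTask *pending_head = nullptr;  // stands in for _pending_head
    bool any_pending = false;         // stands in for _any_pending

    // Models the suspected bug: the task is linked onto the pending list,
    // but the separate flag is never updated.
    void add_pending(ToyTask *t) {
        assert(t != nullptr);
        t->next_pending = pending_head;
        pending_head = t;
    }

    // Check based on the flag, like the lines removed by the diff.
    bool active_using_flag() const {
        return scheduled_count != 0 || any_pending;
    }

    // Check based on the list head, like the lines added by the diff.
    bool active_using_head() const {
        return scheduled_count != 0 || pending_head != nullptr;
    }
};

// Returns true when the flag-based check reports "idle" while the
// head-based check correctly sees the pending task.
inline bool stale_flag_hides_pending_work() {
    ToyThread thread;
    ToyTask task;
    thread.add_pending(&task);
    return !thread.active_using_flag() && thread.active_using_head();
}
```

In this toy model the flag-based check reports an idle thread even though a task is waiting, which would be consistent with a thread sleeping in poll() or select() while Unqueue sits on the pending list. Testing the list head directly avoids keeping a second piece of state in sync.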


--- On Thu, 2/10/11, Eddie Kohler <kohler at cs.ucla.edu> wrote:

> From: Eddie Kohler <kohler at cs.ucla.edu>
> Subject: Re: [Click] Core performance checkins
> To: "Beyers Cronje" <bcronje at gmail.com>
> Cc: "Click" <click at pdos.csail.mit.edu>
> Date: Thursday, February 10, 2011, 6:15 AM
> I'm very sorry about this regression... Unfortunately, due to travel, it's
> going to be hard to look at this until the weekend.  I would rather not yet
> revert coreperformance.  Can you handle this situation?
> 
> Eddie
> 
> 
> On 02/09/2011 05:22 AM, Beyers Cronje wrote:
> > Hi Eddie,
> >
> > Some info that I'm sure will help debugging. I configured click
> > with --enable-schedule-debugging=extra and also enabled NOTIFIERQUEUE_DEBUG
> >
> > It seems Unqueue gets stuck in the pending list. See the output below:
> >
> > read q.length
> > 1000
> >
> > read q.notifier_state
> > notifier on
> > task 0x19387b0 [uq :: Unqueue] scheduled
> >
> > read uq.scheduled
> > true /* but pending */
> >
> > read uq.notifier
> > empty.0/1:1*
> >
> >
> > Unqueue stays in this scheduled-but-pending state for an undetermined period
> > of time, sometimes up to minutes.
> >
> > Any idea where I can start on fixing this bug?
> >
> > Beyers
> >
> >
> > On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje <bcronje at gmail.com
> > <mailto:bcronje at gmail.com>> wrote:
> >
> >     Update - Strange, after typing my previous email I checked again, and all
> >     of a sudden Unqueue was pulling packets again. Not sure if me breaking
> >     into Click with gdb kick-started it again :) or if it's an intermittent
> >     issue.
> >
> >
> >     On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje <bcronje at gmail.com
> >     <mailto:bcronje at gmail.com>> wrote:
> >
> >         Hi Eddie,
> >
> >         Since running this merge I've been experiencing issues with usermode
> >         multithreading. I'm using commit 9419098acbdc20837e37f3033c40661809431f8d.
> >         I do believe the issues are related to the changes of this merge, as
> >         I was running the same config on pre-coreperformance merge code
> >         without any issues.
> >
> >         *Simplified Config used:*
> >
> >         fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800) -> MarkIPHeader(14)
> >         -> ipc1::IPClassifier(udp port 1646);
> >         sfp::SFP();
> >         q::ThreadSafeQueue;
> >         log::Logger(sfp, 1800);
> >
> >         ipc1 -> RadAccounting -> q;
> >         q -> uq::Unqueue -> aupcc::Counter -> sfp;
> >
> >         StaticThreadSched(fd 0, uq 1, log 2);
> >
> >         *Problem Description:*
> >         The 3 threads in the config: FromDAG and Unqueue do what their names
> >         indicate, and Logger has a timer that schedules its task once every
> >         30min. Everything runs fine initially and then somewhere along the
> >         line Unqueue stops pulling packets from the queue which leads to a
> >         constant queue overflow. When this happens I can see FromDAG is still
> >         working as q.drops increases constantly, and Logger also fires away
> >         every 30min. Note push rate from FromDAG is quite high, but
> >         what RadAccounting pushes into queue is very low ~ 100pps, which means
> >         queue is most of the time empty and Unqueue is not scheduled.
> >
> >         Unfortunately I didn't configure debug scheduling, which would've
> >         helped. I did notice Unqueue.scheduled always returned true, even
> >         though it never actually ran. Not sure if the following will help, gdb
> >         shows the following:
> >
> >         (gdb) info threads
> >            3 Thread 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
> >            2 Thread 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in select () from /lib64/libc.so.6
> >         * 1 Thread 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at ../elements/local/fromdag.cc:193
> >         (gdb) thread 2
> >         [Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
> >         (gdb) bt
> >         #0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
> >         #1  0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0, thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862
> >         #2  0x000000000058c4ec in Master::run_selects (this=0x1afacc0, thread=0x1afb8b0) at ../lib/master.cc:1050
> >         #3  0x000000000057ddf1 in run_os (this=0x1afb8b0) at ../lib/routerthread.cc:447
> >         #4  RouterThread::driver (this=0x1afb8b0) at ../lib/routerthread.cc:568
> >         #5  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at click.cc:414
> >         #6  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
> >         #7  0x000000392cee4aad in clone () from /lib64/libc.so.6
> >         (gdb) thread 3
> >         [Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0  0x000000392cedcee3 in select () from /lib64/libc.so.6
> >         (gdb) bt
> >         #0  0x000000392cedcee3 in select () from /lib64/libc.so.6
> >         #1  0x000000000058c4bf in Master::run_selects (this=0x1afacc0, thread=0x1afb7e0) at ../lib/master.cc:1015
> >         #2  0x000000000057ddf1 in run_os (this=0x1afb7e0) at ../lib/routerthread.cc:447
> >         #3  RouterThread::driver (this=0x1afb7e0) at ../lib/routerthread.cc:568
> >         #4  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at click.cc:414
> >         #5  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
> >         #6  0x000000392cee4aad in clone () from /lib64/libc.so.6
> >         (gdb) thread 1
> >         [Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0  0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
> >         (gdb) bt
> >         #0  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at ../elements/local/fromdag.cc:193
> >         #1  0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at ../elements/local/fromdag.cc:150
> >         #2  0x000000000057dbe6 in fire (this=0x1afb710) at ../include/click/task.hh:612
> >         #3  run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
> >         #4  RouterThread::driver (this=0x1afb710) at ../lib/routerthread.cc:600
> >         #5  0x0000000000558513 in main (argc=<value optimized out>, argv=<value optimized out>) at click.cc:639
> >
> >         One thing to note, for various reasons I'm doing something very
> >         unclick-like with FromDAG where I allow it to block up to 10ms. For my
> >         specific requirements this is not a problem, but just in case it might
> >         affect the way the new task handling operates it's worth noting.
> >
> >         Beyers
> >
> >         On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler <kohler at cs.ucla.edu
> >         <mailto:kohler at cs.ucla.edu>> wrote:
> >
> >             Hi all,
> >
> >             This is just a note to say that I've merged the "coreperformance"
> >             branch with master.  There are several changes that may speed up
> >             particularly simple configurations, and that (more importantly) may
> >             make it easier to experiment with different multithreading setups.
> >             (For instance I believe switching a task from one thread to another
> >             is faster now.)  Let me know if you experience any problems.
> >
> >             Eddie
> >             _______________________________________________
> >             click mailing list
> >             click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
> >             https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> >
> >
> >
> >
> 


      


