[Click] Core performance checkins

Eddie Kohler kohler at cs.ucla.edu
Thu Feb 10 09:15:21 EST 2011


I'm very sorry about this regression... Unfortunately due to travel it's going 
to hard to look at this until the weekend.  I would rather not yet revert 
coreperformance.  Can you handle this situation?

Eddie


On 02/09/2011 05:22 AM, Beyers Cronje wrote:
> Hi Eddie,
>
> Some info that I'm sure will help debugging. I configured click
> with  --enable-schedule-debugging=extra and also enabled NOTIFIERQUEUE_DEBUG
>
> It seems Unqueue gets stuck in the pending list. See the output below:
>
> read q.length
> 1000
>
> read q.notifier_state
> notifier on
> task 0x19387b0 [uq :: Unqueue] scheduled
>
> read uq.scheduled
> true /* but pending */
>
> read uq.notifier
> empty.0/1:1*
>
>
> Unqueue stays in this scheduled but pending state for an undetermined period
> of time, up to minutes some times.
>
> Any idea where I can start on fixing this bug?
>
> Beyers
>
>
> On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje <bcronje at gmail.com
> <mailto:bcronje at gmail.com>> wrote:
>
>     Update - Strange, after typing my previous email I checked again, and all
>     of a sudden Unqueue was pulling packets again. Not sure if me breaking
>     into Click with gdb kick-started, it again :) or if it's a intermittent issue.
>
>
>     On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje <bcronje at gmail.com
>     <mailto:bcronje at gmail.com>> wrote:
>
>         Hi Eddie,
>
>         Since running this merge I've been experiencing issues with usermode
>         multithreading. I'm using commit 9419098acbdc20837e37f3033c40661809431f8d
>         I do believe the issues are related to the changes of this merged, as
>         I was running the same config on pre-coreperformance merge code
>         without any issues.
>
>         *Simplified Config used:*
>
>         fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800) -> MarkIPHeader(14)
>         -> ipc1::IPClassifier(udp port 1646);
>         sfp::SFP();
>         q::ThreadSafeQueue;
>         log::Logger(sfp, 1800);
>
>         ipc1 -> RadAccounting -> q;
>         q -> uq::Unqueue -> aupcc::Counter -> sfp;
>
>         StaticThreadSched(fd 0, uq 1, log 2);
>
>         *Problem Description:*
>         The 3 threads in the config: FromDAG and Unqueue do what their names
>         indicate, and Logger has a timer that schedules its task once every
>         30min. Everything runs fine initially and then somewhere along the
>         line Unqueue stops pulling packets from the queue which leads to a
>         constant queue overflow. When this happens I can see FromDAG is still
>         working as q.drops increases constantly, and Logger also fires away
>         every 30min. Note push rate from FromDAG is quite high, but
>         what RadAccounting pushes into queue is very low ~ 100pps, which means
>         queue is most of the time empty and Unqueue is not scheduled.
>
>         Unfortunately I didn't configure debug scheduling, which would've
>         helped. I did notice Unqueue.scheduled always returned true, even
>         though it never actually ran. Not sure if the following will help, gdb
>         shows the following:
>
>         (gdb) info threads
>            3 Thread 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in poll ()
>         from /lib64/libc.so.6
>            2 Thread 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in select
>         () from /lib64/libc.so.6
>         * 1 Thread 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet
>         (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized
>         out>) at ../elements/local/fromdag.cc:193
>         (gdb) thread 2
>         [Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0
>           0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>         (gdb) bt
>         #0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>         #1  0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0,
>         thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862
>         #2  0x000000000058c4ec in Master::run_selects (this=0x1afacc0,
>         thread=0x1afb8b0) at ../lib/master.cc:1050
>         #3  0x000000000057ddf1 in run_os (this=0x1afb8b0) at
>         ../lib/routerthread.cc:447
>         #4  RouterThread::driver (this=0x1afb8b0) at ../lib/routerthread.cc:568
>         #5  0x0000000000556dc9 in thread_driver (user_data=<value optimized
>         out>) at click.cc:414
>         #6  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
>         #7  0x000000392cee4aad in clone () from /lib64/libc.so.6
>         (gdb) thread 3
>         [Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0
>           0x000000392cedcee3 in select () from /lib64/libc.so.6
>         (gdb) bt
>         #0  0x000000392cedcee3 in select () from /lib64/libc.so.6
>         #1  0x000000000058c4bf in Master::run_selects (this=0x1afacc0,
>         thread=0x1afb7e0) at ../lib/master.cc:1015
>         #2  0x000000000057ddf1 in run_os (this=0x1afb7e0) at
>         ../lib/routerthread.cc:447
>         #3  RouterThread::driver (this=0x1afb7e0) at ../lib/routerthread.cc:568
>         #4  0x0000000000556dc9 in thread_driver (user_data=<value optimized
>         out>) at click.cc:414
>         #5  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
>         #6  0x000000392cee4aad in clone () from /lib64/libc.so.6
>         (gdb) thread 1
>         [Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0
>           0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
>         (gdb) bt
>         #0  FromDAG::process_packet (this=0x1afc570,
>         erf_record=0x7fa423258bf0, rlen=<value optimized out>) at
>         ../elements/local/fromdag.cc:193
>         #1  0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at
>         ../elements/local/fromdag.cc:150
>         #2  0x000000000057dbe6 in fire (this=0x1afb710) at
>         ../include/click/task.hh:612
>         #3  run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
>         #4  RouterThread::driver (this=0x1afb710) at ../lib/routerthread.cc:600
>         #5  0x0000000000558513 in main (argc=<value optimized out>,
>         argv=<value optimized out>) at click.cc:639
>
>         One thing to note, for various reasons I'm doing something very
>         unclick-like with FromDAG where I allow it to block up to 10ms. For my
>         specific requirements this is not a problem, but just in case it might
>         affect the way the new task handling operates it's worth noting.
>
>         Beyers
>
>         On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler <kohler at cs.ucla.edu
>         <mailto:kohler at cs.ucla.edu>> wrote:
>
>             Hi all,
>
>             This is just a note to say that I've merged the "coreperformance"
>             branch
>             with master.  There are several changes that may speed up particularly
>             simple configurations, and that (more importantly) may make it
>             easier to
>             experiment with different multithreading setups.  (For instance I
>             believe switching a task from one thread to another is faster
>             now.)  Let
>             me know if you experience any problems
>
>             Eddie
>             _______________________________________________
>             click mailing list
>             click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
>             https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>
>
>
>


More information about the click mailing list