[Click] Core performance checkins

Beyers Cronje bcronje at gmail.com
Tue Feb 8 19:41:22 EST 2011


Hi Eddie,

Since running this merge I've been experiencing issues with usermode
multithreading. I'm using commit 9419098acbdc20837e37f3033c40661809431f8d
I do believe the issues are related to the changes of this merged, as I was
running the same config on pre-coreperformance merge code without any
issues.

*Simplified Config used:*

fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800) -> MarkIPHeader(14) ->
ipc1::IPClassifier(udp port 1646);
sfp::SFP();
q::ThreadSafeQueue;
log::Logger(sfp, 1800);

ipc1 -> RadAccounting -> q;
q -> uq::Unqueue -> aupcc::Counter -> sfp;

StaticThreadSched(fd 0, uq 1, log 2);

*Problem Description:*
The 3 threads in the config: FromDAG and Unqueue do what their names
indicate, and Logger has a timer that schedules its task once every 30min.
Everything runs fine initially and then somewhere along the line Unqueue
stops pulling packets from the queue which leads to a constant queue
overflow. When this happens I can see FromDAG is still working as q.drops
increases constantly, and Logger also fires away every 30min. Note push rate
from FromDAG is quite high, but what RadAccounting pushes into queue is very
low ~ 100pps, which means queue is most of the time empty and Unqueue is not
scheduled.

Unfortunately I didn't configure debug scheduling, which would've helped. I
did notice Unqueue.scheduled always returned true, even though it never
actually ran. Not sure if the following will help, gdb shows the following:

(gdb) info threads
  3 Thread 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in poll () from
/lib64/libc.so.6
  2 Thread 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in select () from
/lib64/libc.so.6
* 1 Thread 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet
(this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at
../elements/local/fromdag.cc:193
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0
 0x000000392cedb0b3 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
#1  0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0,
thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862
#2  0x000000000058c4ec in Master::run_selects (this=0x1afacc0,
thread=0x1afb8b0) at ../lib/master.cc:1050
#3  0x000000000057ddf1 in run_os (this=0x1afb8b0) at
../lib/routerthread.cc:447
#4  RouterThread::driver (this=0x1afb8b0) at ../lib/routerthread.cc:568
#5  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at
click.cc:414
#6  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
#7  0x000000392cee4aad in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0
 0x000000392cedcee3 in select () from /lib64/libc.so.6
(gdb) bt
#0  0x000000392cedcee3 in select () from /lib64/libc.so.6
#1  0x000000000058c4bf in Master::run_selects (this=0x1afacc0,
thread=0x1afb7e0) at ../lib/master.cc:1015
#2  0x000000000057ddf1 in run_os (this=0x1afb7e0) at
../lib/routerthread.cc:447
#3  RouterThread::driver (this=0x1afb7e0) at ../lib/routerthread.cc:568
#4  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at
click.cc:414
#5  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
#6  0x000000392cee4aad in clone () from /lib64/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0
 0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
(gdb) bt
#0  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0,
rlen=<value optimized out>) at ../elements/local/fromdag.cc:193
#1  0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at
../elements/local/fromdag.cc:150
#2  0x000000000057dbe6 in fire (this=0x1afb710) at
../include/click/task.hh:612
#3  run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
#4  RouterThread::driver (this=0x1afb710) at ../lib/routerthread.cc:600
#5  0x0000000000558513 in main (argc=<value optimized out>, argv=<value
optimized out>) at click.cc:639

One thing to note, for various reasons I'm doing something very unclick-like
with FromDAG where I allow it to block up to 10ms. For my specific
requirements this is not a problem, but just in case it might affect the way
the new task handling operates it's worth noting.

Beyers

On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler <kohler at cs.ucla.edu> wrote:

> Hi all,
>
> This is just a note to say that I've merged the "coreperformance" branch
> with master.  There are several changes that may speed up particularly
> simple configurations, and that (more importantly) may make it easier to
> experiment with different multithreading setups.  (For instance I
> believe switching a task from one thread to another is faster now.)  Let
> me know if you experience any problems
>
> Eddie
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>


More information about the click mailing list