[Click] Core performance checkins
Eddie Kohler
kohler at cs.ucla.edu
Sat Feb 12 22:05:21 EST 2011
Hooray! I'm very glad it was something simple (in fact retarded).
Cliff is still welcome to teach a concurrency class any time :) :) :)
E
On 2/12/11 3:41 PM, Beyers Cronje wrote:
> Nice, been running for a day or so without any signs of the issue I was
> experiencing before, well done.
>
> Thanks Bobby and Eddie, much appreciated.
>
> On Fri, Feb 11, 2011 at 8:46 AM, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
> Bobby,
>
> THANKS!!! Totally right; thanks for the fix. I found another bug as
> well, and fixed it.
>
> Awesome!! Much appreciated.
>
> Eddie
>
>
>
> On 02/10/2011 11:40 AM, Bobby Longpocket wrote:
>
> I think the problem is just that _any_pending is never getting set, so
> RouterThread::active() returns false even if there are tasks on the
> pending list.
>
> I don't run Click the normal way, so I can't easily reproduce the
> issue, but try making the following change:
>
> In RouterThread::active(), replace both occurrences of _any_pending
> with _pending_head.
>
> diff --git a/include/click/routerthread.hh b/include/click/routerthread.hh
> index a405e1c..504706d 100644
> --- a/include/click/routerthread.hh
> +++ b/include/click/routerthread.hh
> @@ -231,9 +231,9 @@ inline bool
> RouterThread::active() const
> {
> #if HAVE_TASK_HEAP
> - return _task_heap.size() != 0 || _any_pending;
> + return _task_heap.size() != 0 || _pending_head;
> #else
> - return ((const Task *)_next != this) || _any_pending;
> + return ((const Task *)_next != this) || _pending_head;
> #endif
> }
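Here is a minimal sketch of why the one-line change matters. The toy C++ classes below model a thread that owns a run queue plus a pending list that other threads push tasks onto. ToyTask, ToyRouterThread, add_pending(), process_pending(), active_buggy() and active_fixed() are all hypothetical names. Only `_any_pending` and `_pending_head` come from the diff, and the surrounding logic is an assumption, not Click's actual implementation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a Click Task: just a link for the pending list.
struct ToyTask {
    ToyTask *pending_next = nullptr;
};

// Hypothetical, heavily simplified model of a RouterThread (assumed logic).
class ToyRouterThread {
  public:
    // Called by *another* thread to hand a task to this thread.
    // Assumption: cross-thread scheduling pushes onto a pending list.
    void add_pending(ToyTask *t) {
        std::lock_guard<std::mutex> lock(_pending_lock);
        t->pending_next = _pending_head.load();
        _pending_head.store(t);
        // The modeled bug: nothing ever sets _any_pending here.
    }

    // Buggy check: depends on a flag that is never set.
    bool active_buggy() const {
        return _run_queue_size != 0 || _any_pending;
    }

    // Fixed check, mirroring the diff: test the list head itself.
    bool active_fixed() const {
        return _run_queue_size != 0 || _pending_head.load() != nullptr;
    }

    // Called by the owning thread: move pending tasks onto its run queue.
    void process_pending() {
        std::lock_guard<std::mutex> lock(_pending_lock);
        while (ToyTask *t = _pending_head.load()) {
            _pending_head.store(t->pending_next);
            t->pending_next = nullptr;
            ++_run_queue_size;
        }
    }

  private:
    int _run_queue_size = 0;              // touched only by the owning thread
    bool _any_pending = false;            // never written after init: the bug
    std::atomic<ToyTask *> _pending_head{nullptr};
    std::mutex _pending_lock;             // serializes pushes and drains
};
```

In use, the owning thread would call process_pending() on each pass through its driver loop and consult an active() check before deciding whether to sleep. After another thread calls add_pending(), active_buggy() still reports false while active_fixed() reports true. That is the behavior Bobby describes: tasks sit on the pending list while the thread looks idle.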
>
>
> --- On Thu, 2/10/11, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
> From: Eddie Kohler <kohler at cs.ucla.edu>
> Subject: Re: [Click] Core performance checkins
> To: "Beyers Cronje" <bcronje at gmail.com>
> Cc: "Click" <click at pdos.csail.mit.edu>
> Date: Thursday, February 10, 2011, 6:15 AM
> I'm very sorry about this regression... Unfortunately, due to travel it's
> going to be hard to look at this until the weekend. I would rather not
> revert coreperformance yet. Can you handle this situation?
>
> Eddie
>
>
> On 02/09/2011 05:22 AM, Beyers Cronje wrote:
>
> Hi Eddie,
>
> Here is some info that I'm sure will help with debugging. I configured
> Click with --enable-schedule-debugging=extra and also enabled
> NOTIFIERQUEUE_DEBUG.
>
> It seems Unqueue gets stuck in the pending list. See the output below:
>
>
> read q.length
> 1000
>
> read q.notifier_state
> notifier on
> task 0x19387b0 [uq :: Unqueue] scheduled
>
> read uq.scheduled
> true /* but pending */
>
> read uq.notifier
> empty.0/1:1*
>
>
> Unqueue stays in this scheduled-but-pending state for an undetermined
> period of time, sometimes up to minutes.
>
> Any idea where I can start on fixing this bug?
>
> Beyers
>
>
> On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje <bcronje at gmail.com> wrote:
>
>
> Update: strange. After typing my previous email I checked again, and all
> of a sudden Unqueue was pulling packets again. I'm not sure whether
> breaking into Click with gdb kick-started it again :) or whether it's an
> intermittent issue.
>
>
>
> On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje <bcronje at gmail.com> wrote:
>
>
> Hi Eddie,
>
> Since running this merge I've been experiencing issues with usermode
> multithreading. I'm using commit 9419098acbdc20837e37f3033c40661809431f8d.
>
> I do believe the issues are related to the changes in this merge, as I
> was running the same config on pre-coreperformance-merge code without any
> issues.
>
>
> *Simplified config used:*
>
> fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800) -> MarkIPHeader(14)
>     -> ipc1::IPClassifier(udp port 1646);
> sfp::SFP();
> q::ThreadSafeQueue;
> log::Logger(sfp, 1800);
>
> ipc1 -> RadAccounting -> q;
> q -> uq::Unqueue -> aupcc::Counter -> sfp;
>
> StaticThreadSched(fd 0, uq 1, log 2);
>
>
> *Problem description:*
>
> The config runs 3 threads: FromDAG and Unqueue do what their names
> indicate, and Logger has a timer that schedules its task once every
> 30 min.
>
> Everything runs fine initially, and then somewhere along the line
> Unqueue stops pulling packets from the queue, which leads to constant
> queue overflow. When this happens I can see that FromDAG is still
> working, as q.drops increases constantly, and Logger also still fires
> every 30 min.
>
> Note that the push rate from FromDAG is quite high, but what
> RadAccounting pushes into the queue is very low (~100 pps), which means
> the queue is empty most of the time and Unqueue is not scheduled.
>
>
> Unfortunately I didn't configure debug scheduling, which would've
> helped. I did notice that Unqueue.scheduled always returned true, even
> though Unqueue never actually ran. I'm not sure if this will help, but
> gdb shows the following:
>
>
> (gdb) info threads
>   3 Thread 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>   2 Thread 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in select () from /lib64/libc.so.6
> * 1 Thread 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at ../elements/local/fromdag.cc:193
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
> (gdb) bt
> #0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
> #1  0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0, thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862
> #2  0x000000000058c4ec in Master::run_selects (this=0x1afacc0, thread=0x1afb8b0) at ../lib/master.cc:1050
> #3  0x000000000057ddf1 in run_os (this=0x1afb8b0) at ../lib/routerthread.cc:447
> #4  RouterThread::driver (this=0x1afb8b0) at ../lib/routerthread.cc:568
> #5  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at click.cc:414
> #6  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
> #7  0x000000392cee4aad in clone () from /lib64/libc.so.6
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0  0x000000392cedcee3 in select () from /lib64/libc.so.6
> (gdb) bt
> #0  0x000000392cedcee3 in select () from /lib64/libc.so.6
> #1  0x000000000058c4bf in Master::run_selects (this=0x1afacc0, thread=0x1afb7e0) at ../lib/master.cc:1015
> #2  0x000000000057ddf1 in run_os (this=0x1afb7e0) at ../lib/routerthread.cc:447
> #3  RouterThread::driver (this=0x1afb7e0) at ../lib/routerthread.cc:568
> #4  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at click.cc:414
> #5  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
> #6  0x000000392cee4aad in clone () from /lib64/libc.so.6
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0  0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
> (gdb) bt
> #0  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at ../elements/local/fromdag.cc:193
> #1  0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at ../elements/local/fromdag.cc:150
> #2  0x000000000057dbe6 in fire (this=0x1afb710) at ../include/click/task.hh:612
> #3  run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
> #4  RouterThread::driver (this=0x1afb710) at ../lib/routerthread.cc:600
> #5  0x0000000000558513 in main (argc=<value optimized out>, argv=<value optimized out>) at click.cc:639
>
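The first backtrace above shows Master::run_selects_poll called with more_tasks=false. That means the thread believed it had no runnable tasks and was free to block in poll(). The helper below is a hedged sketch of that decision. It is hypothetical and not Click's code: it maps "are there tasks?" and "when is the next timer?" onto a poll() timeout in milliseconds, using the POSIX convention that a negative timeout blocks indefinitely. The function name and the "negative next_timer_ms means no timer" convention are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not Click's implementation): choose the timeout, in
// milliseconds, that a driver loop would pass to poll() before running tasks.
//   more_tasks     - whether the thread believes it has runnable work
//   next_timer_ms  - ms until the next timer fires; negative means no timer
// Returns 0 (don't sleep), a finite wait, or -1 (POSIX: block indefinitely).
int choose_poll_timeout_ms(bool more_tasks, int next_timer_ms) {
    if (more_tasks)
        return 0;              // work is waiting: only check fds, don't sleep
    if (next_timer_ms >= 0)
        return next_timer_ms;  // sleep at most until the next timer
    return -1;                 // nothing to do: sleep until an fd is ready
}
```

Suppose more_tasks is derived from an active() check that misses the pending list. Then a task like uq can sit pending while its thread sleeps until some other event, such as a file descriptor becoming ready or a timer expiring, wakes it. That could be consistent with the stalls of up to minutes reported above. The sketch assumes nothing else, such as an explicit wakeup from the scheduling thread, interrupts the sleep.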
>
> One thing to note: for various reasons I'm doing something very
> un-Click-like with FromDAG, where I allow it to block for up to 10 ms.
> For my specific requirements this is not a problem, but in case it
> affects the way the new task handling operates, it's worth noting.
>
>
> Beyers
>
> On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
>
> Hi all,
>
> This is just a note to say that I've merged the "coreperformance" branch
> with master. There are several changes that may speed up particularly
> simple configurations, and that (more importantly) may make it easier to
> experiment with different multithreading setups. (For instance, I
> believe switching a task from one thread to another is faster now.) Let
> me know if you experience any problems.
>
> Eddie
>
>
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click