[Click] Core performance checkins

Thu Feb 10 09:38:52 EST 2011

On 10 February 2011 14:28, Beyers Cronje <bcronje at gmail.com> wrote:
> Hi Eddie,
>
> I've reverted to the pre-coreperformance code on the "semi-production"
> server for now, and will see if I can replicate the issue on my dev server.
> Hopefully I can use Cliff's test case to replicate and troubleshoot. Will
> keep you posted. In all honesty though I haven't wrapped my head around
> memory barriers and the like, so I'm not sure how far I will get on my own
> :(
>
> Cliff, care to host an online advanced multithreading course ? :)

Cliff, If you are hosting mind if i join ;-) ?

>
> On Thu, Feb 10, 2011 at 4:15 PM, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
>> I'm very sorry about this regression... Unfortunately due to travel it's
>> going to hard to look at this until the weekend.  I would rather not yet
>> revert coreperformance.  Can you handle this situation?
>>
>> Eddie
>>
>>
>>
>> On 02/09/2011 05:22 AM, Beyers Cronje wrote:
>>
>>> Hi Eddie,
>>>
>>> Some info that I'm sure will help debugging. I configured click
>>> with  --enable-schedule-debugging=extra and also enabled
>>> NOTIFIERQUEUE_DEBUG
>>>
>>> It seems Unqueue gets stuck in the pending list. See the output below:
>>>
>>> read q.length
>>> 1000
>>>
>>> read q.notifier_state
>>> notifier on
>>> task 0x19387b0 [uq :: Unqueue] scheduled
>>>
>>> read uq.scheduled
>>> true /* but pending */
>>>
>>> read uq.notifier
>>> empty.0/1:1*
>>>
>>>
>>> Unqueue stays in this scheduled but pending state for an undetermined
>>> period
>>> of time, up to minutes some times.
>>>
>>> Any idea where I can start on fixing this bug?
>>>
>>> Beyers
>>>
>>>
>>> On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje <bcronje at gmail.com
>>> <mailto:bcronje at gmail.com>> wrote:
>>>
>>>    Update - Strange, after typing my previous email I checked again, and
>>> all
>>>    of a sudden Unqueue was pulling packets again. Not sure if me breaking
>>>    into Click with gdb kick-started, it again :) or if it's a intermittent
>>> issue.
>>>
>>>
>>>    On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje <bcronje at gmail.com
>>>    <mailto:bcronje at gmail.com>> wrote:
>>>
>>>        Hi Eddie,
>>>
>>>        Since running this merge I've been experiencing issues with
>>> usermode
>>>        multithreading. I'm using commit
>>> 9419098acbdc20837e37f3033c40661809431f8d
>>>        I do believe the issues are related to the changes of this merged,
>>> as
>>>        I was running the same config on pre-coreperformance merge code
>>>        without any issues.
>>>
>>>        *Simplified Config used:*
>>>
>>>        fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800) ->
>>> MarkIPHeader(14)
>>>        -> ipc1::IPClassifier(udp port 1646);
>>>        sfp::SFP();
>>>        q::ThreadSafeQueue;
>>>        log::Logger(sfp, 1800);
>>>
>>>        ipc1 -> RadAccounting -> q;
>>>        q -> uq::Unqueue -> aupcc::Counter -> sfp;
>>>
>>>        StaticThreadSched(fd 0, uq 1, log 2);
>>>
>>>        *Problem Description:*
>>>        The 3 threads in the config: FromDAG and Unqueue do what their
>>> names
>>>        indicate, and Logger has a timer that schedules its task once every
>>>        30min. Everything runs fine initially and then somewhere along the
>>>        line Unqueue stops pulling packets from the queue which leads to a
>>>        constant queue overflow. When this happens I can see FromDAG is
>>> still
>>>        working as q.drops increases constantly, and Logger also fires away
>>>        every 30min. Note push rate from FromDAG is quite high, but
>>>        what RadAccounting pushes into queue is very low ~ 100pps, which
>>> means
>>>        queue is most of the time empty and Unqueue is not scheduled.
>>>
>>>        Unfortunately I didn't configure debug scheduling, which would've
>>>        helped. I did notice Unqueue.scheduled always returned true, even
>>>        though it never actually ran. Not sure if the following will help,
>>> gdb
>>>        shows the following:
>>>
>>>        (gdb) info threads
>>>           3 Thread 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in poll
>>> ()
>>>        from /lib64/libc.so.6
>>>           2 Thread 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in
>>> select
>>>        () from /lib64/libc.so.6
>>>        * 1 Thread 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet
>>>        (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized
>>>        out>) at ../elements/local/fromdag.cc:193
>>>        (gdb) thread 2
>>>        [Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0
>>>          0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>>>        (gdb) bt
>>>        #0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
>>>        #1  0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0,
>>>        thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862
>>>        #2  0x000000000058c4ec in Master::run_selects (this=0x1afacc0,
>>>        thread=0x1afb8b0) at ../lib/master.cc:1050
>>>        #3  0x000000000057ddf1 in run_os (this=0x1afb8b0) at
>>>        ../lib/routerthread.cc:447
>>>        #4  RouterThread::driver (this=0x1afb8b0) at
>>> ../lib/routerthread.cc:568
>>>        #5  0x0000000000556dc9 in thread_driver (user_data=<value optimized
>>>        out>) at click.cc:414
>>>        #6  0x000000392d206d5b in start_thread () from
>>> /lib64/libpthread.so.0
>>>        #7  0x000000392cee4aad in clone () from /lib64/libc.so.6
>>>        (gdb) thread 3
>>>        [Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0
>>>          0x000000392cedcee3 in select () from /lib64/libc.so.6
>>>        (gdb) bt
>>>        #0  0x000000392cedcee3 in select () from /lib64/libc.so.6
>>>        #1  0x000000000058c4bf in Master::run_selects (this=0x1afacc0,
>>>        thread=0x1afb7e0) at ../lib/master.cc:1015
>>>        #2  0x000000000057ddf1 in run_os (this=0x1afb7e0) at
>>>        ../lib/routerthread.cc:447
>>>        #3  RouterThread::driver (this=0x1afb7e0) at
>>> ../lib/routerthread.cc:568
>>>        #4  0x0000000000556dc9 in thread_driver (user_data=<value optimized
>>>        out>) at click.cc:414
>>>        #5  0x000000392d206d5b in start_thread () from
>>> /lib64/libpthread.so.0
>>>        #6  0x000000392cee4aad in clone () from /lib64/libc.so.6
>>>        (gdb) thread 1
>>>        [Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0
>>>          0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
>>>        (gdb) bt
>>>        #0  FromDAG::process_packet (this=0x1afc570,
>>>        erf_record=0x7fa423258bf0, rlen=<value optimized out>) at
>>>        ../elements/local/fromdag.cc:193
>>>        #1  0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at
>>>        ../elements/local/fromdag.cc:150
>>>        #2  0x000000000057dbe6 in fire (this=0x1afb710) at
>>>        ../include/click/task.hh:612
>>>        #3  run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
>>>        #4  RouterThread::driver (this=0x1afb710) at
>>> ../lib/routerthread.cc:600
>>>        #5  0x0000000000558513 in main (argc=<value optimized out>,
>>>        argv=<value optimized out>) at click.cc:639
>>>
>>>        One thing to note, for various reasons I'm doing something very
>>>        unclick-like with FromDAG where I allow it to block up to 10ms. For
>>> my
>>>        specific requirements this is not a problem, but just in case it
>>> might
>>>        affect the way the new task handling operates it's worth noting.
>>>
>>>        Beyers
>>>
>>>        On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler <kohler at cs.ucla.edu
>>>        <mailto:kohler at cs.ucla.edu>> wrote:
>>>
>>>            Hi all,
>>>
>>>            This is just a note to say that I've merged the
>>> "coreperformance"
>>>            branch
>>>            with master.  There are several changes that may speed up
>>> particularly
>>>            simple configurations, and that (more importantly) may make it
>>>            easier to
>>>            experiment with different multithreading setups.  (For instance
>>> I
>>>            believe switching a task from one thread to another is faster
>>>            now.)  Let
>>>            me know if you experience any problems
>>>
>>>            Eddie
>>>            _______________________________________________
>>>            click mailing list
>>>            click at amsterdam.lcs.mit.edu <mailto:
>>> click at amsterdam.lcs.mit.edu>
>>>
>>>            https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>>
>>>
>>>
>>>
>>>
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>