[Click] Multithreading bug

Beyers Cronje bcronje at gmail.com
Wed May 11 12:00:38 EDT 2011


Hi Eddie,

Sorry, I meant in concept it works for me. On the live server I changed the
config so that Socket is in a Push path which bypasses selected(). I have a
change control window tomorrow on this server so I'll apply the change and
report back to you then.

Beyers

On Wed, May 11, 2011 at 5:47 PM, Eddie Kohler <kohler at cs.ucla.edu> wrote:

> Meaning that you've tried it and you no longer see a bug, so I should check
> it in?
>
> E
>
>
>
> On 5/11/11 6:30 AM, Beyers Cronje wrote:
>
>> Perfect, that works for me.
>>
>> On Wed, May 11, 2011 at 3:25 PM, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>>
>>    Good catch!  This is a bug in Socket.  The fast_reschedule() method
>>    should only be called when an element is truly being run from the
>>    scheduler.  (I guess the documentation is a bit ambiguous.)  It is safe
>>    to call reschedule() from anywhere; I would suggest just changing
>>    Socket::run_task() to call reschedule().
>>
>>    Eddie
>>
>>
>>
>>    On 5/11/11 6:09 AM, Beyers Cronje wrote:
>>
>>        Hi guys,
>>
>>        I seem to have come across a userlevel Click multithreading bug. In
>>        short, when running multiple threads in userlevel Click with the
>>        Socket element, two separate threads can end up in fast_reschedule()
>>        at the same time, causing Click to crash at the following line:
>>
>>        task.hh:558    while (n != _thread && !PASS_GT(n->_pass, _pass))
>>
>>        One thread runs Socket::run_task() via standard task scheduling,
>>        while the second thread calls Socket::run_task() via
>>        Socket::selected() scheduling. This will obviously affect any other
>>        element that calls run_task() via selected().
>>
>>        Eddie, Cliff, should this issue be addressed inside Socket? Or
>>        should we look at how RouterThread handles locking with regard to
>>        task and select scheduling?
>>
>>        Below are the stack traces: 7 threads, with threads 1 and 6 being
>>        the culprits.
>>
>>        (gdb) info threads
>>           7 Thread 23122  0x000000392cedcee3 in select () from /lib64/libc.so.6
>>           6 Thread 23121  0x000000000054ce1b in fast_reschedule (this=0x14571e0) at ../include/click/task.hh:558
>>           5 Thread 23125  0x000000000058471b in RouterThread::driver (this=0x1445bb0) at ../lib/routerthread.cc:565
>>           4 Thread 23120  FromDAG::run_task (this=0x1456990) at fromdag.cc:157
>>           3 Thread 23112  0x0000000000560305 in Packet::~Packet (this=0x14b22f0, __in_chrg=<value optimized out>) at ../lib/packet.cc:181
>>           2 Thread 23124  0x00000000005844b0 in RouterThread::driver (this=0x1445af0) at ../lib/routerthread.cc:594
>>        * 1 Thread 23123  0x000000000054ce1b in fast_reschedule (this=0x14571e0) at ../include/click/task.hh:558
>>
>>        (gdb) thread 1
>>        [Switching to thread 1 (Thread 23123)]#0  0x000000000054ce1b in fast_reschedule (this=0x14571e0) at ../include/click/task.hh:558
>>        558             while (n != _thread && !PASS_GT(n->_pass, _pass))
>>        (gdb) bt
>>        #0  0x000000000054ce1b in fast_reschedule (this=0x14571e0) at ../include/click/task.hh:558
>>        #1  Socket::run_task (this=0x14571e0) at ../elements/userlevel/socket.cc:524
>>        #2  0x00000000005845d6 in fire (this=0x1445a30) at ../include/click/task.hh:612
>>        #3  run_tasks (this=0x1445a30) at ../lib/routerthread.cc:405
>>        #4  RouterThread::driver (this=0x1445a30) at ../lib/routerthread.cc:594
>>        #5  0x0000000000556e39 in thread_driver (user_data=<value optimized out>) at click.cc:414
>>        #6  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
>>        #7  0x000000392cee4aad in clone () from /lib64/libc.so.6
>>
>>        (gdb) thread 6
>>        [Switching to thread 6 (Thread 23121)]#0  0x000000000054ce1b in fast_reschedule (this=0x14571e0) at ../include/click/task.hh:558
>>        558             while (n != _thread && !PASS_GT(n->_pass, _pass))
>>        (gdb) bt
>>        #0  0x000000000054ce1b in fast_reschedule (this=0x14571e0) at ../include/click/task.hh:558
>>        #1  Socket::run_task (this=0x14571e0) at ../elements/userlevel/socket.cc:524
>>        #2  0x000000000054c2cf in Socket::selected (this=0x14571e0, fd=<value optimized out>) at ../elements/userlevel/socket.cc:417
>>        #3  0x0000000000592856 in call_selected (this=0x1444c50, thread=<value optimized out>, more_tasks=<value optimized out>) at ../lib/master.cc:732
>>        #4  Master::run_selects_poll (this=0x1444c50, thread=<value optimized out>, more_tasks=<value optimized out>) at ../lib/master.cc:889
>>        #5  0x0000000000592f34 in Master::run_selects (this=0x1444c50, thread=0x14458b0) at ../lib/master.cc:1050
>>        #6  0x00000000005847d7 in run_os (this=0x14458b0) at ../lib/routerthread.cc:442
>>        #7  RouterThread::driver (this=0x14458b0) at ../lib/routerthread.cc:562
>>        #8  0x0000000000556e39 in thread_driver (user_data=<value optimized out>) at click.cc:414
>>        #9  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
>>        #10 0x000000392cee4aad in clone () from /lib64/libc.so.6
>>
>>
>>        Beyers
>>        _______________________________________________
>>        click mailing list
>>        click at amsterdam.lcs.mit.edu
>>
>>        https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>
>>
>>
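
For readers of the archive, the distinction discussed in this thread can be
illustrated with a toy model. This is only a sketch and is not Click's code:
ToyRunQueue, fast_insert(), safe_insert() and run_safe_inserts() are
hypothetical names, and the mutex is an assumption made for illustration; it
is not a description of how Click's reschedule() is implemented. The idea
shown is that an unsynchronized "fast" path is only valid when a single
owning thread calls it, while a path meant to be called from anywhere needs
some form of synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Toy model only: hypothetical names, not Click's Task or RouterThread.
struct ToyRunQueue {
    std::vector<int> ids;  // ids of scheduled tasks
    std::mutex lock;       // guards ids for callers that do not own the queue

    // Unsynchronized insert. Only valid when exactly one thread (the
    // scheduler that owns the queue) ever calls it.
    void fast_insert(int id) { ids.push_back(id); }

    // Synchronized insert. May be called from any thread.
    void safe_insert(int id) {
        std::lock_guard<std::mutex> guard(lock);
        ids.push_back(id);
    }
};

// Start nthreads threads that each call safe_insert() per_thread times,
// wait for all of them, and return the final queue size.
std::size_t run_safe_inserts(int nthreads, int per_thread) {
    ToyRunQueue q;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&q, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                q.safe_insert(t);
        });
    }
    for (auto &w : workers)
        w.join();
    return q.ids.size();
}
```

Calling fast_insert() from two threads at once would be a data race on the
shared container, which is loosely analogous to the two threads above
meeting in fast_reschedule(); routing every non-owner caller through the
synchronized path avoids that in this toy model.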
