[Click] Spinlock Problems

springbo at cs.wisc.edu springbo at cs.wisc.edu
Fri Sep 5 15:21:22 EDT 2008


Perfect Joonwoo!

commit 381b5f8ff5fe2a1d7f26ac39b5591b99c4365bae (Write handlers become
EXCLUSIVE by default) made all write handlers exclusive by default. When
click-uninstall writes to /click/config to clear the configuration it is
done with a now exclusive write handler (requiring all threads to be
locked). The action of the write handler then attempts to lock all tasks
again routerthread.cc:633. The write handler action is attempting to grab
a spinlock it already holds -> Deadlock.

The attached patch simply marks the write handler as NONEXCLUSIVE. The
proper fix would be to remove lock_tasks() from
RouterThread::unschedule_router_tasks,  if you can guarantee all routes
into the function have already locked the tasks.

Thanks!
Kevin



> Hi all,
>
> I bisected tree for this issue, and found following commit is the
> first bad commit.
>
>  commit a7dd5c93a51b3b5b8cd4a177ef379492c27859bf
>  click-combine documentation: Mention that RouterLink is fake.
>
> Any idea?
>
> Thanks,
> Joonwoo
>
> 2008/9/3  <springbo at cs.wisc.edu>:
>> Hi Eddie,
>>
>> Sorry for the wait. Today I pulled the latest sources and rebuilt a
>> vanilla click kernel on a 64-bit 4 core Xeon machine (with the click
>> e1000
>> driver installed). The locking seems to be working fine, but
>> 'click-uninstall' is resulting in a lockup. See the attached dumps
>> (click
>> was started with click-install -t 3 in both cases). I still see the
>> lockup
>> with -t 1.
>>
>> Click uninstalls without problem with our version from a couple of weeks
>> back.
>>
>> Unrelated, but I also saw:
>> ../include/click/timestamp.hh: In constructor
>> 'Timestamp::Timestamp(double)':
>> ../include/click/timestamp.hh:938: warning: converting to 'int64_t' from
>> 'double'
>> all over when building userlevel click.
>>
>> Sorry I don't have the time to dive into this further right now.
>>
>> Thanks!
>> Kevin
>>
>>
>>
>>
>>> Kevin,
>>>
>>> Is there any chance you'd be able to try the newer Spinlock code any
>>> time
>>> soon?
>>>
>>> Eddie
>>>
>>>
>>> springbo at cs.wisc.edu wrote:
>>>> Hi,
>>>>
>>>> I think there is a concurrency issue with Spinlocks in linuxmodule
>>>> multi-threaded click (running a 2.6.19.2 patched kernel, the
>>>> e1000-NAPI
>>>> driver and today's trunk). I've put together a temporary patch, but
>>>> there
>>>> might be further issues. Sorry, I don't have the time to investigate
>>>> more.
>>>>
>>>>
>>>> Problems:
>>>> The series of events are:
>>>> -Thread A is running on CPU 0 and thread B is running on CPU 1.
>>>> -A acquires the Spinlock and executes code in the critical section.
>>>> -Linux schedules Thread A to run on CPU 1 and thread B to run on CPU
>>>> 0.
>>>> -B acquires the Spinlock (works because CPU 0 is the owner of the
>>>> lock,
>>>> not the thread)
>>>> -A and B are both operating in the critical section
>>>> -(if you're lucky) A releases the Spinlock and generates a "releasing
>>>> someone else's lock" message
>>>> -(if you're unlucky) Oops
>>>>
>>>> Attached is a test element and a configuration to elicit the behavior.
>>>> I
>>>> could only replicate the problem when using a polling input source
>>>> under
>>>> heavy load.
>>>>
>>>> After fixing the above problem I was still seeing two threads inside
>>>> the
>>>> CS. For some reason the atomic_uint32_t::swap was not working
>>>> atomically
>>>> as expected. Changing this to atomic_uint32::compare_and_swap solved
>>>> the
>>>> problem. We're running x86_64 quad core Xeon CPUs.
>>>>
>>>>
>>>> Solution:
>>>> In the patch attached I added a new function click_current_thread()
>>>> which
>>>> returns a thread_info pointer. Inside of Spinlock I replace
>>>> click_current_processor() uses with click_current_thread(). This way
>>>> even
>>>> if the thread is assigned to another core it will still have the same
>>>> thread_info pointer.
>>>>
>>>> The problem with this solution is that it does not work as an index
>>>> into
>>>> an array nicely and cannot simply replace all uses of
>>>> click_current_processor(). There may be other places in the code which
>>>> make the same incorrect use of click_current_processor(), but a better
>>>> approach will be needed if the value is used to index into an array
>>>> (such
>>>> as in ReadWriteLock).
>>>>
>>>>
>>>> Thanks!
>>>> Kevin Springborn
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> click mailing list
>>>> click at amsterdam.lcs.mit.edu
>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>>
>>
>> _______________________________________________
>> click mailing list
>> click at amsterdam.lcs.mit.edu
>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: click-uninstall-lockup.patch
Type: text/x-patch
Size: 706 bytes
Desc: not available
Url : http://amsterdam.lcs.mit.edu/pipermail/click/attachments/20080905/517a0561/attachment.bin 


More information about the click mailing list