[Click] Spinlock Problems

Joonwoo Park joonwpark81 at gmail.com
Fri Sep 5 13:50:22 EDT 2008


Hi all,

I bisected tree for this issue, and found following commit is the
first bad commit.

 commit a7dd5c93a51b3b5b8cd4a177ef379492c27859bf
 click-combine documentation: Mention that RouterLink is fake.

Any idea?

Thanks,
Joonwoo

2008/9/3  <springbo at cs.wisc.edu>:
> Hi Eddie,
>
> Sorry for the wait. Today I pulled the latest sources and rebuilt a
> vanilla click kernel on a 64-bit 4 core Xeon machine (with the click e1000
> driver installed). The locking seems to be working fine, but
> 'click-uninstall' is resulting in a lockup. See the attached dumps (click
> was started with click-install -t 3 in both cases). I still see the lockup
> with -t 1.
>
> Click uninstalls without problem with our version from a couple of weeks
> back.
>
> Unrelated, but I also saw:
> ../include/click/timestamp.hh: In constructor 'Timestamp::Timestamp(double)':
> ../include/click/timestamp.hh:938: warning: converting to 'int64_t' from
> 'double'
> all over when building userlevel click.
>
> Sorry I don't have the time to dive into this further right now.
>
> Thanks!
> Kevin
>
>
>
>
>> Kevin,
>>
>> Is there any chance you'd be able to try the newer Spinlock code any time
>> soon?
>>
>> Eddie
>>
>>
>> springbo at cs.wisc.edu wrote:
>>> Hi,
>>>
>>> I think there is a concurrency issue with Spinlocks in linuxmodule
>>> multi-threaded click (running a 2.6.19.2 patched kernel, the e1000-NAPI
>>> driver and today's trunk). I've put together a temporary patch, but
>>> there
>>> might be further issues. Sorry, I don't have the time to investigate
>>> more.
>>>
>>>
>>> Problems:
>>> The series of events are:
>>> -Thread A is running on CPU 0 and thread B is running on CPU 1.
>>> -A acquires the Spinlock and executes code in the critical section.
>>> -Linux schedules Thread A to run on CPU 1 and thread B to run on CPU 0.
>>> -B acquires the Spinlock (works because CPU 0 is the owner of the lock,
>>> not the thread)
>>> -A and B are both operating in the critical section
>>> -(if you're lucky) A releases the Spinlock and generates a "releasing
>>> someone else's lock" message
>>> -(if you're unlucky) Oops
>>>
>>> Attached is a test element and a configuration to elicit the behavior. I
>>> could only replicate the problem when using a polling input source under
>>> heavy load.
>>>
>>> After fixing the above problem I was still seeing two threads inside the
>>> CS. For some reason the atomic_uint32_t::swap was not working atomically
>>> as expected. Changing this to atomic_uint32::compare_and_swap solved the
>>> problem. We're running x86_64 quad core Xeon CPUs.
>>>
>>>
>>> Solution:
>>> In the patch attached I added a new function click_current_thread()
>>> which
>>> returns a thread_info pointer. Inside of Spinlock I replace
>>> click_current_processor() uses with click_current_thread(). This way
>>> even
>>> if the thread is assigned to another core it will still have the same
>>> thread_info pointer.
>>>
>>> The problem with this solution is that it does not work as an index into
>>> an array nicely and cannot simply replace all uses of
>>> click_current_processor(). There may be other places in the code which
>>> make the same incorrect use of click_current_processor(), but a better
>>> approach will be needed if the value is used to index into an array
>>> (such
>>> as in ReadWriteLock).
>>>
>>>
>>> Thanks!
>>> Kevin Springborn
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> click mailing list
>>> click at amsterdam.lcs.mit.edu
>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>
>
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>
>


More information about the click mailing list