[Click] Spinlock Problems

Sun Aug 24 12:02:13 EDT 2008

I think that there, strictly speaking, "foo" WOULD need to be declared 
volatile.  But I think you are right anyway.  swap() and compare_and_swap() 
gain "memory" constraints.

Eddie

Cliff Frey wrote:
> I really actually think that the "memory" is necessary... or some other 
> related thing... otherwise code like
> 
> int foo;
> void myfunc() {
> lock()
> foo++;
> unlock()
> lock()
> foo++;
> unlock()
> }
> 
> ends up being totally broken... and I believe that there is nothing 
> wrong with that code... or do you think that foo needs to be declared 
> volatile?  But that has its own downsides...
> 
> 
> here is my real example code...  no.. there is no good reason for the 
> switch to c++
> 
> static inline int the_func(int x, int* foo) {
>     asm volatile ("xchgl %0,%1"
>                   : "=r" (x), "=m" (foo)
>                   : "0" (x), "m" (foo));
>     return x;
> }
> 
> class Aoeu {
> int foo;
>   //volatile int 
> bar;                                                          
>   Aoeu();
> };
> 
> Aoeu::Aoeu() {
>     the_func(2, &lock);
>     foo++;
>     the_func(2, &lock);
> 
>     the_func(2, &lock);
>     foo++;
>     the_func(2, &lock);
> }
> 
> On Sat, Aug 23, 2008 at 7:50 AM, Eddie Kohler <kohler at cs.ucla.edu 
> <mailto:kohler at cs.ucla.edu>> wrote:
> 
>     Yeah!  I have two contrasting thoughts;  1) swap() should do exactly
>     what it says and nothing more, one builds robustness from correctly
>     and simply specified parts, and  2) you're right, "memory" is
>     strictly better.  Grr. I'll keep this in my inbox.
> 
>     Eddie
> 
> 
>     Cliff Frey wrote:
> 
>         I wonder if the extra "memory" constraint gives you a free
>         memory-barrier in some ways?  As in, it forces gcc not to
>         reorder anything... but I guess maybe the volatile should do
>         that for you... but in general I guess the memory might protect
>         you from bad code.
> 
>         example:
> 
>         static inline int the_func(int x, int* foo) {
>            asm volatile ("xchgl %0,%1"
>                          : "=r" (x), "=m" (foo)
>                          : "0" (x), "m" (foo));
>            return x;
>         }
> 
>         int foo = 1;
>         int lock = 1;
>         volatile int bar = 0;
>         void main() {
>            if (foo > 100)
>                bar++;
>            the_func(2, &lock);
>            foo++;
>         }
> 
> 
>         if you compile with gcc -O3, the value of foo is cached (and if
>         the_func was a lock acquire, it would not have protected you).
>          If you add the "memory" constraint, then it does re-read foo as
>         part of the increment.
> 
>         In any case, I can still see making the call either way... but
>         the "memory" one does seem slightly safer to me.
> 
>         Cliff
> 
>         On Fri, Aug 22, 2008 at 11:29 PM, Eddie Kohler
>         <kohler at cs.ucla.edu <mailto:kohler at cs.ucla.edu>
>         <mailto:kohler at cs.ucla.edu <mailto:kohler at cs.ucla.edu>>> wrote:
> 
>            Kevin,
> 
>            Thanks so much for this bug report!!  Apologies, this must have
>            sucked to find.
> 
> 
>            - Re: swap() is not atomic: It looks like the inline assembly was
>            bad.  Mea
>            culpa.  I had:
>                asm volatile ("xchgl %0,%1"
>                             : "=r" (x), "=m" (CLICK_ATOMIC_VAL));
>            But this doesn't tell the compiler to put "x" in the relevant
>            register BEFORE
>            the instruction.   Here's what I have now:
>                asm volatile ("xchgl %0,%1"
>                             : "=r" (x), "=m" (CLICK_ATOMIC_VAL)
>                             : "0" (x), "m" (CLICK_ATOMIC_VAL));
>            Linux goes even further; it says:
>                                   __asm__ __volatile__("xchgl %0,%1"
>                                           :"=r" (x)
>                                           :"m" (*__xg(ptr)), "0" (x)
>                                           :"memory");
>            but I don't think the extra "memory" constraint is necessary, as
>            long as we
>            say that the memory has been modified.  Boost appears to agree.
> 
>            - Re: click_current_processor().  Yes, this is a problem.  I
>         believe
>            that the
>            Linux style fix would be to use macros get_cpu() and put_cpu(),
>            which turn off
>            preemption in between.  That way a CPU number remains constant
>            while, for
>            example, a Spinlock is held.  As a bonus Click won't get
>         preempted
>            while it
>            holds a Click Spinlock.  My guess is that this is what we
>         want; but,
>            on the
>            other hand, it may be that Click Spinlocks CAN'T
> 
> 
>            I've checked in patches for both these fixes, but these fixes ARE
>            NOT TESTED,
>            as I don't have immediate access to an SMP box.  Any feedback
>         would
>            be greatly
>            appreciated.
> 
>            Eddie
> 
> 
>            springbo at cs.wisc.edu <mailto:springbo at cs.wisc.edu>
>         <mailto:springbo at cs.wisc.edu <mailto:springbo at cs.wisc.edu>> wrote:
>             > Hi,
>             >
>             > I think there is a concurrency issue with Spinlocks in
>         linuxmodule
>             > multi-threaded click (running a 2.6.19.2 <http://2.6.19.2>
>         <http://2.6.19.2>
> 
>            patched kernel, the e1000-NAPI
>             > driver and today's trunk). I've put together a temporary
>         patch,
>            but there
>             > might be further issues. Sorry, I don't have the time to
>            investigate more.
>             >
>             >
>             > Problems:
>             > The series of events are:
>             > -Thread A is running on CPU 0 and thread B is running on
>         CPU 1.
>             > -A acquires the Spinlock and executes code in the critical
>         section.
>             > -Linux schedules Thread A to run on CPU 1 and thread B to
>         run on
>            CPU 0.
>             > -B acquires the Spinlock (works because CPU 0 is the owner
>         of the
>            lock,
>             > not the thread)
>             > -A and B are both operating in the critical section
>             > -(if you're lucky) A releases the Spinlock and generates a
>         "releasing
>             > someone else's lock" message
>             > -(if you're unlucky) Oops
>             >
>             > Attached is a test element and a configuration to elicit the
>            behavior. I
>             > could only replicate the problem when using a polling input
>            source under
>             > heavy load.
>             >
>             > After fixing the above problem I was still seeing two threads
>            inside the
>             > CS. For some reason the atomic_uint32_t::swap was not working
>            atomically
>             > as expected. Changing this to atomic_uint32::compare_and_swap
>            solved the
>             > problem. We're running x86_64 quad core Xeon CPUs.
>             >
>             >
>             > Solution:
>             > In the patch attached I added a new function
>            click_current_thread() which
>             > returns a thread_info pointer. Inside of Spinlock I replace
>             > click_current_processor() uses with
>         click_current_thread(). This
>            way even
>             > if the thread is assigned to another core it will still
>         have the same
>             > thread_info pointer.
>             >
>             > The problem with this solution is that it does not work as an
>            index into
>             > an array nicely and cannot simply replace all uses of
>             > click_current_processor(). There may be other places in
>         the code
>            which
>             > make the same incorrect use of click_current_processor(),
>         but a
>            better
>             > approach will be needed if the value is used to index into an
>            array (such
>             > as in ReadWriteLock).
>             >
>             >
>             > Thanks!
>             > Kevin Springborn
>             >
>             >
>             >
>          
>          ------------------------------------------------------------------------
>             >
>             > _______________________________________________
>             > click mailing list
>             > click at amsterdam.lcs.mit.edu
>         <mailto:click at amsterdam.lcs.mit.edu>
>         <mailto:click at amsterdam.lcs.mit.edu
>         <mailto:click at amsterdam.lcs.mit.edu>>
> 
>             > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> 
>            _______________________________________________
>            click mailing list
>            click at amsterdam.lcs.mit.edu
>         <mailto:click at amsterdam.lcs.mit.edu>
>         <mailto:click at amsterdam.lcs.mit.edu
>         <mailto:click at amsterdam.lcs.mit.edu>>
> 
>            https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> 
> 
>