[Click] SMP SpinLock and Deadlocks

Paine, Thomas Asa PAINETA at uwec.edu
Wed Sep 13 23:06:04 EDT 2006


...more...

	Ok, so I added some read/write handlers to my elements so I
could call "attempt" and "release" on my locks and manually manipulate
their state once I got the system locked up.  I was able to call the
handlers and make the system recover from the locked-up state, which
confirmed that it was a deadlock.

	I wrote a small app using the sched_setaffinity system call, with
which I could bind the click pid to cpu0 or cpu1 (virtual, of course)
after starting click.  Not so amazingly, there were no more deadlocks or
messages indicating semaphore issues.
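The affinity tool itself is not included in the message. A minimal C++ sketch of what such a tool might do, assuming the Linux sched_setaffinity(2) interface from <sched.h> and using a hypothetical helper name, pin_to_cpu, is:

```cpp
// Hypothetical helper: pin an existing process (e.g. the click pid) to a
// single CPU using the Linux sched_setaffinity(2) system call.
#include <sched.h>      // sched_setaffinity, cpu_set_t, CPU_ZERO, CPU_SET
#include <sys/types.h>  // pid_t

// Returns 0 on success, -1 on failure (errno is set by the kernel).
// A pid of 0 means "the calling thread".
int pin_to_cpu(pid_t pid, int cpu) {
    cpu_set_t mask;
    CPU_ZERO(&mask);        // start from an empty CPU set
    CPU_SET(cpu, &mask);    // allow only the requested CPU
    return sched_setaffinity(pid, sizeof(mask), &mask);
}
```

For example, pin_to_cpu(click_pid, 0) would restrict a (hypothetical) click_pid to CPU 0. From a shell, "taskset -p 0x1 <pid>" from util-linux does the same job.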

	So what's the deal?  Is the xchgw instruction not atomic on an
HT processor?  Should I use a spinlock native to the Linux kernel
instead (I haven't looked at how that's implemented)?  My hack works, but
it doesn't really scale if I start using the --threads= option with
click-install.  Also, if the router is under heavy traffic load at
startup, there is still a potential for deadlock between the time the
process starts and the time affinity is set.
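On x86, an xchg with a memory operand is locked implicitly, whether or not a LOCK prefix is written, so the exchange itself is atomic across hyperthreaded logical CPUs as well as physical ones. A minimal test-and-set spinlock built on the portable equivalent, std::atomic<bool>::exchange, is sketched below; the class name XchgSpinLock is hypothetical, and this is not Click's SpinLock from sync.hh.

```cpp
#include <atomic>

// Hypothetical test-and-set spinlock built on an atomic exchange, the same
// primitive an x86 "xchg" provides. Illustrative only; not Click's SpinLock.
class XchgSpinLock {
public:
    void acquire() {
        // exchange() stores true and returns the previous value atomically;
        // a previous value of true means another thread holds the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Spin on a plain load so waiters don't keep writing the cache line.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }

    // Non-blocking variant: returns true only if the lock was free.
    bool attempt() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void release() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

Because the exchange is atomic, two CPUs cannot both observe "free" and take the lock; if a lock built this way still misbehaves, the bookkeeping around the exchange is the more likely suspect.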

Hehe, fun stuff :)


Thanks, 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
   Thomas Paine (paineta at uwec.edu) 
   University of Wisconsin - Eau Claire 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
 
-----Original Message-----
From: click-bounces at pdos.csail.mit.edu
[mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Paine, Thomas Asa
Sent: Tuesday, September 12, 2006 2:14 PM
To: click at pdos.csail.mit.edu
Subject: [Click] SMP SpinLock and Deadlocks


	I have a click package, which contains my custom elements.
Those elements basically maintain maps of IP and data.  To protect the
maps during read/write handler operations, I have wrapped the critical
sections using the SpinLock found in sync.hh.

My typical usage...

   ...
   _lock.acquire();
   bleMap->clear();
   _lock.release();
   ...


   ...
   _lock.acquire();
   bleMap->insert(n, type);
   _lock.release();
   ...
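Each acquire()/release() pair above must match on every code path. One common way to guarantee that is a scope guard that acquires in its constructor and releases in its destructor. The sketch below is hypothetical throughout: FlagLock stands in for a lock with the same acquire()/release() shape as _lock, and IpTable assumes the map is keyed by a 32-bit IPv4 address with an int type value.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a lock exposing acquire()/release(),
// like the email's _lock. Not Click's SpinLock.
class FlagLock {
public:
    void acquire() { while (flag_.test_and_set(std::memory_order_acquire)) {} }
    void release() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Scope guard: the lock is released whenever the guard goes out of scope,
// including early returns and exceptions.
template <typename Lock>
class ScopedAcquire {
public:
    explicit ScopedAcquire(Lock& l) : lock_(l) { lock_.acquire(); }
    ~ScopedAcquire() { lock_.release(); }
    ScopedAcquire(const ScopedAcquire&) = delete;
    ScopedAcquire& operator=(const ScopedAcquire&) = delete;
private:
    Lock& lock_;
};

// Hypothetical element state mirroring the clear()/insert() calls above.
struct IpTable {
    FlagLock lock;
    std::map<uint32_t, int> entries;  // assumed: IPv4 address -> type

    void clear() {
        ScopedAcquire<FlagLock> guard(lock);
        entries.clear();
    }
    void insert(uint32_t ip, int type) {
        ScopedAcquire<FlagLock> guard(lock);
        entries[ip] = type;
    }
    std::size_t size() {
        ScopedAcquire<FlagLock> guard(lock);
        return entries.size();
    }
};
```

A guard does not fix a lock whose internal bookkeeping is wrong, but it rules out a missing release() in the element code as the cause of a hang.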



However, I'm experiencing periodic deadlocks and am seeing these types
of messages very frequently...

	"chatter: releasing someone else's lock" ...which is coming from
the SpinLock

I see these messages even when no handlers are being called, and even
after I nice it a bit ("echo -19 > /click/priority"), so I'm not sure of
the scope of the problem.  Clearly, calling _depth-- on the wrong lock
would lead to accounting problems with the semaphore, and then deadlock.
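The owner and depth accounting can be illustrated with a hypothetical recursive spinlock. The sketch below keys ownership to the calling thread (an assumption); it refuses a release from a non-owner instead of decrementing the depth. If a lock instead records its owner as the CPU number at acquire time, a thread moved to another CPU before it calls release would fail the owner check and print the same warning. That is only one possibility, but it would be consistent with the sched_setaffinity result in the follow-up message.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Hypothetical recursive spinlock with owner/depth bookkeeping.
// Ownership is tied to the calling thread, not to a CPU number.
class RecursiveSpinLock {
public:
    void acquire() {
        std::thread::id me = std::this_thread::get_id();
        // Only this thread ever stores its own id in owner_, so seeing our
        // id here means we already hold the lock (recursive acquire).
        if (owner_.load(std::memory_order_relaxed) != me) {
            while (locked_.exchange(true, std::memory_order_acquire)) {
            }
            owner_.store(me, std::memory_order_relaxed);
        }
        ++depth_;  // touched only by the current owner
    }

    // Returns false and changes nothing if the caller is not the owner.
    bool release() {
        if (owner_.load(std::memory_order_relaxed) != std::this_thread::get_id()) {
            std::fprintf(stderr, "releasing someone else's lock\n");
            return false;
        }
        if (--depth_ == 0) {
            owner_.store(std::thread::id(), std::memory_order_relaxed);
            locked_.store(false, std::memory_order_release);
        }
        return true;
    }

    int depth() const { return depth_; }

private:
    std::atomic<bool> locked_{false};
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int depth_ = 0;
};
```

Refusing the mismatched release keeps the depth consistent. Decrementing it anyway would either free the lock under its real owner or leave it held forever, which is the deadlock described above.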
	The configuration is currently running on an appliance using the
latest CVS code and the 2.6.16.3 kernel.  The appliance is using an
Intel(R) Pentium(R) 4 CPU 3.20GHz with hyperthreading, and the kernel is
running with SMP support.  This configuration has been run on
SMP-enabled machines before, but those had 2 physical processors and
were still using the 2.4 kernel, so I don't want to make too many
comparisons there.  I can say I have not experienced any behavior like
this in the past with my elements or with click.

The click and custom packages were last configured using the following,
and I've tried several variations of config options to see if I could at
least cause a change, but I haven't been able to find what's broken.

"./configure --prefix=/backups/staging/click --disable-userlevel"

At this point, I'm not quite sure where the problem may lie.  Is this an
affinity problem with the click kernel module, SpinLock, hyperthreading,
kernel compilation option, or am I just not configuring/using something
correctly?  Has anyone else seen these messages unexpectedly? 



Thanks, 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
   Thomas Paine (paineta at uwec.edu) 
   University of Wisconsin - Eau Claire 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
 

_______________________________________________
click mailing list
click at amsterdam.lcs.mit.edu
https://amsterdam.lcs.mit.edu/mailman/listinfo/click