[Click] NotifierQueue on multi-threaded Click problem.

Eddie Kohler kohler at cs.ucla.edu
Fri Sep 7 14:42:47 EDT 2007


Hi Cliff,

Cliff Frey wrote:
> Do _head/_tail need to be declared volatile?

Yes!  It is probably helpful for them to be declared volatile, and I've 
checked in that change.

>  I'm not clear on the rules 
> for c++ reordering operations...  It like adding some sort of barrier 
> between these lines might be a good idea.

It might be important on non-x86 architectures, but x86 processors implement 
write ordering, so if _q[_tail] is assigned before _tail, then other threads 
will see this as well.  I think making _q and _head and _tail volatile is 
enough to tell the compiler not to reorder.

> (and the similar set for deq, 
> but actually the assert() statement probably does that as a side effect...)
> 
>     _q[_tail] = p;
>     _tail = next;
> 
> also, I think that I'm confused about the difference between enq/deq and 
> push/pull, but it might be worthwhile to add assert(p); to the push method.
> 
> And a side note that might be outside the scope of this discussion.
> 
> If we wanted to get rid of the one thread enq/deq constraint, couldn't 
> we do it without locks if we used cmpxchg and some other factor to 
> dictate that things were actually done their operation.
> 
> The other factor could either be another set of head/tail pointers, or 
> it could be the actual contents of the queue (if it has NULL in it or not).
> 
> But... I guess doing a cmpxchg operation per enq/deq would be just about 
> as expensive as acquiring/releasing a lock... so perhaps this is still 
> out of the question for performance reasons?  Or maybe we care about 
> architectures that don't support atomic operations at a reasonable speed?

Even adding a single atomic operation can slow things down by a surprising 
amount; see later in the thread.  But I'm not totally averse to this.

Eddie


> 
> Cliff
> 
> On 9/6/07, *Beyers Cronje* <bcronje at gmail.com 
> <mailto:bcronje at gmail.com>> wrote:
> 
>     Hi Eddie,
> 
>     I'm using SimpleQueue in a multithreaded userlevel click
>     configuration with
>     two threads:
> 
>     Thread 1 - Main click thread. A simple "FromMyDevice ->Discard;" config.
>     FromMyDevice is derived from SimpleQueue and calls deq() in run_task().
> 
>     Thread 2 - RX packets from NIC and calls FromMyDevice::enq().
> 
>     After some packets have passed through the queue I'm getting an
>     assertion
>     failed in deq() simplequeue.hh:152 under high load (300k pps).
>     However when
>     I wrap the deq() and enq() calls in mutexes everything works fine. This
>     seems to indicate a race condition in the queue of SimpleQueue even when
>     only one enq() and one deq() thread is present, which clashes with your
>     statement that one push and one pull thread can access the queue
>     concurrently? Unless of course SMP Click threads somehow operate
>     differently
>     than userlevel pthreads?
> 
>     I'd really like to get a non-locking Single Producer/Single Consumer
>     FIFO
>     queue working, so I'm available for any testing or suggestions. I can
>     duplicate the above issue within 2 secs.
> 
>     Cheers
> 
>     Beyers Cronje
> 
>     On 9/6/07, Eddie Kohler < kohler at cs.ucla.edu
>     <mailto:kohler at cs.ucla.edu>> wrote:
>      >
>      > Hi Joonwoo,
>      >
>      > I'm really glad that we are making progress!!
>      >
>      > Now, for the Queue.  I agree that a simple test should be
>     sufficient to
>      > test race conditions.
>      >
>      > Can you verify that in your configuration, there are *never* two
>     threads
>      > pushing into a Queue at the same time, and there are *never* two
>     threads
>      > pulling from a Queue at the same time?  NotifierQueue and
>     FullNoteQueue
>      > are designed for *at most one* pushing thread and *at most one*
>     pulling
>      > thread at a time.  The pushing thread can be different from the
>     pulling
>      > thread, but NotifierQueue and FullNoteQueue don't support multiple
>      > threads pushing at once (or pulling at once).  Send your
>     configuration
>      > to the list for us to check.
>      >
>      > If you are sure that there aren't multiple pushing threads (or
>     pulling
>      > threads), then take a look at the code.  What do you think the race
>      > condition is?  Locking the full bodies of those push and pull
>     functions
>      > is not required.  Can you help me figure it out?
>      >
>      > Eddie
>      >
>      >
>      > Joonwoo Park wrote:
>      > > Hi Eddie,
>      > >
>      > > 2007/9/5, Eddie Kohler <kohler at cs.ucla.edu
>     <mailto:kohler at cs.ucla.edu>>:
>      > >> Hi Joonwoo,
>      > >>
>      > >> Joonwoo Park wrote:
>      > >>> There are some problems.
>      > >>> The Task::process_pending() calls add_pending() which is
>     reference
>      > master's
>      > >>> member variable internally.
>      > >>> So to call Task::process_pending(), _task_lock is need to
>     acquired or
>      > >>> synchronized by another method.
>      > >> There was a problem here, but not the problem you
>     describe.  If two or
>      > more
>      > >> threads called Master::process_pending() at the same time, the
>     whole
>      > locking
>      > >> structure would break down.  (Duh.)
>      > >>
>      > >>>> This change is checked in.  Does it work for you?
>      > >>> I think it has problems too.
>      > >>> Memorizing my test of the last month, It might _empty_note is
>     need to
>      > >>> be synchronized.
>      > >>> IMO the puller and pusher is watching notifier
>     race-conditionally.
>      > >> It seems that you are basing this on reading code rather than
>      > tests.  Can you
>      > >> confirm that there is a problem in tests?
>      > >
>      > > There is a close connection between the queue and pending.
>      > > So I didn't test it. (And had no time...)
>      > > But, at this time I can confirm it has problems. (based on
>     test, plz see
>      > below)
>      > >
>      > >>> I tried a patch and It's doing lock-less way.
>      > >>> It passes pointers of master's member to
>     process_pending(add_pending)
>      > >>> to prevent pollution of another thread.
>      > >>> How do you think?
>      > >> See above; I think this patch doesn't fix the true problem.  (The
>      > failed
>      > >> assertion was the true problem.)
>      > >>
>      > >> I have checked in a different patch, though, that is at least
>     getting
>      > closer.
>      > >>  Comment discussion and diff here:
>      > >>
>      > >>
>      >
>     http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ba637c51282065cdae499b9f5cb5320950c11271
>     <http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ba637c51282065cdae499b9f5cb5320950c11271>
>      > >>
>      > >> Does this work better for you?  Do you understand what it is
>     doing?
>      > >
>      > > Obviously It works better than before!!!
>      > > I prepared test bed again that was blew up last month and
>     tested it
>      > again.
>      > > Therefore It has worked over 12hours.
>      > >
>      > > But, to working, spinlock is needed to the queue. (between pull and
>      > push).
>      > > Without the mods, I got failure. The queue stops notification
>     in some
>      > seconds.
>      > >
>      > > In addition to that, I modified SLEEPINESS_TRIGGER of the queue
>     to 2
>      > > from 9 to make more race-conditional situation.
>      > > Tantalizingly, It failed in half an hour.
>      > > As test method, is this OK? How do you think?
>      > >
>      > >> Eddie
>      > >>
>      > >
>      > > Jason Park (Joonwoo Park)
>      > _______________________________________________
>      > click mailing list
>      > click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
>      > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>     <https://amsterdam.lcs.mit.edu/mailman/listinfo/click>
>      >
>     _______________________________________________
>     click mailing list
>     click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
>     https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> 
> 


More information about the click mailing list