[Click] NotifierQueue on multi-threaded Click problem.

Cliff Frey cliff at meraki.net
Thu Sep 6 20:29:45 EDT 2007


Do _head/_tail need to be declared volatile?  I'm not clear on the rules for
C++ reordering of operations...  It seems like adding some sort of barrier
between these lines might be a good idea (and similarly for the corresponding
pair in deq(), though the assert() statement there probably acts as one as a
side effect...):

    _q[_tail] = p;
    _tail = next;
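
Something like this is roughly what I have in mind, just as a sketch
(assuming gcc's __sync_synchronize() builtin, available since gcc 4.1; I
haven't tested whether it is actually needed or sufficient here):

    _q[_tail] = p;          // 1. write the packet into its slot
    __sync_synchronize();   // 2. full memory barrier: make the slot write
                            //    visible before the new tail is published
    _tail = next;           // 3. publish the slot by advancing _tail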

Also, I admit I'm a bit confused about the difference between enq/deq and
push/pull, but it might be worthwhile to add assert(p); to the push method.

And a side note that might be outside the scope of this discussion.

If we wanted to get rid of the one-thread enq/deq constraint, couldn't we do
it without locks if we used cmpxchg plus some other factor to tell whether a
thread had actually completed its operation?

The other factor could either be another set of head/tail pointers, or it
could be the actual contents of the queue (whether a slot has NULL in it or
not).
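
Very roughly, I'm imagining something like the sketch below (completely
untested; _capacity and the function names are made up, I'm ignoring the
full-queue check, and __sync_bool_compare_and_swap is the gcc >= 4.1
builtin):

    // Producers race to claim a slot by advancing _tail with cmpxchg; the
    // slot's contents (NULL vs non-NULL) then tell the consumer whether
    // the claimed slot has actually been filled yet.
    void multi_producer_enq(Packet *p) {
        int t, next;
        do {
            t = _tail;                      // snapshot the current tail
            next = (t + 1) % _capacity;     // slot we are trying to claim
        } while (!__sync_bool_compare_and_swap(&_tail, t, next));
        _q[t] = p;    // we own slot t; this store publishes the packet
    }

    // The consumer only dequeues a slot once its contents are non-NULL.
    Packet *single_consumer_deq() {
        if (_head == _tail || _q[_head] == 0)
            return 0;                       // empty, or producer not done
        Packet *p = _q[_head];
        _q[_head] = 0;                      // mark the slot free again
        _head = (_head + 1) % _capacity;
        return p;
    }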

But... I guess doing a cmpxchg operation per enq/deq would be just about as
expensive as acquiring/releasing a lock... so perhaps this is still out of
the question for performance reasons?  Or maybe we care about architectures
that don't support atomic operations at a reasonable speed?

Cliff

On 9/6/07, Beyers Cronje <bcronje at gmail.com> wrote:
>
> Hi Eddie,
>
> I'm using SimpleQueue in a multithreaded userlevel Click configuration
> with two threads:
>
> Thread 1 - The main Click thread, running a simple "FromMyDevice -> Discard;"
> config. FromMyDevice is derived from SimpleQueue and calls deq() in
> run_task().
>
> Thread 2 - Receives packets from the NIC and calls FromMyDevice::enq().
>
> After some packets have passed through the queue I'm getting an assertion
> failure in deq() (simplequeue.hh:152) under high load (300k pps). However,
> when I wrap the deq() and enq() calls in mutexes, everything works fine.
> This seems to indicate a race condition in SimpleQueue's queue even when
> only one enq() and one deq() thread is present, which clashes with your
> statement that one push and one pull thread can access the queue
> concurrently? Unless of course SMP Click threads somehow operate
> differently than userlevel pthreads?
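>
> For reference, my workaround is roughly the following (just a sketch, not
> my exact code; _lock is an illustrative pthread_mutex_t member I added):
>
>     pthread_mutex_lock(&_lock);
>     enq(p);                       // thread 2: RX path
>     pthread_mutex_unlock(&_lock);
>
>     pthread_mutex_lock(&_lock);
>     Packet *q = deq();            // thread 1: run_task()
>     pthread_mutex_unlock(&_lock);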
>
> I'd really like to get a non-locking Single Producer/Single Consumer FIFO
> queue working, so I'm available for any testing or suggestions. I can
> duplicate the above issue within 2 secs.
>
> Cheers
>
> Beyers Cronje
>
> On 9/6/07, Eddie Kohler <kohler at cs.ucla.edu> wrote:
> >
> > Hi Joonwoo,
> >
> > I'm really glad that we are making progress!!
> >
> > Now, for the Queue.  I agree that a simple test should be sufficient to
> > test race conditions.
> >
> > Can you verify that in your configuration, there are *never* two threads
> > pushing into a Queue at the same time, and there are *never* two threads
> > pulling from a Queue at the same time?  NotifierQueue and FullNoteQueue
> > are designed for *at most one* pushing thread and *at most one* pulling
> > thread at a time.  The pushing thread can be different from the pulling
> > thread, but NotifierQueue and FullNoteQueue don't support multiple
> > threads pushing at once (or pulling at once).  Send your configuration
> > to the list for us to check.
> >
> > If you are sure that there aren't multiple pushing threads (or pulling
> > threads), then take a look at the code.  What do you think the race
> > condition is?  Locking the full bodies of those push and pull functions
> > is not required.  Can you help me figure it out?
> >
> > Eddie
> >
> >
> > Joonwoo Park wrote:
> > > Hi Eddie,
> > >
> > > 2007/9/5, Eddie Kohler <kohler at cs.ucla.edu>:
> > >> Hi Joonwoo,
> > >>
> > >> Joonwoo Park wrote:
> > >>> There are some problems.
> > >>> Task::process_pending() calls add_pending(), which internally
> > >>> references a member variable of the master.
> > >>> So to call Task::process_pending(), _task_lock needs to be acquired
> > >>> or synchronized by some other method.
> > >> There was a problem here, but not the problem you describe.  If two
> > >> or more threads called Master::process_pending() at the same time,
> > >> the whole locking structure would break down.  (Duh.)
> > >>
> > >>>> This change is checked in.  Does it work for you?
> > >>> I think it has problems too.
> > >>> Recalling my tests from last month, it might be that _empty_note
> > >>> needs to be synchronized.
> > >>> IMO the puller and pusher are watching the notifier in a racy way.
> > >> It seems that you are basing this on reading code rather than on
> > >> tests.  Can you confirm that there is a problem in tests?
> > >
> > > There is a close connection between the queue and pending.
> > > So I didn't test it. (And had no time...)
> > > But at this time I can confirm it has problems (based on a test,
> > > please see below).
> > >
> > >>> I tried a patch that does it in a lock-less way.
> > >>> It passes pointers to the master's members to process_pending()
> > >>> (add_pending()) to prevent pollution by another thread.
> > >>> What do you think?
> > >> See above; I think this patch doesn't fix the true problem.  (The
> > >> failed assertion was the true problem.)
> > >>
> > >> I have checked in a different patch, though, that is at least
> > >> getting closer.  Comment discussion and diff here:
> > >>
> > >> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ba637c51282065cdae499b9f5cb5320950c11271
> > >>
> > >> Does this work better for you?  Do you understand what it is doing?
> > >
> > > Obviously it works better than before!!!
> > > I set up the test bed that blew up last month again and retested.
> > > It has now been working for over 12 hours.
> > >
> > > But to make it work, a spinlock is needed in the queue (between pull
> > > and push). Without that mod I got a failure; the queue stops
> > > notification within a few seconds.
> > >
> > > In addition, I changed the queue's SLEEPINESS_TRIGGER from 9 to 2
> > > to make the situation more race-prone.
> > > Tantalizingly, it failed within half an hour.
> > > Is this OK as a test method? What do you think?
> > >
> > >> Eddie
> > >>
> > >
> > > Jason Park (Joonwoo Park)
> > _______________________________________________
> > click mailing list
> > click at amsterdam.lcs.mit.edu
> > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> >
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>

