[Click] NotifierQueue on multi-threaded Click problem.
Eddie Kohler
kohler at cs.ucla.edu
Fri Sep 7 14:42:47 EDT 2007
Hi Cliff,
Cliff Frey wrote:
> Do _head/_tail need to be declared volatile?
Yes! It is probably helpful for them to be declared volatile, and I've
checked in that change.
> I'm not clear on the rules
> for c++ reordering operations... It like adding some sort of barrier
> between these lines might be a good idea.
It might be important on non-x86 architectures, but x86 processors implement
write ordering, so if _q[_tail] is assigned before _tail, then other threads
will see this as well. I think making _q and _head and _tail volatile is
enough to tell the compiler not to reorder.
> (and the similar set for deq,
> but actually the assert() statement probably does that as a side effect...)
>
> _q[_tail] = p;
> _tail = next;
>
> also, I think that I'm confused about the difference between enq/deq and
> push/pull, but it might be worthwhile to add assert(p); to the push method.
>
> And a side note that might be outside the scope of this discussion.
>
> If we wanted to get rid of the one thread enq/deq constraint, couldn't
> we do it without locks if we used cmpxchg and some other factor to
> dictate that things were actually done their operation.
>
> The other factor could either be another set of head/tail pointers, or
> it could be the actual contents of the queue (if it has NULL in it or not).
>
> But... I guess doing a cmpxchg operation per enq/deq would be just about
> as expensive as acquiring/releasing a lock... so perhaps this is still
> out of the question for performance reasons? Or maybe we care about
> architectures that don't support atomic operations at a reasonable speed?
Even adding a single atomic operation can slow things down by a surprising
amount; see later in the thread. But I'm not totally averse to this.
Eddie
>
> Cliff
>
> On 9/6/07, *Beyers Cronje* <bcronje at gmail.com
> <mailto:bcronje at gmail.com>> wrote:
>
> Hi Eddie,
>
> I'm using SimpleQueue in a multithreaded userlevel click
> configuration with
> two threads:
>
> Thread 1 - Main click thread. A simple "FromMyDevice ->Discard;" config.
> FromMyDevice is derived from SimpleQueue and calls deq() in run_task().
>
> Thread 2 - RX packets from NIC and calls FromMyDevice::enq().
>
> After some packets have passed through the queue I'm getting an
> assertion
> failed in deq() simplequeue.hh:152 under high load (300k pps).
> However when
> I wrap the deq() and enq() calls in mutexes everything works fine. This
> seems to indicate a race condition in the queue of SimpleQueue even when
> only one enq() and one deq() thread is present, which clashes with your
> statement that one push and one pull thread can access the queue
> concurrently? Unless of course SMP Click threads somehow operate
> differently
> than userlevel pthreads?
>
> I'd really like to get a non-locking Single Producer/Single Consumer
> FIFO
> queue working, so I'm available for any testing or suggestions. I can
> duplicate the above issue within 2 secs.
>
> Cheers
>
> Beyers Cronje
>
> On 9/6/07, Eddie Kohler < kohler at cs.ucla.edu
> <mailto:kohler at cs.ucla.edu>> wrote:
> >
> > Hi Joonwoo,
> >
> > I'm really glad that we are making progress!!
> >
> > Now, for the Queue. I agree that a simple test should be
> sufficient to
> > test race conditions.
> >
> > Can you verify that in your configuration, there are *never* two
> threads
> > pushing into a Queue at the same time, and there are *never* two
> threads
> > pulling from a Queue at the same time? NotifierQueue and
> FullNoteQueue
> > are designed for *at most one* pushing thread and *at most one*
> pulling
> > thread at a time. The pushing thread can be different from the
> pulling
> > thread, but NotifierQueue and FullNoteQueue don't support multiple
> > threads pushing at once (or pulling at once). Send your
> configuration
> > to the list for us to check.
> >
> > If you are sure that there aren't multiple pushing threads (or
> pulling
> > threads), then take a look at the code. What do you think the race
> > condition is? Locking the full bodies of those push and pull
> functions
> > is not required. Can you help me figure it out?
> >
> > Eddie
> >
> >
> > Joonwoo Park wrote:
> > > Hi Eddie,
> > >
> > > 2007/9/5, Eddie Kohler <kohler at cs.ucla.edu
> <mailto:kohler at cs.ucla.edu>>:
> > >> Hi Joonwoo,
> > >>
> > >> Joonwoo Park wrote:
> > >>> There are some problems.
> > >>> The Task::process_pending() calls add_pending() which is
> reference
> > master's
> > >>> member variable internally.
> > >>> So to call Task::process_pending(), _task_lock is need to
> acquired or
> > >>> synchronized by another method.
> > >> There was a problem here, but not the problem you
> describe. If two or
> > more
> > >> threads called Master::process_pending() at the same time, the
> whole
> > locking
> > >> structure would break down. (Duh.)
> > >>
> > >>>> This change is checked in. Does it work for you?
> > >>> I think it has problems too.
> > >>> Memorizing my test of the last month, It might _empty_note is
> need to
> > >>> be synchronized.
> > >>> IMO the puller and pusher is watching notifier
> race-conditionally.
> > >> It seems that you are basing this on reading code rather than
> > tests. Can you
> > >> confirm that there is a problem in tests?
> > >
> > > There is a close connection between the queue and pending.
> > > So I didn't test it. (And had no time...)
> > > But, at this time I can confirm it has problems. (based on
> test, plz see
> > below)
> > >
> > >>> I tried a patch and It's doing lock-less way.
> > >>> It passes pointers of master's member to
> process_pending(add_pending)
> > >>> to prevent pollution of another thread.
> > >>> How do you think?
> > >> See above; I think this patch doesn't fix the true problem. (The
> > failed
> > >> assertion was the true problem.)
> > >>
> > >> I have checked in a different patch, though, that is at least
> getting
> > closer.
> > >> Comment discussion and diff here:
> > >>
> > >>
> >
> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ba637c51282065cdae499b9f5cb5320950c11271
> <http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ba637c51282065cdae499b9f5cb5320950c11271>
> > >>
> > >> Does this work better for you? Do you understand what it is
> doing?
> > >
> > > Obviously It works better than before!!!
> > > I prepared test bed again that was blew up last month and
> tested it
> > again.
> > > Therefore It has worked over 12hours.
> > >
> > > But, to working, spinlock is needed to the queue. (between pull and
> > push).
> > > Without the mods, I got failure. The queue stops notification
> in some
> > seconds.
> > >
> > > In addition to that, I modified SLEEPINESS_TRIGGER of the queue
> to 2
> > > from 9 to make more race-conditional situation.
> > > Tantalizingly, It failed in half an hour.
> > > As test method, is this OK? How do you think?
> > >
> > >> Eddie
> > >>
> > >
> > > Jason Park (Joonwoo Park)
> > _______________________________________________
> > click mailing list
> > click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
> > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> <https://amsterdam.lcs.mit.edu/mailman/listinfo/click>
> >
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>
>
More information about the click
mailing list