[Click] NotifierQueue on multi-threaded Click problem.

Beyers Cronje bcronje at gmail.com
Thu Sep 6 19:30:43 EDT 2007


Hi Eddie,

I'm using SimpleQueue in a multithreaded userlevel click configuration with
two threads:

Thread 1 - Main click thread. A simple "FromMyDevice ->Discard;" config.
FromMyDevice is derived from SimpleQueue and calls deq() in run_task().

Thread 2 - RX packets from NIC and calls FromMyDevice::enq().

After some packets have passed through the queue I'm getting an assertion
failed in deq() simplequeue.hh:152 under high load (300k pps). However when
I wrap the deq() and enq() calls in mutexes everything works fine. This
seems to indicate a race condition in the queue of SimpleQueue even when
only one enq() and one deq() thread is present, which clashes with your
statement that one push and one pull thread can access the queue
concurrently? Unless of course SMP Click threads somehow operate differently
than userlevel pthreads?

I'd really like to get a non-locking Single Producer/Single Consumer FIFO
queue working, so I'm available for any testing or suggestions. I can
duplicate the above issue within 2 secs.

Cheers

Beyers Cronje

On 9/6/07, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
> Hi Joonwoo,
>
> I'm really glad that we are making progress!!
>
> Now, for the Queue.  I agree that a simple test should be sufficient to
> test race conditions.
>
> Can you verify that in your configuration, there are *never* two threads
> pushing into a Queue at the same time, and there are *never* two threads
> pulling from a Queue at the same time?  NotifierQueue and FullNoteQueue
> are designed for *at most one* pushing thread and *at most one* pulling
> thread at a time.  The pushing thread can be different from the pulling
> thread, but NotifierQueue and FullNoteQueue don't support multiple
> threads pushing at once (or pulling at once).  Send your configuration
> to the list for us to check.
>
> If you are sure that there aren't multiple pushing threads (or pulling
> threads), then take a look at the code.  What do you think the race
> condition is?  Locking the full bodies of those push and pull functions
> is not required.  Can you help me figure it out?
>
> Eddie
>
>
> Joonwoo Park wrote:
> > Hi Eddie,
> >
> > 2007/9/5, Eddie Kohler <kohler at cs.ucla.edu>:
> >> Hi Joonwoo,
> >>
> >> Joonwoo Park wrote:
> >>> There are some problems.
> >>> The Task::process_pending() calls add_pending() which is reference
> master's
> >>> member variable internally.
> >>> So to call Task::process_pending(), _task_lock is need to acquired or
> >>> synchronized by another method.
> >> There was a problem here, but not the problem you describe.  If two or
> more
> >> threads called Master::process_pending() at the same time, the whole
> locking
> >> structure would break down.  (Duh.)
> >>
> >>>> This change is checked in.  Does it work for you?
> >>> I think it has problems too.
> >>> Memorizing my test of the last month, It might _empty_note is need to
> >>> be synchronized.
> >>> IMO the puller and pusher is watching notifier race-conditionally.
> >> It seems that you are basing this on reading code rather than
> tests.  Can you
> >> confirm that there is a problem in tests?
> >
> > There is a close connection between the queue and pending.
> > So I didn't test it. (And had no time...)
> > But, at this time I can confirm it has problems. (based on test, plz see
> below)
> >
> >>> I tried a patch and It's doing lock-less way.
> >>> It passes pointers of master's member to process_pending(add_pending)
> >>> to prevent pollution of another thread.
> >>> How do you think?
> >> See above; I think this patch doesn't fix the true problem.  (The
> failed
> >> assertion was the true problem.)
> >>
> >> I have checked in a different patch, though, that is at least getting
> closer.
> >>  Comment discussion and diff here:
> >>
> >>
> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ba637c51282065cdae499b9f5cb5320950c11271
> >>
> >> Does this work better for you?  Do you understand what it is doing?
> >
> > Obviously It works better than before!!!
> > I prepared test bed again that was blew up last month and tested it
> again.
> > Therefore It has worked over 12hours.
> >
> > But, to working, spinlock is needed to the queue. (between pull and
> push).
> > Without the mods, I got failure. The queue stops notification in some
> seconds.
> >
> > In addition to that, I modified SLEEPINESS_TRIGGER of the queue to 2
> > from 9 to make more race-conditional situation.
> > Tantalizingly, It failed in half an hour.
> > As test method, is this OK? How do you think?
> >
> >> Eddie
> >>
> >
> > Jason Park (Joonwoo Park)
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>


More information about the click mailing list