[Click] NotifierQueue on multi-threaded Click problem.

Eddie Kohler kohler at cs.ucla.edu
Thu Sep 6 14:04:41 EDT 2007


Hi Joonwoo,

I'm really glad that we are making progress!!

Now, for the Queue.  I agree that a simple test should be sufficient to 
test race conditions.

Can you verify that in your configuration, there are *never* two threads 
pushing into a Queue at the same time, and there are *never* two threads 
pulling from a Queue at the same time?  NotifierQueue and FullNoteQueue 
are designed for *at most one* pushing thread and *at most one* pulling 
thread at a time.  The pushing thread can be different from the pulling 
thread, but NotifierQueue and FullNoteQueue don't support multiple 
threads pushing at once (or pulling at once).  Send your configuration 
to the list for us to check.

If you are sure that there aren't multiple pushing threads (or pulling 
threads), then take a look at the code.  What do you think the race 
condition is?  Locking the full bodies of those push and pull functions 
is not required.  Can you help me figure it out?

Eddie


Joonwoo Park wrote:
> Hi Eddie,
> 
> 2007/9/5, Eddie Kohler <kohler at cs.ucla.edu>:
>> Hi Joonwoo,
>>
>> Joonwoo Park wrote:
>>> There are some problems.
>>> The Task::process_pending() calls add_pending() which is reference master's
>>> member variable internally.
>>> So to call Task::process_pending(), _task_lock is need to acquired or
>>> synchronized by another method.
>> There was a problem here, but not the problem you describe.  If two or more
>> threads called Master::process_pending() at the same time, the whole locking
>> structure would break down.  (Duh.)
>>
>>>> This change is checked in.  Does it work for you?
>>> I think it has problems too.
>>> Memorizing my test of the last month, It might _empty_note is need to
>>> be synchronized.
>>> IMO the puller and pusher is watching notifier race-conditionally.
>> It seems that you are basing this on reading code rather than tests.  Can you
>> confirm that there is a problem in tests?
> 
> There is a close connection between the queue and pending.
> So I didn't test it. (And had no time...)
> But, at this time I can confirm it has problems. (based on test, plz see below)
> 
>>> I tried a patch and It's doing lock-less way.
>>> It passes pointers of master's member to process_pending(add_pending)
>>> to prevent pollution of another thread.
>>> How do you think?
>> See above; I think this patch doesn't fix the true problem.  (The failed
>> assertion was the true problem.)
>>
>> I have checked in a different patch, though, that is at least getting closer.
>>  Comment discussion and diff here:
>>
>> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ba637c51282065cdae499b9f5cb5320950c11271
>>
>> Does this work better for you?  Do you understand what it is doing?
> 
> Obviously It works better than before!!!
> I prepared test bed again that was blew up last month and tested it again.
> Therefore It has worked over 12hours.
> 
> But, to working, spinlock is needed to the queue. (between pull and push).
> Without the mods, I got failure. The queue stops notification in some seconds.
> 
> In addition to that, I modified SLEEPINESS_TRIGGER of the queue to 2
> from 9 to make more race-conditional situation.
> Tantalizingly, It failed in half an hour.
> As test method, is this OK? How do you think?
> 
>> Eddie
>>
> 
> Jason Park (Joonwoo Park)


More information about the click mailing list