[Click] Conversion of threads

Eddie Kohler kohler at cs.ucla.edu
Thu Feb 24 19:23:30 EST 2011


Thank you very, very much for this very clear explanation.

That sure is *a* bug (not sure it is *the*).  What do you think of the 
following fix?

http://www.read.cs.ucla.edu/gitweb?p=click;a=commitdiff;h=0cbe4e8bdfbee6efe127cdd714c145cd80e89af9;hp=968964c3e344c2898681e9644d51b20669fef6cd
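The linked commit is not reproduced here, so as general background only (and not necessarily what Eddie's patch does): one standard way to close this class of race is to make the full/empty test and the pointer update act on a single consistent snapshot, e.g. by packing head and tail into one atomic word so the compare-and-swap fails if *either* end has moved. A minimal C++ sketch, all names hypothetical; note that a real multi-producer queue must also sequence the slot writes themselves, which this sketch does not attempt:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: head and tail packed into one 64-bit word, so a
// single compare_exchange validates both at once instead of CASing the
// tail while trusting a possibly stale head.
struct PackedRing {
    static constexpr uint32_t CAP = 4;   // slots; one is always kept empty
    std::atomic<uint64_t> hd_tl{0};      // high 32 bits = head, low 32 = tail
    int slot[CAP];

    static uint64_t pack(uint32_t h, uint32_t t) { return (uint64_t(h) << 32) | t; }
    static uint32_t head(uint64_t v) { return uint32_t(v >> 32); }
    static uint32_t tail(uint64_t v) { return uint32_t(v); }

    bool push(int x) {
        uint64_t v = hd_tl.load();
        while (true) {
            uint32_t t = tail(v), n = (t + 1) % CAP;
            if (n == head(v))
                return false;            // full, judged from a consistent snapshot
            slot[t] = x;                 // NOTE: concurrent pushers racing on this
                                         // store need extra machinery (e.g. per-slot
                                         // sequence numbers) in a real queue
            // Fails if *either* head or tail moved since we looked.
            if (hd_tl.compare_exchange_weak(v, pack(head(v), n)))
                return true;
        }
    }

    bool pull(int *x) {
        uint64_t v = hd_tl.load();
        while (true) {
            uint32_t h = head(v);
            if (h == tail(v))
                return false;            // empty
            *x = slot[h];
            if (hd_tl.compare_exchange_weak(v, pack((h + 1) % CAP, tail(v))))
                return true;
        }
    }
};
```

With this layout a wrapped-around tail can no longer slip through: any intervening movement of head also changes the packed word, so the stale pusher's exchange fails and it retries with fresh values.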

Eddie


On 02/24/2011 02:48 PM, Bobby Longpocket wrote:
> The bug is in ThreadSafeQueue.
>
> You can't treat head and tail independently in the push/pull operations. The result of the current code is that you can fill the queue beyond capacity (turning it into an 'empty' queue as a result).
>
> This happens because a thread can be using a stale head and tail, and this can go unnoticed if the current value of tail (in push) or head (in pull) has wrapped around to the original value.
>
> For instance, suppose a thread gets into push with _head==3 and _tail==1 (one slot left), then gets delayed for a while...
>
> ...meanwhile, other pushers and pullers work on the queue... head=3;tail=2...head=50;tail=49...head=2;tail=1...
>
> ...now our thread gets back to work.  _tail/_xtail is 1 again, so we pass the compare-and-swap and we'll do our push with next-tail==2 even though h==2.
>
>
>
>
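The interleaving above can be replayed deterministically in a few lines. The sketch below is a simplified stand-in for the push path, not Click's actual ThreadSafeQueue code: it takes the stale snapshot, lets "other threads" cycle the indices back around, and then shows the tail-only compare-and-swap succeeding anyway:

```cpp
#include <atomic>

// Deterministic single-threaded replay of the interleaving described
// above, against a simplified stand-in for a CAS-based push (NOT
// Click's actual code). Returns true if the tail-only CAS succeeds
// despite the stale head snapshot, leaving tail == head -- a queue
// pushed past full that now reads back as empty.
bool replay_stale_push() {
    const int CAP = 4;                  // tiny ring; the example above had more slots
    std::atomic<int> head{3}, tail{1};

    // Delayed pusher takes its snapshot, then stalls.
    int t = tail.load();                // t == 1
    int h = head.load();                // h == 3
    int next = (t + 1) % CAP;           // next == 2
    if (next == h)                      // full test passes (using the stale h)
        return false;

    // Meanwhile other pushers and pullers run the queue all the way
    // around until the indices wrap back to values matching the snapshot:
    head.store(2);
    tail.store(1);                      // same tail value our thread saw

    // Delayed pusher resumes. The CAS checks only tail, which has
    // wrapped back to 1, so it succeeds.
    bool cas_won = tail.compare_exchange_strong(t, next);

    // next == head now: the push made tail catch up to head, so the
    // (over)full queue looks empty.
    return cas_won && next == head.load();
}
```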
> --- On Thu, 2/24/11, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
>> From: Eddie Kohler <kohler at cs.ucla.edu>
>> Subject: Re: [Click] Conversion of threads
>> To: "Cliff Frey" <cliff at meraki.com>
>> Cc: click at pdos.csail.mit.edu
>> Date: Thursday, February 24, 2011, 10:30 AM
>> Hey
>>
>> I thought I found a bug in ThreadSafeQueue, but convinced myself it
>> is OK.  There could be problems in the threading core (which has been
>> changed for core performance work).  However, I do not observe Click
>> getting stuck.  Rather, I observe it performing radically slowly.
>> The attached config prints counter information, etc. every second.
>> The counters keep going up.  Although the InfiniteSources often look
>> unscheduled, this is because they *are* unscheduled much of the time;
>> the fast_reschedule() happens at the end of
>> InfiniteSource::run_task().  I don't know what's going on.
>> Commenting out all notification does not help.
>>
>> Eddie
>>
>>
>> On 2/24/11 8:12 AM, Cliff Frey wrote:
>>> I wanted to benchmark the difference between ThreadSafeQueue and
>>> one-queue-per-thread + RoundRobinScheduler + Unqueue + another-Queue.
>>> However, I ended up finding a bug.
>>>
>>> This config:
>>>
>>> elementclass Foo {
>>>      is :: InfiniteSource ->  [0] output;
>>> };
>>> f0::Foo,f1::Foo,f2::Foo,f3::Foo,f4::Foo,f5::Foo
>>> ->  master_q :: ThreadSafeQueue
>>> ->  uq2 :: Unqueue
>>> ->  c :: Counter(COUNT_CALL 4000000 stop)
>>> ->  Discard;
>>> StaticThreadSched(
>>> f0/is 0,
>>> f1/is 1,
>>> f2/is 2,
>>> f3/is 3,
>>> f4/is 4,
>>> f5/is 5,
>>> uq2 7);
>>>
>>> When run with 'click --threads=8 foo.click', it ends up getting
>>> stuck, with an empty master_q and none of the InfiniteSource
>>> elements scheduled.  I'm running on an i7 CPU with 4 cores +
>>> hyperthreading, giving 8 CPUs from Linux's perspective.
>>>
>>> Also interesting is how many drops will be seen at the queue in
>>> this case (not too surprising, really):
>>>
>>> read f0/is.scheduled
>>> false
>>> read master_q.length
>>> 0
>>> read master_q.drops
>>> 163540
>>> read c.count
>>> 231797
>>>
>>> Cliff
>>>
>>> On Thu, Feb 24, 2011 at 7:27 AM, Eddie Kohler
>>> <kohler at cs.ucla.edu> wrote:
>>>
>>> Try ThreadSafeQueue at the converge point, which as Bobby points
>>> out should be correct even when multithreaded, but doesn't require
>>> locks.  However, it will still be slowish, as would anything that
>>> tried to enforce strict order, since the queue pointers will bounce
>>> among cores.
>>>
>>> Eddie
>>>
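Background on the "bounce among cores" point: every push and pull touches the shared head/tail cache lines, so those lines ping-pong between cores even without locks. A common partial mitigation, illustrative only and not taken from Click's source, is to keep the two counters on separate cache lines so the two ends of the queue at least don't falsely share:

```cpp
#include <atomic>
#include <cstdint>

// Illustrative only (not Click's layout): alignas(64) keeps head and
// tail on separate 64-byte cache lines, so pullers hammering head do
// not also invalidate the pushers' tail line (false sharing). It does
// NOT remove the fan-in cost described above -- all pushers still
// contend on the single tail line.
struct QueueEnds {
    alignas(64) std::atomic<int> head{0};   // consumer end
    alignas(64) std::atomic<int> tail{0};   // producer end
};

// True when the two counters cannot share a 64-byte cache line.
inline bool on_separate_lines(const QueueEnds &q) {
    auto a = reinterpret_cast<std::uintptr_t>(&q.head);
    auto b = reinterpret_cast<std::uintptr_t>(&q.tail);
    return (b - a) >= 64;
}
```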
>>>
>>> On 2/23/11 9:36 PM, Philip Prindeville wrote:
>>>> I was wondering... if I'm running multiple duplicate paths, each on
>>>> its own core, and I eventually need to collect them up so I can
>>>> resequence packets and pass them up into user-space... how feasible
>>>> is it to go from threaded and lockless to converging into a single
>>>> locked queue (fan-in)?
>>>>
>>>> I figure it would be the best of both worlds, because most of the
>>>> operations could happen threaded and without locking, and the
>>>> locking only needs to be enforced in the final stage...
>>>>
>>>> Thanks.
>>>>
>>>> -Philip
>>>>
>>>> _______________________________________________
>>>> click mailing list
>>>> click at amsterdam.lcs.mit.edu
>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>>
>>>
>>>
>>
>>
>
>
>

