[Click] Conversion of threads

Eddie Kohler kohler at cs.ucla.edu
Fri Feb 25 12:04:43 EST 2011


For what it's worth, Cliff reports that this fix did fix *the* problem.

E


On 2/24/11 5:03 PM, Bobby Longpocket wrote:
> It may not be *the* bug, but if you hit it, it will produce *the* problem. :)
>
> I think the fix will work.
>
>
>
> --- On Thu, 2/24/11, Eddie Kohler<kohler at cs.ucla.edu>  wrote:
>
>> From: Eddie Kohler<kohler at cs.ucla.edu>
>> Subject: Re: [Click] Conversion of threads
>> To: "Bobby Longpocket"<bobbylongpocket at yahoo.com>
>> Cc: "Cliff Frey"<cliff at meraki.com>, click at pdos.csail.mit.edu
>> Date: Thursday, February 24, 2011, 4:23 PM
>> Thank you very, very much for this very clear explanation.
>>
>> That sure is *a* bug (not sure it is *the*).  What do you think of the
>> following fix?
>>
>> http://www.read.cs.ucla.edu/gitweb?p=click;a=commitdiff;h=0cbe4e8bdfbee6efe127cdd714c145cd80e89af9;hp=968964c3e344c2898681e9644d51b20669fef6cd
>>
>> Eddie
>>
>>
>> On 02/24/2011 02:48 PM, Bobby Longpocket wrote:
>>> The bug is in ThreadSafeQueue.
>>>
>>> You can't treat head and tail independently in the push/pull
>>> operations. The result of the current code is that you can fill the
>>> queue beyond capacity (turning it into an 'empty' queue as a result).
>>>
>>> This happens because a thread can be using a stale head and tail, and
>>> this can go unnoticed if the current value of tail (in push) or head
>>> (in pull) has wrapped around to the original value.
>>>
>>> For instance, suppose a thread gets into push with _head==3 and
>>> _tail==1 (one slot left), then gets delayed for a while...
>>>
>>> ...meanwhile, other pushers and pullers work on the queue...
>>> head=3;tail=2...head=50;tail=49...head=2;tail=1...
>>>
>>> ...now our thread gets back to work. _tail/_xtail is 1 again, so we
>>> pass the compare-and-swap and we'll do our push with next-tail==2 even
>>> though _head==2.
>>>
>>>
>>>
>>>
>>> --- On Thu, 2/24/11, Eddie Kohler<kohler at cs.ucla.edu>
>> wrote:
>>>
>>>> From: Eddie Kohler<kohler at cs.ucla.edu>
>>>> Subject: Re: [Click] Conversion of threads
>>>> To: "Cliff Frey"<cliff at meraki.com>
>>>> Cc: click at pdos.csail.mit.edu
>>>> Date: Thursday, February 24, 2011, 10:30 AM
>>>> Hey
>>>>
>>>> I thought I found a bug in ThreadSafeQueue, but convinced myself it
>>>> is OK.  There could be problems in the threading core (which has been
>>>> changed for core performance work).  However, I do not observe Click
>>>> getting stuck.  Rather, I observe it performing radically slowly.
>>>> The attached config prints counter information, etc. every second.
>>>> The counters keep going up.  Although the InfiniteSources often look
>>>> unscheduled, this is because they *are* unscheduled much of the time;
>>>> the fast_reschedule() happens at the end of
>>>> InfiniteSource::run_task().  I don't know what's going on.
>>>> Commenting out all notification does not help.
>>>>
>>>> Eddie
>>>>
>>>>
>>>> On 2/24/11 8:12 AM, Cliff Frey wrote:
>>>>> I wanted to benchmark the difference between ThreadSafeQueue and
>>>>> one-queue-per-thread + RoundRobinScheduler + Unqueue +
>>>>> another-Queue.  However, I ended up finding a bug.
>>>>>
>>>>> This config:
>>>>>
>>>>> elementclass Foo {
>>>>>     is :: InfiniteSource -> [0] output;
>>>>> };
>>>>> f0::Foo, f1::Foo, f2::Foo, f3::Foo, f4::Foo, f5::Foo
>>>>> -> master_q :: ThreadSafeQueue
>>>>> -> uq2 :: Unqueue
>>>>> -> c :: Counter(COUNT_CALL 4000000 stop)
>>>>> -> Discard;
>>>>> StaticThreadSched(
>>>>>     f0/is 0,
>>>>>     f1/is 1,
>>>>>     f2/is 2,
>>>>>     f3/is 3,
>>>>>     f4/is 4,
>>>>>     f5/is 5,
>>>>>     uq2 7);
>>>>>
>>>>> When run with 'click --threads=8 foo.click', it ends up getting
>>>>> stuck, with an empty master_q and none of the InfiniteSource
>>>>> elements scheduled.  I'm running on an i7 CPU with 4 cores +
>>>>> hyperthreading, giving 8 CPUs from linux's perspective.
>>>>>
>>>>> Also interesting is how many drops are seen at the queue in this
>>>>> case (not too surprising really):
>>>>>
>>>>> read f0/is.scheduled
>>>>> false
>>>>> read master_q.length
>>>>> 0
>>>>> read master_q.drops
>>>>> 163540
>>>>> read c.count
>>>>> 231797
>>>>>
>>>>> Cliff
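
[Editorial note: the per-thread-queue alternative Cliff wanted to benchmark can be sketched in Click config as below. Element names follow his description and the usual Click elements (the stock scheduler element is spelled RoundRobinSched); this is a hypothetical sketch, not a config from the thread, and a real setup would also need StaticThreadSched assignments as above.]

>>>>> // Hypothetical sketch: one Queue per source thread, merged by a
>>>>> // round-robin pull scheduler instead of one ThreadSafeQueue.
>>>>> elementclass Foo {
>>>>>     is :: InfiniteSource -> Queue -> output;
>>>>> };
>>>>> f0 :: Foo; f1 :: Foo;
>>>>> rr :: RoundRobinSched;          // pulls from each per-thread Queue in turn
>>>>> f0 -> [0] rr; f1 -> [1] rr;
>>>>> rr -> uq :: Unqueue -> q2 :: Queue -> Unqueue -> Discard;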
>>>>>
>>>>> On Thu, Feb 24, 2011 at 7:27 AM, Eddie Kohler<kohler at cs.ucla.edu
>>>>> <mailto:kohler at cs.ucla.edu>> wrote:
>>>>>
>>>>>     Try ThreadSafeQueue at the converge point, which as Bobby points
>>>>>     out should be correct even when multithreaded, but doesn't
>>>>>     require locks.  However, it will still be slowish, as would be
>>>>>     anything that tried to enforce strict order, since the queue
>>>>>     pointers will bounce among cores.
>>>>>
>>>>>     Eddie
>>>>>
>>>>>
>>>>>     On 2/23/11 9:36 PM, Philip Prindeville wrote:
>>>>>      >  I was wondering... if I'm running multiple duplicate paths
>>>>>      >  each on its own core and I eventually need to collect them
>>>>>      >  up so I can resequence packets and pass them up into
>>>>>      >  user-space... how feasible is it to go from threaded and
>>>>>      >  lockless to converging into a single locked queue (fan-in)?
>>>>>      >
>>>>>      >  I figure it would be the best of both worlds because most of
>>>>>      >  the operations could happen threaded and without locking,
>>>>>      >  and the locking only needs to be enforced in the final
>>>>>      >  stage...
>>>>>      >
>>>>>      >  Thanks.
>>>>>      >
>>>>>      >  -Philip
>>>>>      >
>>>>>      >  _______________________________________________
>>>>>      >  click mailing list
>>>>>      >  click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
>>>>>      >  https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>>
>>>
>>>
>>
>
>
>

