[Click] Race condition with FullNoteQueue signaling in multithreading

Beyers Cronje bcronje at gmail.com
Wed Oct 13 20:01:03 EDT 2010


Hi Eddie,

Thank you for your response and looking into this. I keep forgetting about
your excellent Script element, during my tests I used a much more
rudimentary method of verifying the issue :) I used a Counter element before
and after the queue and manually read the counter's rate and count handlers.
After a while I noticed that both counters rate dropped to 0 and and both
counters count stopped increasing, *permanently*. From what I can see the
tasks NEVER re-schedule themselves.
The only way I found to reschedule them was to manually issue a reset on the
queue, which seems to re-activate the notifier signals and in turn schedule
the tasks.

One thing I forgot to mention, I set SLEEPINESS_TRIGGER to 1, this seems to
trigger the condition quicker. Also, strangely if I set the first
InfiniteSoure LIMIT to 1000 instead of -1 (as per original config posted) it
also seems to trigger the condition much quicker, in fact I couldn't
reproduce the issue with both InfiniteSource elements set to -1 during a
10min test run where with LIMIT 1000 it always happens within 10min of
testing time.

I'm out of the office now, but will test with both your script and Cliff's
script tomorrow and post back my results.

Beyers

On Wed, Oct 13, 2010 at 5:06 PM, Eddie Kohler <kohler at cs.ucla.edu> wrote:

> Hi Beyers!
>
> I am not sure that I agree with your analysis about the queue getting
> "stuck."
>
> Here is the configuration I used, a tweak of yours:
>
> //////////////////////////////////////////////////////////////
>
> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
> LIMIT -1, STOP false) -> q1::FullNoteQueue -> uq1::Unqueue -> Discard;
>
> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
> LIMIT -1, STOP false) -> q2::FullNoteQueue -> uq2::Unqueue -> Discard;
>
>
> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> Script(TYPE ACTIVE,
>        init x 0,
>        set x $(add $x 1),
>        goto foo $(or $(is1.scheduled) $(uq1.scheduled)),
>        print "$x: is1.scheduled false, uq1.scheduled false, q1.length
> $(q1.length)",
>        label foo,
>        goto bar $(or $(is2.scheduled) $(uq2.scheduled)),
>        print "$x: is2.scheduled false, uq2.scheduled false, q2.length
> $(q2.length)",
>        label bar,
>        wait 0.5s,
>        loop);
> //////////////////////////////////////////////////////////////
>
> This should print a message whenever neither source nor sink is scheduled.
>
> I also tried this with "FullNoteQueue" replaced with "ThreadSafeQueue."
>
> If a queue was truly getting 'stuck,' we would expect to see something like
>
> 745: is1.scheduled false, uq1.scheduled false
> 746: is1.scheduled false, uq1.scheduled false
> 747: is1.scheduled false, uq1.scheduled false
> 748: is1.scheduled false, uq1.scheduled false
> 749: is1.scheduled false, uq1.scheduled false....
>
> with no gaps, because something would go unscheduled *and stay that way*.
>
> But that's not what I see.  I see this:
>
> 1: is1.scheduled false, uq1.scheduled false, q1.length 17
> 1: is2.scheduled false, uq2.scheduled false, q2.length 0
> 20: is1.scheduled false, uq1.scheduled false, q1.length 14
> 22: is1.scheduled false, uq1.scheduled false, q1.length 0
> 28: is1.scheduled false, uq1.scheduled false, q1.length 14
> 41: is1.scheduled false, uq1.scheduled false, q1.length 0
> 47: is2.scheduled false, uq2.scheduled false, q2.length 0
> 55: is2.scheduled false, uq2.scheduled false, q2.length 0
> 56: is2.scheduled false, uq2.scheduled false, q2.length 0
> 76: is2.scheduled false, uq2.scheduled false, q2.length 0
>
> *** Now this, on the other hand, I would expect.  In a multithreaded
> environment, notification will cause some delay between the upstream &
> downstream sides of a queue.  This is because the upstream side must wake up
> the downstream side, or vice versa.  The cross-thread communication required
> for this can happen a bit slowly, leaving a moment (maybe more than a
> moment) where neither task appears to be scheduled.  (In fact, at least one
> of the tasks will be "pre-scheduled.")
>
> So my questions for you are this.  Did you really mean "stuck"?  Have you
> verified that the tasks NEVER re-schedule themselves?  Do you agree with my
> analysis?  Or, are you saying that the "pauses" are too long, even though
> the tasks do reschedule themselves?
>
> Eddie
>
>
>
> On 10/12/2010 05:23 PM, Beyers Cronje wrote:
>
>>  Hi Eddie, list,
>>
>> I've come across a multithreading bug in queue signal handling which I can
>> reproduce fairly consistently with userlevel multithreading Click. The
>> symptoms are that either the upstream or downstream task from the queue
>> are
>> unscheduled even though the notifier signal is active, this occurs when
>> the
>> queue either becomes full or empty. To illustrate here is the config I use
>> and some debug handler outputs:
>>
>> //******************** Config ************************
>> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
>> LIMIT 1000, STOP false) ->  ThreadSafeQueue ->  uq1::Unqueue ->  Discard;
>> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
>> LIMIT -1, STOP false) ->  q::ThreadSafeQueue ->  uq2::Unqueue ->  Discard;
>>
>> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
>>
>>
>> //******************** Debug Handler Output when upstream push task is
>> stuck
>> ************************
>> read q.length
>> 200 Read handler 'q.length' OK
>> DATA 1
>> 0
>>
>> read q.fullnote_state
>> 200 Read handler 'q.fullnote_state' OK
>> DATA 131
>> empty notifier off
>> task 0x25b8830 [uq2 :: Unqueue] unscheduled
>> full notifier on
>> task 0x25b8350 [is2 :: InfiniteSource] unscheduled
>>
>> //******************** Debug Handler Output when downstream pull task is
>> stuck ************************
>> read q.length
>> 200 Read handler 'q.length' OK
>> DATA 4
>> 1000
>>
>> read q.fullnote_state
>> 200 Read handler 'q.fullnote_state' OK
>> DATA 131
>> empty notifier on
>> task 0x1c6f830 [uq2 :: Unqueue] unscheduled
>> full notifier off
>> task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
>>
>>
>> //*****************************************************************************************************************
>>
>> Clearly the notifier states are correct, but somehow the relevant task is
>> not rescheduled. The above config uses ThreadSafeQueue but I verified that
>> the same issue occurs when using FullNoteQueue.
>> The obvious places to look are
>>
>> ActiveNotifier::set_active,FullNoteQueue::push_success/push_failure/pull_success/pull_failure
>> but so far I haven't spotted anything wrong with the relevant code,
>> clearly
>> I'm overlooking something.
>>
>> Have you or anyone else on the list got any suggestions?
>>
>> If it helps, I'm running click source from a couple of weeks back, default
>> 64bit Fedora Core 13 kernel with preemption enabled
>> (2.6.33.3-85.fc13.x86_64 ), Intel Dual-Core CPU.
>> I start userlevel click with the following command:  'click --threads=4
>> conf/threadtest.click -p 777'
>>
>> Cheers
>>
>> Beyers
>> _______________________________________________
>> click mailing list
>> click at amsterdam.lcs.mit.edu
>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>
>


More information about the click mailing list