[Click] Race condition with FullNoteQueue signaling in multithreading

Cliff Frey cliff at meraki.com
Wed Oct 13 12:01:53 EDT 2010


I am getting the stuck behavior that Beyers describes.  However if I add the
script, the problem for the most part goes away.

Specifically, if I use this config:

//////////////////////////////////////////////////////////////
is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
LIMIT 1000, STOP false)
 -> q1::ThreadSafeQueue
 -> uq1::Unqueue
 -> Discard;

is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
LIMIT -1, STOP false)
 -> q2::ThreadSafeQueue
 -> uq2::Unqueue
 -> Discard;

StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);

Script(TYPE ACTIVE,
       init x 0,
       set x $(add $x 1),
       read uq2.count,
       write uq2.reset_counts,
       wait 5s,
       loop);
//////////////////////////////////////////////////////////////

Then when I run with --threads=4, I get highly variable (and low) output:
uq2.count:
11973
uq2.count:
332166
uq2.count:
86556
uq2.count:
2884924
uq2.count:
419946

compared to if I run with threads=1
uq2.count:
12183593
uq2.count:
12186308
uq2.count:
12091170

This makes me think that it is a problem with the thread scheduling logic,
not with ThreadSafeQueue, because the scheduling of the Script task seems to
kick things correctly.  I can look into this more, but I'll wait for some
feedback from Eddie first...

Cliff

On Wed, Oct 13, 2010 at 8:06 AM, Eddie Kohler <kohler at cs.ucla.edu> wrote:

> Hi Beyers!
>
> I am not sure that I agree with your analysis about the queue getting
> "stuck."
>
> Here is the configuration I used, a tweak of yours:
>
> //////////////////////////////////////////////////////////////
> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
> LIMIT -1, STOP false) -> q1::FullNoteQueue -> uq1::Unqueue -> Discard;
> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
> LIMIT -1, STOP false) -> q2::FullNoteQueue -> uq2::Unqueue -> Discard;
>
> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> Script(TYPE ACTIVE,
>        init x 0,
>        set x $(add $x 1),
>        goto foo $(or $(is1.scheduled) $(uq1.scheduled)),
>        print "$x: is1.scheduled false, uq1.scheduled false, q1.length
> $(q1.length)",
>        label foo,
>        goto bar $(or $(is2.scheduled) $(uq2.scheduled)),
>        print "$x: is2.scheduled false, uq2.scheduled false, q2.length
> $(q2.length)",
>        label bar,
>        wait 0.5s,
>        loop);
> //////////////////////////////////////////////////////////////
>
> This should print a message whenever neither source nor sink is scheduled.
>
> I also tried this with "FullNoteQueue" replaced with "ThreadSafeQueue."
>
> If a queue was truly getting 'stuck,' we would expect to see something like
>
> 745: is1.scheduled false, uq1.scheduled false
> 746: is1.scheduled false, uq1.scheduled false
> 747: is1.scheduled false, uq1.scheduled false
> 748: is1.scheduled false, uq1.scheduled false
> 749: is1.scheduled false, uq1.scheduled false....
>
> with no gaps, because something would go unscheduled *and stay that way*.
>
> But that's not what I see.  I see this:
>
> 1: is1.scheduled false, uq1.scheduled false, q1.length 17
> 1: is2.scheduled false, uq2.scheduled false, q2.length 0
> 20: is1.scheduled false, uq1.scheduled false, q1.length 14
> 22: is1.scheduled false, uq1.scheduled false, q1.length 0
> 28: is1.scheduled false, uq1.scheduled false, q1.length 14
> 41: is1.scheduled false, uq1.scheduled false, q1.length 0
> 47: is2.scheduled false, uq2.scheduled false, q2.length 0
> 55: is2.scheduled false, uq2.scheduled false, q2.length 0
> 56: is2.scheduled false, uq2.scheduled false, q2.length 0
> 76: is2.scheduled false, uq2.scheduled false, q2.length 0
>
> *** Now this, on the other hand, I would expect.  In a multithreaded
> environment, notification will cause some delay between the upstream &
> downstream sides of a queue.  This is because the upstream side must wake
> up
> the downstream side, or vice versa.  The cross-thread communication
> required
> for this can happen a bit slowly, leaving a moment (maybe more than a
> moment)
> where neither task appears to be scheduled.  (In fact, at least one of the
> tasks will be "pre-scheduled.")
>
> So my questions for you are this.  Did you really mean "stuck"?  Have you
> verified that the tasks NEVER re-schedule themselves?  Do you agree with my
> analysis?  Or, are you saying that the "pauses" are too long, even though
> the
> tasks do reschedule themselves?
>
> Eddie
>
>
> On 10/12/2010 05:23 PM, Beyers Cronje wrote:
> > Hi Eddie, list,
> >
> > I've come across a multithreading bug in queue signal handling which I
> can
> > reproduce fairly consistently with userlevel multithreading Click. The
> > symptoms are that either the upstream or downstream task from the queue
> are
> > unscheduled even though the notifier signal is active, this occurs when
> the
> > queue either becomes full or empty. To illustrate here is the config I
> use
> > and some debug handler outputs:
> >
> > //******************** Config ************************
> > is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
> > LIMIT 1000, STOP false) ->  ThreadSafeQueue ->  uq1::Unqueue ->  Discard;
> > is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
> > LIMIT -1, STOP false) ->  q::ThreadSafeQueue ->  uq2::Unqueue ->
>  Discard;
> >
> > StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> >
> >
> > //******************** Debug Handler Output when upstream push task is
> stuck
> > ************************
> > read q.length
> > 200 Read handler 'q.length' OK
> > DATA 1
> > 0
> >
> > read q.fullnote_state
> > 200 Read handler 'q.fullnote_state' OK
> > DATA 131
> > empty notifier off
> > task 0x25b8830 [uq2 :: Unqueue] unscheduled
> > full notifier on
> > task 0x25b8350 [is2 :: InfiniteSource] unscheduled
> >
> > //******************** Debug Handler Output when downstream pull task is
> > stuck ************************
> > read q.length
> > 200 Read handler 'q.length' OK
> > DATA 4
> > 1000
> >
> > read q.fullnote_state
> > 200 Read handler 'q.fullnote_state' OK
> > DATA 131
> > empty notifier on
> > task 0x1c6f830 [uq2 :: Unqueue] unscheduled
> > full notifier off
> > task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
> >
> >
> //*****************************************************************************************************************
> >
> > Clearly the notifier states are correct, but somehow the relevant task is
> > not rescheduled. The above config uses ThreadSafeQueue but I verified
> that
> > the same issue occurs when using FullNoteQueue.
> > The obvious places to look are
> >
> ActiveNotifier::set_active,FullNoteQueue::push_success/push_failure/pull_success/pull_failure
> > but so far I haven't spotted anything wrong with the relevant code,
> clearly
> > I'm overlooking something.
> >
> > Have you or anyone else on the list got any suggestions?
> >
> > If it helps, I'm running click source from a couple of weeks back,
> default
> > 64bit Fedora Core 13 kernel with preemption enabled
> > (2.6.33.3-85.fc13.x86_64 ), Intel Dual-Core CPU.
> > I start userlevel click with the following command:  'click --threads=4
> > conf/threadtest.click -p 777'
> >
> > Cheers
> >
> > Beyers
> > _______________________________________________
> > click mailing list
> > click at amsterdam.lcs.mit.edu
> > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>


More information about the click mailing list