[Click] Race condition with FullNoteQueue signaling in multithreading
Eddie Kohler
kohler at cs.ucla.edu
Wed Oct 13 14:20:40 EDT 2010
This script is very useful. Perhaps (unshockingly :| god damn it) there is
some problem with waking up a sleeping thread at userlevel; you may recall a
lot of retarded drama around that issue. The Script may help by cauisng
threads to sleep for limited amounts of time. I will look.
Eddie
On 10/13/2010 09:01 AM, Cliff Frey wrote:
> I am getting the stuck behavior that Beyers describes. However if I add
> the script, the problem for the most part goes away.
>
> Specifically, if I use this config:
>
> //////////////////////////////////////////////////////////////
> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> LIMIT 1000, STOP false)
> -> q1::ThreadSafeQueue
> -> uq1::Unqueue
> -> Discard;
>
> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> LIMIT -1, STOP false)
> -> q2::ThreadSafeQueue
> -> uq2::Unqueue
> -> Discard;
>
> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
>
> Script(TYPE ACTIVE,
> init x 0,
> set x $(add $x 1),
> read uq2.count,
> write uq2.reset_counts,
> wait 5s,
> loop);
> //////////////////////////////////////////////////////////////
>
> Then when I run with --threads=4, I get highly variable (and low) output:
> uq2.count:
> 11973
> uq2.count:
> 332166
> uq2.count:
> 86556
> uq2.count:
> 2884924
> uq2.count:
> 419946
>
> compared to if I run with threads=1
> uq2.count:
> 12183593
> uq2.count:
> 12186308
> uq2.count:
> 12091170
>
> This makes me think that it is a problem with the thread scheduling
> logic, not with ThreadSafeQueue, because the scheduling of the Script
> task seems to kick things correctly. I can look into this more, but
> I'll wait for some feedback from Eddie first...
>
> Cliff
>
> On Wed, Oct 13, 2010 at 8:06 AM, Eddie Kohler <kohler at cs.ucla.edu
> <mailto:kohler at cs.ucla.edu>> wrote:
>
> Hi Beyers!
>
> I am not sure that I agree with your analysis about the queue
> getting "stuck."
>
> Here is the configuration I used, a tweak of yours:
>
> //////////////////////////////////////////////////////////////
> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> LIMIT -1, STOP false) -> q1::FullNoteQueue -> uq1::Unqueue -> Discard;
> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> LIMIT -1, STOP false) -> q2::FullNoteQueue -> uq2::Unqueue -> Discard;
>
> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> Script(TYPE ACTIVE,
> init x 0,
> set x $(add $x 1),
> goto foo $(or $(is1.scheduled) $(uq1.scheduled)),
> print "$x: is1.scheduled false, uq1.scheduled false,
> q1.length $(q1.length)",
> label foo,
> goto bar $(or $(is2.scheduled) $(uq2.scheduled)),
> print "$x: is2.scheduled false, uq2.scheduled false,
> q2.length $(q2.length)",
> label bar,
> wait 0.5s,
> loop);
> //////////////////////////////////////////////////////////////
>
> This should print a message whenever neither source nor sink is
> scheduled.
>
> I also tried this with "FullNoteQueue" replaced with "ThreadSafeQueue."
>
> If a queue was truly getting 'stuck,' we would expect to see
> something like
>
> 745: is1.scheduled false, uq1.scheduled false
> 746: is1.scheduled false, uq1.scheduled false
> 747: is1.scheduled false, uq1.scheduled false
> 748: is1.scheduled false, uq1.scheduled false
> 749: is1.scheduled false, uq1.scheduled false....
>
> with no gaps, because something would go unscheduled *and stay that
> way*.
>
> But that's not what I see. I see this:
>
> 1: is1.scheduled false, uq1.scheduled false, q1.length 17
> 1: is2.scheduled false, uq2.scheduled false, q2.length 0
> 20: is1.scheduled false, uq1.scheduled false, q1.length 14
> 22: is1.scheduled false, uq1.scheduled false, q1.length 0
> 28: is1.scheduled false, uq1.scheduled false, q1.length 14
> 41: is1.scheduled false, uq1.scheduled false, q1.length 0
> 47: is2.scheduled false, uq2.scheduled false, q2.length 0
> 55: is2.scheduled false, uq2.scheduled false, q2.length 0
> 56: is2.scheduled false, uq2.scheduled false, q2.length 0
> 76: is2.scheduled false, uq2.scheduled false, q2.length 0
>
> *** Now this, on the other hand, I would expect. In a multithreaded
> environment, notification will cause some delay between the upstream &
> downstream sides of a queue. This is because the upstream side must
> wake up
> the downstream side, or vice versa. The cross-thread communication
> required
> for this can happen a bit slowly, leaving a moment (maybe more than
> a moment)
> where neither task appears to be scheduled. (In fact, at least one
> of the
> tasks will be "pre-scheduled.")
>
> So my questions for you are this. Did you really mean "stuck"?
> Have you
> verified that the tasks NEVER re-schedule themselves? Do you agree
> with my
> analysis? Or, are you saying that the "pauses" are too long, even
> though the
> tasks do reschedule themselves?
>
> Eddie
>
>
> On 10/12/2010 05:23 PM, Beyers Cronje wrote:
> > Hi Eddie, list,
> >
> > I've come across a multithreading bug in queue signal handling
> which I can
> > reproduce fairly consistently with userlevel multithreading
> Click. The
> > symptoms are that either the upstream or downstream task from the
> queue are
> > unscheduled even though the notifier signal is active, this
> occurs when the
> > queue either becomes full or empty. To illustrate here is the
> config I use
> > and some debug handler outputs:
> >
> > //******************** Config ************************
> > is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00
> 08 00>,
> > LIMIT 1000, STOP false) -> ThreadSafeQueue -> uq1::Unqueue ->
> Discard;
> > is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00
> 08 00>,
> > LIMIT -1, STOP false) -> q::ThreadSafeQueue -> uq2::Unqueue ->
> Discard;
> >
> > StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> >
> >
> > //******************** Debug Handler Output when upstream push
> task is stuck
> > ************************
> > read q.length
> > 200 Read handler 'q.length' OK
> > DATA 1
> > 0
> >
> > read q.fullnote_state
> > 200 Read handler 'q.fullnote_state' OK
> > DATA 131
> > empty notifier off
> > task 0x25b8830 [uq2 :: Unqueue] unscheduled
> > full notifier on
> > task 0x25b8350 [is2 :: InfiniteSource] unscheduled
> >
> > //******************** Debug Handler Output when downstream pull
> task is
> > stuck ************************
> > read q.length
> > 200 Read handler 'q.length' OK
> > DATA 4
> > 1000
> >
> > read q.fullnote_state
> > 200 Read handler 'q.fullnote_state' OK
> > DATA 131
> > empty notifier on
> > task 0x1c6f830 [uq2 :: Unqueue] unscheduled
> > full notifier off
> > task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
> >
> >
> //*****************************************************************************************************************
> >
> > Clearly the notifier states are correct, but somehow the relevant
> task is
> > not rescheduled. The above config uses ThreadSafeQueue but I
> verified that
> > the same issue occurs when using FullNoteQueue.
> > The obvious places to look are
> >
> ActiveNotifier::set_active,FullNoteQueue::push_success/push_failure/pull_success/pull_failure
> > but so far I haven't spotted anything wrong with the relevant
> code, clearly
> > I'm overlooking something.
> >
> > Have you or anyone else on the list got any suggestions?
> >
> > If it helps, I'm running click source from a couple of weeks
> back, default
> > 64bit Fedora Core 13 kernel with preemption enabled
> > (2.6.33.3-85.fc13.x86_64 ), Intel Dual-Core CPU.
> > I start userlevel click with the following command: 'click
> --threads=4
> > conf/threadtest.click -p 777'
> >
> > Cheers
> >
> > Beyers
> > _______________________________________________
> > click mailing list
> > click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
> > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu <mailto:click at amsterdam.lcs.mit.edu>
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>
>
More information about the click
mailing list