[Click] Race condition with FullNoteQueue signaling in multithreading

Eddie Kohler kohler at cs.ucla.edu
Wed Oct 13 11:06:33 EDT 2010


Hi Beyers!

I am not sure that I agree with your analysis about the queue getting "stuck."

Here is the configuration I used, a tweak of yours:

//////////////////////////////////////////////////////////////
is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
LIMIT -1, STOP false) -> q1::FullNoteQueue -> uq1::Unqueue -> Discard;
is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
LIMIT -1, STOP false) -> q2::FullNoteQueue -> uq2::Unqueue -> Discard;

StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
Script(TYPE ACTIVE,
	init x 0,
	set x $(add $x 1),
	goto foo $(or $(is1.scheduled) $(uq1.scheduled)),
	print "$x: is1.scheduled false, uq1.scheduled false, q1.length $(q1.length)",
	label foo,
	goto bar $(or $(is2.scheduled) $(uq2.scheduled)),
	print "$x: is2.scheduled false, uq2.scheduled false, q2.length $(q2.length)",
	label bar,
	wait 0.5s,
	loop);
//////////////////////////////////////////////////////////////

This should print a message whenever neither source nor sink is scheduled.

I also tried this with "FullNoteQueue" replaced with "ThreadSafeQueue."

If a queue was truly getting 'stuck,' we would expect to see something like

745: is1.scheduled false, uq1.scheduled false
746: is1.scheduled false, uq1.scheduled false
747: is1.scheduled false, uq1.scheduled false
748: is1.scheduled false, uq1.scheduled false
749: is1.scheduled false, uq1.scheduled false....

with no gaps, because something would go unscheduled *and stay that way*.

But that's not what I see.  I see this:

1: is1.scheduled false, uq1.scheduled false, q1.length 17
1: is2.scheduled false, uq2.scheduled false, q2.length 0
20: is1.scheduled false, uq1.scheduled false, q1.length 14
22: is1.scheduled false, uq1.scheduled false, q1.length 0
28: is1.scheduled false, uq1.scheduled false, q1.length 14
41: is1.scheduled false, uq1.scheduled false, q1.length 0
47: is2.scheduled false, uq2.scheduled false, q2.length 0
55: is2.scheduled false, uq2.scheduled false, q2.length 0
56: is2.scheduled false, uq2.scheduled false, q2.length 0
76: is2.scheduled false, uq2.scheduled false, q2.length 0

*** Now this, on the other hand, I would expect.  In a multithreaded 
environment, notification will cause some delay between the upstream & 
downstream sides of a queue.  This is because the upstream side must wake up 
the downstream side, or vice versa.  The cross-thread communication required 
for this can happen a bit slowly, leaving a moment (maybe more than a moment) 
where neither task appears to be scheduled.  (In fact, at least one of the 
tasks will be "pre-scheduled.")

So my questions for you are this.  Did you really mean "stuck"?  Have you 
verified that the tasks NEVER re-schedule themselves?  Do you agree with my 
analysis?  Or, are you saying that the "pauses" are too long, even though the 
tasks do reschedule themselves?

Eddie


On 10/12/2010 05:23 PM, Beyers Cronje wrote:
> Hi Eddie, list,
>
> I've come across a multithreading bug in queue signal handling which I can
> reproduce fairly consistently with userlevel multithreading Click. The
> symptoms are that either the upstream or downstream task from the queue are
> unscheduled even though the notifier signal is active, this occurs when the
> queue either becomes full or empty. To illustrate here is the config I use
> and some debug handler outputs:
>
> //******************** Config ************************
> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
> LIMIT 1000, STOP false) ->  ThreadSafeQueue ->  uq1::Unqueue ->  Discard;
> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
> LIMIT -1, STOP false) ->  q::ThreadSafeQueue ->  uq2::Unqueue ->  Discard;
>
> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
>
>
> //******************** Debug Handler Output when upstream push task is stuck
> ************************
> read q.length
> 200 Read handler 'q.length' OK
> DATA 1
> 0
>
> read q.fullnote_state
> 200 Read handler 'q.fullnote_state' OK
> DATA 131
> empty notifier off
> task 0x25b8830 [uq2 :: Unqueue] unscheduled
> full notifier on
> task 0x25b8350 [is2 :: InfiniteSource] unscheduled
>
> //******************** Debug Handler Output when downstream pull task is
> stuck ************************
> read q.length
> 200 Read handler 'q.length' OK
> DATA 4
> 1000
>
> read q.fullnote_state
> 200 Read handler 'q.fullnote_state' OK
> DATA 131
> empty notifier on
> task 0x1c6f830 [uq2 :: Unqueue] unscheduled
> full notifier off
> task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
>
> //*****************************************************************************************************************
>
> Clearly the notifier states are correct, but somehow the relevant task is
> not rescheduled. The above config uses ThreadSafeQueue but I verified that
> the same issue occurs when using FullNoteQueue.
> The obvious places to look are
> ActiveNotifier::set_active,FullNoteQueue::push_success/push_failure/pull_success/pull_failure
> but so far I haven't spotted anything wrong with the relevant code, clearly
> I'm overlooking something.
>
> Have you or anyone else on the list got any suggestions?
>
> If it helps, I'm running click source from a couple of weeks back, default
> 64bit Fedora Core 13 kernel with preemption enabled
> (2.6.33.3-85.fc13.x86_64 ), Intel Dual-Core CPU.
> I start userlevel click with the following command:  'click --threads=4
> conf/threadtest.click -p 777'
>
> Cheers
>
> Beyers
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click


More information about the click mailing list