[Click] Race condition with FullNoteQueue signaling in multithreading

Beyers Cronje bcronje at gmail.com
Tue Oct 12 20:23:28 EDT 2010


Hi Eddie, list,

I've come across a multithreading bug in queue signal handling that I can
reproduce fairly consistently with userlevel multithreaded Click. The
symptom is that the task upstream or downstream of the queue stays
unscheduled even though the corresponding notifier signal is active. This
happens when the queue becomes either full or empty. To illustrate, here is
the config I use, followed by some debug handler output:

//******************** Config ************************
is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
LIMIT 1000, STOP false) -> ThreadSafeQueue -> uq1::Unqueue -> Discard;
is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef  00 00 00 00 00 00  08 00>,
LIMIT -1, STOP false) -> q::ThreadSafeQueue -> uq2::Unqueue -> Discard;

StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);


//******************** Debug Handler Output when upstream push task is stuck ************************
read q.length
200 Read handler 'q.length' OK
DATA 1
0

read q.fullnote_state
200 Read handler 'q.fullnote_state' OK
DATA 131
empty notifier off
task 0x25b8830 [uq2 :: Unqueue] unscheduled
full notifier on
task 0x25b8350 [is2 :: InfiniteSource] unscheduled

//******************** Debug Handler Output when downstream pull task is stuck ************************
read q.length
200 Read handler 'q.length' OK
DATA 4
1000

read q.fullnote_state
200 Read handler 'q.fullnote_state' OK
DATA 131
empty notifier on
task 0x1c6f830 [uq2 :: Unqueue] unscheduled
full notifier off
task 0x1c6f350 [is2 :: InfiniteSource] unscheduled

//*****************************************************************************************************************
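
The two dumps above can be checked mechanically. The following Python sketch
parses the text returned by a 'fullnote_state' read and reports every task
that is unscheduled while the notifier listed just before it is on. The
function name stuck_tasks is hypothetical, and the parser assumes exactly the
line layout shown in this message. A single sample may catch a transient
state, so only a pairing that persists across repeated reads points to a
stuck task.

```python
import re

# Matches e.g. "empty notifier off" or "full notifier on"
NOTIFIER_RE = re.compile(r"(empty|full)\s+notifier\s+(on|off)")
# Matches e.g. "task 0x25b8830 [uq2 :: Unqueue] unscheduled"
TASK_RE = re.compile(r"task\s+(0x[0-9a-fA-F]+)\s+\[(\S+)\s+::\s+(\S+)\]\s+(\w+)")


def stuck_tasks(state_text):
    """Return [(notifier, element)] where the notifier is on but the
    task listed under it is unscheduled."""
    stuck = []
    current = None  # (notifier name, is_on) from the latest notifier line
    for line in state_text.splitlines():
        line = line.strip()
        m = NOTIFIER_RE.fullmatch(line)
        if m:
            current = (m.group(1), m.group(2) == "on")
            continue
        m = TASK_RE.fullmatch(line)
        if m and current is not None:
            notifier, is_on = current
            element, status = m.group(2), m.group(4)
            if is_on and status == "unscheduled":
                stuck.append((notifier, element))
    return stuck


# Handler output copied from the two dumps in this message.
UPSTREAM_STUCK = """\
empty notifier off
task 0x25b8830 [uq2 :: Unqueue] unscheduled
full notifier on
task 0x25b8350 [is2 :: InfiniteSource] unscheduled
"""

DOWNSTREAM_STUCK = """\
empty notifier on
task 0x1c6f830 [uq2 :: Unqueue] unscheduled
full notifier off
task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
"""

print(stuck_tasks(UPSTREAM_STUCK))    # [('full', 'is2')]
print(stuck_tasks(DOWNSTREAM_STUCK))  # [('empty', 'uq2')]
```

In the first dump only the push side is flagged, and in the second only the
pull side, which matches the two stuck cases described.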

The notifier states are clearly correct, but somehow the relevant task is
not rescheduled. The config above uses ThreadSafeQueue, but I verified that
the same issue occurs with FullNoteQueue.
The obvious places to look are ActiveNotifier::set_active and
FullNoteQueue::push_success/push_failure/pull_success/pull_failure,
but so far I haven't spotted anything wrong with the relevant code, so I
must be overlooking something.
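
To make the symptom concrete, here is a minimal sketch of a generic
lost-wakeup interleaving that ends with an active notifier and an unscheduled
listener. It is a single-threaded replay of one hypothetical ordering of
steps. The Task and Notifier classes, the step names C1-C3 and P1-P2, and the
replay function are all illustrative assumptions and are not taken from
Click's ActiveNotifier or FullNoteQueue code, so this shows the class of bug
to look for rather than where it actually is.

```python
class Task:
    """Hypothetical stand-in for a scheduler task; not Click's Task class."""
    def __init__(self):
        self.scheduled = True  # the task is currently on a run queue


class Notifier:
    """Hypothetical notifier that wakes one listener on an off -> on edge."""
    def __init__(self, listener):
        self.active = True
        self.listener = listener

    def set_active(self, active):
        if active and not self.active:
            self.listener.scheduled = True  # wake the sleeping listener
        self.active = active


def replay(order):
    """Run consumer steps C1-C3 and producer steps P1-P2 in the given order.

    Returns (notifier.active, task.scheduled, queue length) at the end.
    """
    queue = []
    task = Task()
    notifier = Notifier(task)
    saw_empty = False

    for step in order:
        if step == "C1":    # consumer pulls and finds the queue empty
            saw_empty = not queue
        elif step == "C2":  # consumer turns the empty notifier off
            if saw_empty:
                notifier.set_active(False)
        elif step == "C3":  # consumer goes to sleep on its stale observation
            if saw_empty:
                task.scheduled = False
        elif step == "P1":  # producer pushes one packet
            queue.append("packet")
        elif step == "P2":  # producer signals "queue is non-empty"
            notifier.set_active(True)
    return notifier.active, task.scheduled, len(queue)


print(replay(["C1", "C2", "C3", "P1", "P2"]))  # (True, True, 1): woken correctly
print(replay(["C1", "C2", "P1", "P2", "C3"]))  # (True, False, 1): lost wakeup
```

The second result has the same shape as the "empty notifier on, uq2
unscheduled" dump; the mirror image on the full notifier would give the stuck
upstream case. The usual remedy for this class of bug is to re-check the
notifier after unscheduling and reschedule if it has become active. Whether
the Click code has such a window is exactly the open question here.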

Have you or anyone else on the list got any suggestions?

If it helps, I'm running Click source from a couple of weeks back on the
default 64-bit Fedora Core 13 kernel with preemption enabled
(2.6.33.3-85.fc13.x86_64) and an Intel dual-core CPU.
I start userlevel Click with the following command:
'click --threads=4 conf/threadtest.click -p 777'

Cheers

Beyers

