[Click] NotifierQueue on multi-threaded Click problem.

Joonwoo Park joonwpark81 at gmail.com
Mon Sep 3 22:56:35 EDT 2007


Hi Eddie,

I test it in briefly, I'll do it in future for further more.

2007-08-29 (수), 17:50 -0700, Eddie Kohler wrote:
Jason,
>
> Other notes on your patch:
>
> - As mentioned, I would rather not hold Master's _task_lock for the duration
> of processing the pending list, and the current code tries to fix this
> another way.
> But let me know what you think.

There are some problems.
The Task::process_pending() calls add_pending() which is reference master's
member variable internally.
So to call Task::process_pending(), _task_lock is need to acquired or
synchronized by another method.
I'm attaching patch to fix it.

>
> - On the locks added to NotifierQueue and FullNoteQueue: As you probably
> saw, we
> already had code in these elements to handle the SMP case.  The code was
> commented
> out because in tests we saw that locks dropped performance a LOT; I
> forget how much,
> but it was a very large degradation.  It is not OK to add gratuitous
> locks to these
> functions.  (If you can prove me wrong, I'd be happy to know.)

Yes, I saw the NOTIFIER_LOCK was commented out but it didn't work even
though I turn it on.
About performance, I'd like to test about it in future.

>
> However, I think there is a way to solve the problem without locks:
> change this pattern
>
>      if (++_sleepiness == SLEEPINESS_TRIGGER)
>          _empty_note.sleep();
>
> to this pattern, which re-wakes the relevant Notifier on detecting the
> race condition:
>
>      if (++_sleepiness == SLEEPINESS_TRIGGER) {
>          _empty_note.sleep();
>          if (_head != _tail && !_empty_note.active())
>              _empty_note.wake();
>      }
>
> This change is checked in.  Does it work for you?

I think it has problems too.
Memorizing my test of the last month, It might _empty_note is need to
be synchronized.
IMO the puller and pusher is watching notifier race-conditionally.

>
> - More on those locks: The Click Queues, Simple/Notifier/FullNote, are
> all designed
> to be used when there is *at most one* concurrent pusher and *at most
> one* concurrent
> puller.  They allow pushes and pulls to happen in parallel, but they do
> *not* allow
> multiple pushes to happen in parallel, or multiple pulls.  This is now
> documented.
> If you want multiple concurrent pushes, look at MSQueue.
>

Thanks for your advise.

> - "The spinlock_bh needed because of push and pull can be called
> concurrently from softirq." : What do you mean?  I may be being stupid,
> but I don't think ANY pushes and pulls should be called from softirq
> context.  What softirq context are you thinking of?  Click runs almost
> entirely in user context (i.e. the kernel threads), or at least it should.
>
> - SpinlockBH: I'm pretty sure we will need more than
> SpinlockBH to support preemption...
>
> - SpinlockIRQ: Thanks!  Checked in.
>
> Thanks for this patch!! Sorry for the delay.
>
> Eddie
>
>
> Jason Park (Joonwoo Park) wrote:
> > Hi.
> > Recently I tried multi-threaded Click to get more performance.
> > But I got stopped queue in a few seconds.
> > For the test, I connected two click installed machine with e1000 gigabit
> > directly as sender and replier.
> > The replier made problem on multi-threaded click that run_task() of
> > ToDevice() won't be called any more.
> >
> > I worked patch for the problem and it contains.
> > - FullNoteQueue, NotifierQueue
> >  The spinlock_bh needed because of push and pull can be called
> > concurrently from softirq.
> >   But I didn't work for SimpleQueue, it might lock needed too.
> > - Task
> >   I think that the member _pending should be atomic, in more race
> > conditional situation, it can be problem. (You can try SLEEPNIESS_TRIGGER=1)
> > - Etc.
> >   Replace member function of SpinlockIRQ with the things from linux to
> > disable preempt when irq disabled. (But I didn't set CONFIG_PREEMPT for
> > this test)
> >   Add class SpinlockBH.
> >   Fix Master's lock. (Eddie, Am I right?)
> >   I'm running linux 2.6.19.2 SMP.
> >
> > I'll be pleased if this patch works.
> >
> > Jason Park (Joonwoo Park).

I tried a patch and It's doing lock-less way.
It passes pointers of master's member to process_pending(add_pending)
to prevent pollution of another thread.
How do you think?

Before and after applying my patch, I got too many assert.
So I changed it with WARN_ON_ONCE(), would you like to check it?

-
Signed-off-by: Joonwoo Park <joonwpark81 at gmail.com>

 include/click/task.hh   |    3 ++-_empty_note
 lib/master.cc           |    4 ++--
 lib/task.cc             |   24 +++++++++++++++---------
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/include/click/master.hh b/include/click/master.hh
diff --git a/include/click/task.hh b/include/click/task.hh
index 5abf9a0..8c71ece 100644
--- a/include/click/task.hh
+++ b/include/click/task.hh
@@ -128,8 +128,9 @@ class Task { public:
     Task& operator=(const Task&);
     void cleanup();

+    inline void _add_pending(unsigned, uintptr_t**, bool reschedule);
     void add_pending(bool reschedule);
-    void process_pending(RouterThread*);
+    void process_pending(RouterThread*, unsigned, uintptr_t**);
     inline void fast_schedule();
     void true_reschedule();
     inline void lock_tasks();
diff --git a/lib/master.cc b/lib/master.cc
index f1b4e65..2263896 100644
--- a/lib/master.cc
+++ b/lib/master.cc
@@ -385,7 +385,7 @@ Master::process_pending(RouterThread *thread)
     // switch lists
     unsigned cp = _current_pending;
     _current_pending = 1 - _current_pending;
-    assert(_pending_head[_current_pending] == 0 &&
_pending_tail[_current_pending] == &_pending_head[_current_pending]);
+    WARN_ON_ONCE(!(_pending_head[_current_pending] == 0 &&
_pending_tail[_current_pending] == &_pending_head[_current_pending]));
     thread->_pending = 0;
     _task_lock.release(flags);

@@ -394,7 +394,7 @@ Master::process_pending(RouterThread *thread)
     while (Task *t = Task::pending_to_task(_pending_head[cp])) {
  	_pending_head[cp] = t->_pending_nextptr;
 	t->_pending_nextptr = 0;
-	t->process_pending(thread);
+	t->process_pending(thread, 1 - cp, _pending_tail);
     }
     _pending_tail[cp] = &_pending_head[cp];
     _pending_lock[cp].release(flags);
diff --git a/lib/task.cc b/lib/task.cc
index e2728c6..5e81dc3 100644
--- a/lib/task.cc
+++ b/lib/task.cc
@@ -188,22 +188,27 @@ Task::attempt_lock_tasks()
     return false;
 }

-void
-Task::add_pending(bool reschedule)
+inline void
+Task::_add_pending(unsigned cp, uintptr_t **pt, bool reschedule)
 {
-    Master *m = _router->master();
-    SpinlockIRQ::flags_t flags = m->_task_lock.acquire();
     if (_router->_running >= Router::RUNNING_PREPARING) {
 	if (reschedule)
 	    _pending_reschedule = 1;
 	if (!_pending_nextptr) {
-	    unsigned cp = m->_current_pending;

I tried patch and It's doing lock-less way.
It passes pointers of master's
  	    _pending_nextptr = 2 | cp;
-	    *m->_pending_tail[cp] = reinterpret_cast<uintptr_t>(this) | cp;
-	    m->_pending_tail[cp] = &_pending_nextptr;
+	    *pt[cp] = reinterpret_cast<uintptr_t>(this) | cp;
+	    pt[cp] = &_pending_nextptr;
 	    _thread->add_pending();
 	}
     }
+}
+
+void
+Task::add_pending(bool reschedule)
+{
+    Master *m = _router->master();
+    SpinlockIRQ::flags_t flags = m->_task_lock.acquire();
+    _add_pending(m->_current_pending, m->_pending_tail, reschedule);
     m->_task_lock.release(flags);
 }

@@ -390,7 +395,7 @@ Task::move_thread(int thread_id)
 }

 void
-Task::process_pending(RouterThread *thread)
+Task::process_pending(RouterThread *thread, unsigned cp, uintptr_t** pt)
 {
     // must be called with thread->lock held

@@ -415,7 +420,8 @@ Task::process_pending(RouterThread *thread)
     }

     if (_pending_reschedule)
-	add_pending(true);
+	_add_pending(cp, pt, true);
+

 #if CLICK_BSDMODULE
     splx(s);
-

Mail address has been changed.
Jason Park (Joonwoo Park)



More information about the click mailing list