[Click] [PATCH 2/2] Task: Kill process_pending dead lock

Joonwoo Park joonwpark81 at gmail.com
Fri Sep 12 05:21:12 EDT 2008


Hi Eddie,

I took look into this lock up issue and I think I found something.

RoutherThread::driver() calls run_tasks() with locked tasks.
But after calling run_tasks(), current processor can be changed since
schedule() might be called (eg. ScheduleLinux element)
So I think that's problem.  How do you think?

Thanks,
Joonwoo

2008/9/11 Joonwoo Park <joonwpark81 at gmail.com>:
> Hi Eddie,
>
> Ah.. I totally cannot understand what I did. :-(
> I'm really sorry for that.
> Sometimes I had soft lock up problem while using BalancedThreadSched().
> And this patch helped it.  So I think I miss read RouterThread & Task codes.
> But I had no idea why this patch helped this problem still.
> Thank you for you explanation.  I think your analysis is correct
>
> Anyhow sometimes I can see soft lock up like this:
>
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: BUG: soft lockup - CPU#1
> stuck for 11s! [kclick:4211]
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: CPU 1:
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: Modules linked in: click
> proclikefs e1000 ppdev iptable_filter ip_tables x_tables parport_pc lp
> parport ipv6 floppy pcspkr forcedeth ext3 jbd
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: Pid: 4211, comm: kclick
> Not tainted 2.6.24.7-joonwpark #3
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: RIP:
> 0010:[<ffffffff881ffd3b>]  [<ffffffff881ffd3b>]
> :click:_ZN19BalancedThreadSched9run_timerEP5Timer+0x5ab/0x610
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: RSP:
> 0018:ffff81006b475d50  EFLAGS: 00000202
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: RAX: 0000000000000001
> RBX: ffff81006b475de0 RCX: 0000000000000001
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: RDX: ffff81006c98cd64
> RSI: ffff81006c98cd58 RDI: ffff81006d1e6210
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: RBP: ffff81006c8f2000
> R08: 0000000000000000 R09: 0000000000000001
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: R10: 0000000000000001
> R11: 0000000000000003 R12: ffff81000107f280
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: R13: ffff81006c8f2000
> R14: ffff81006b474000 R15: ffff8100806b4000
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: FS:
> 00002b010cf826e0(0000) GS:ffff81006f801578(0000)
> knlGS:0000000000000000
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: CS:  0010 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: CR2: 0000000000771d08
> CR3: 00000000371c0000 CR4: 00000000000006e0
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:
> Sep 11 00:17:34 joonwpark-desktop-64 kernel: Call Trace:
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff8816e7c3>]
> :click:_Z12element_hookP5TimerPv+0x13/0x20
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff88196928>]
> :click:_ZN6Master10run_timersEv+0x178/0x320
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff8818b093>]
> :click:_ZN12RouterThread6driverEv+0x353/0x5c0
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff88201d09>]
> :click:_Z11click_schedPv+0xe9/0x250
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:
> [floppy:_spin_unlock_irq+0x2b/0x60] _spin_unlock_irq+0x2b/0x30
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:
> [finish_task_switch+0x57/0x94] finish_task_switch+0x57/0x94
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [child_rip+0xa/0x12]
> child_rip+0xa/0x12
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:
> [finish_task_switch+0x57/0x94] finish_task_switch+0x57/0x94
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [restore_args+0x0/0x30]
> restore_args+0x0/0x30
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff88201c20>]
> :click:_Z11click_schedPv+0x0/0x250
> Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [child_rip+0x0/0x12]
> child_rip+0x0/0x12
>                                  Sep 11 00:17:34 joonwpark-desktop-64
> kernel:
>
>
> Moreover yesterday I updated click tree and after that, current
> updated click source, every time move_thread being called (I guess)
> click is chattering 'chatter: releasing someone else's lock'
> FYI, I put dump_stack() at Spinlock::release().  Below dump stack is it.
>
> Call Trace:
>  [<ffffffff8818b282>] :click:_ZN12RouterThread6driverEv+0x592/0x5d0
>  [<ffffffff88201d49>] :click:_Z11click_schedPv+0xe9/0x250
>  [<ffffffff804e4fef>] _spin_unlock_irq+0x2b/0x30
>  [<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
>  [<ffffffff8020cfe8>] child_rip+0xa/0x12
>  [<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
>  [<ffffffff8020c6ff>] restore_args+0x0/0x30
>  [<ffffffff88201c60>] :click:_Z11click_schedPv+0x0/0x250
>  [<ffffffff8020cfde>] child_rip+0x0/0x12
>
>
> I'm still digging.  You have any idea?
>
> Thanks!
> Joonwoo
>
> 2008/9/10 Eddie Kohler <kohler at cs.ucla.edu>:
>> Joonwoo,
>>
>> Thanks very much for this patch and as usual for all your Click work!!
>>
>> About this patch, though.  This patch does not make sense to me.  The
>> Task::add_pending() function does NOT attempt to lock the RouterThread's
>> _task_lock.  It locks the Master::_task_lock.  And none of the code that
>> manipulates Master::_task_lock looks like it could cause deadlock
>> (Master::_task_lock is held for short periods of time with no functions).
>> Task::add_pending() does call RouterThread::add_pending(), but this function
>> is safe to call without RouterThread::_task_lock.
>>
>> So, can you explain in more detail what deadlock you thought you fixed with
>> this patch?  Is my analysis wrong?  Does SpinlockIRQ need updating along the
>> lines of Spinlock to pin a thread to the current CPU?
>>
>> Eddie
>>
>>
>> Joonwoo Park wrote:
>>>
>>> Hello Eddie,
>>>
>>> I think the function Task::add_pending() should haven't called while
>>> task is locked since it might cause dead lock.
>>> This patch fixes it by unlocking before calling add_pending.
>>>
>>> Please consider applying this patch.
>>>
>>> Thanks,
>>> Joonwoo
>>
>>
>


More information about the click mailing list