[Click] [PATCH 2/2] Task: Kill process_pending dead lock
Joonwoo Park
joonwpark81 at gmail.com
Thu Sep 11 13:58:09 EDT 2008
Hi Eddie,
Ah.. I totally cannot understand what I did. :-(
I'm really sorry for that.
Sometimes I had soft lock up problem while using BalancedThreadSched().
And this patch helped it. So I think I miss read RouterThread & Task codes.
But I had no idea why this patch helped this problem still.
Thank you for you explanation. I think your analysis is correct
Anyhow sometimes I can see soft lock up like this:
Sep 11 00:17:34 joonwpark-desktop-64 kernel: BUG: soft lockup - CPU#1
stuck for 11s! [kclick:4211]
Sep 11 00:17:34 joonwpark-desktop-64 kernel: CPU 1:
Sep 11 00:17:34 joonwpark-desktop-64 kernel: Modules linked in: click
proclikefs e1000 ppdev iptable_filter ip_tables x_tables parport_pc lp
parport ipv6 floppy pcspkr forcedeth ext3 jbd
Sep 11 00:17:34 joonwpark-desktop-64 kernel: Pid: 4211, comm: kclick
Not tainted 2.6.24.7-joonwpark #3
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RIP:
0010:[<ffffffff881ffd3b>] [<ffffffff881ffd3b>]
:click:_ZN19BalancedThreadSched9run_timerEP5Timer+0x5ab/0x610
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RSP:
0018:ffff81006b475d50 EFLAGS: 00000202
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RAX: 0000000000000001
RBX: ffff81006b475de0 RCX: 0000000000000001
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RDX: ffff81006c98cd64
RSI: ffff81006c98cd58 RDI: ffff81006d1e6210
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RBP: ffff81006c8f2000
R08: 0000000000000000 R09: 0000000000000001
Sep 11 00:17:34 joonwpark-desktop-64 kernel: R10: 0000000000000001
R11: 0000000000000003 R12: ffff81000107f280
Sep 11 00:17:34 joonwpark-desktop-64 kernel: R13: ffff81006c8f2000
R14: ffff81006b474000 R15: ffff8100806b4000
Sep 11 00:17:34 joonwpark-desktop-64 kernel: FS:
00002b010cf826e0(0000) GS:ffff81006f801578(0000)
knlGS:0000000000000000
Sep 11 00:17:34 joonwpark-desktop-64 kernel: CS: 0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Sep 11 00:17:34 joonwpark-desktop-64 kernel: CR2: 0000000000771d08
CR3: 00000000371c0000 CR4: 00000000000006e0
Sep 11 00:17:34 joonwpark-desktop-64 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Sep 11 00:17:34 joonwpark-desktop-64 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 11 00:17:34 joonwpark-desktop-64 kernel:
Sep 11 00:17:34 joonwpark-desktop-64 kernel: Call Trace:
Sep 11 00:17:34 joonwpark-desktop-64 kernel: [<ffffffff8816e7c3>]
:click:_Z12element_hookP5TimerPv+0x13/0x20
Sep 11 00:17:34 joonwpark-desktop-64 kernel: [<ffffffff88196928>]
:click:_ZN6Master10run_timersEv+0x178/0x320
Sep 11 00:17:34 joonwpark-desktop-64 kernel: [<ffffffff8818b093>]
:click:_ZN12RouterThread6driverEv+0x353/0x5c0
Sep 11 00:17:34 joonwpark-desktop-64 kernel: [<ffffffff88201d09>]
:click:_Z11click_schedPv+0xe9/0x250
Sep 11 00:17:34 joonwpark-desktop-64 kernel:
[floppy:_spin_unlock_irq+0x2b/0x60] _spin_unlock_irq+0x2b/0x30
Sep 11 00:17:34 joonwpark-desktop-64 kernel:
[finish_task_switch+0x57/0x94] finish_task_switch+0x57/0x94
Sep 11 00:17:34 joonwpark-desktop-64 kernel: [child_rip+0xa/0x12]
child_rip+0xa/0x12
Sep 11 00:17:34 joonwpark-desktop-64 kernel:
[finish_task_switch+0x57/0x94] finish_task_switch+0x57/0x94
Sep 11 00:17:34 joonwpark-desktop-64 kernel: [restore_args+0x0/0x30]
restore_args+0x0/0x30
Sep 11 00:17:34 joonwpark-desktop-64 kernel: [<ffffffff88201c20>]
:click:_Z11click_schedPv+0x0/0x250
Sep 11 00:17:34 joonwpark-desktop-64 kernel: [child_rip+0x0/0x12]
child_rip+0x0/0x12
Sep 11 00:17:34 joonwpark-desktop-64
kernel:
Moreover yesterday I updated click tree and after that, current
updated click source, every time move_thread being called (I guess)
click is chattering 'chatter: releasing someone else's lock'
FYI, I put dump_stack() at Spinlock::release(). Below dump stack is it.
Call Trace:
[<ffffffff8818b282>] :click:_ZN12RouterThread6driverEv+0x592/0x5d0
[<ffffffff88201d49>] :click:_Z11click_schedPv+0xe9/0x250
[<ffffffff804e4fef>] _spin_unlock_irq+0x2b/0x30
[<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
[<ffffffff8020cfe8>] child_rip+0xa/0x12
[<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
[<ffffffff8020c6ff>] restore_args+0x0/0x30
[<ffffffff88201c60>] :click:_Z11click_schedPv+0x0/0x250
[<ffffffff8020cfde>] child_rip+0x0/0x12
I'm still digging. You have any idea?
Thanks!
Joonwoo
2008/9/10 Eddie Kohler <kohler at cs.ucla.edu>:
> Joonwoo,
>
> Thanks very much for this patch and as usual for all your Click work!!
>
> About this patch, though. This patch does not make sense to me. The
> Task::add_pending() function does NOT attempt to lock the RouterThread's
> _task_lock. It locks the Master::_task_lock. And none of the code that
> manipulates Master::_task_lock looks like it could cause deadlock
> (Master::_task_lock is held for short periods of time with no functions).
> Task::add_pending() does call RouterThread::add_pending(), but this function
> is safe to call without RouterThread::_task_lock.
>
> So, can you explain in more detail what deadlock you thought you fixed with
> this patch? Is my analysis wrong? Does SpinlockIRQ need updating along the
> lines of Spinlock to pin a thread to the current CPU?
>
> Eddie
>
>
> Joonwoo Park wrote:
>>
>> Hello Eddie,
>>
>> I think the function Task::add_pending() should haven't called while
>> task is locked since it might cause dead lock.
>> This patch fixes it by unlocking before calling add_pending.
>>
>> Please consider applying this patch.
>>
>> Thanks,
>> Joonwoo
>
>
More information about the click
mailing list