[Click] [PATCH 2/2] Task: Kill process_pending dead lock

Joonwoo Park joonwpark81 at gmail.com
Thu Sep 11 13:58:09 EDT 2008


Hi Eddie,

Ah, I really cannot explain what I did. :-(
I'm very sorry about that.
I sometimes hit a soft lockup while using BalancedThreadSched(),
and this patch seemed to help, so I think I misread the RouterThread and
Task code.  I still have no idea why the patch helped with the problem.
Thank you for your explanation.  I think your analysis is correct.

Anyhow, I still sometimes see a soft lockup like this:

Sep 11 00:17:34 joonwpark-desktop-64 kernel: BUG: soft lockup - CPU#1
stuck for 11s! [kclick:4211]
Sep 11 00:17:34 joonwpark-desktop-64 kernel: CPU 1:
Sep 11 00:17:34 joonwpark-desktop-64 kernel: Modules linked in: click
proclikefs e1000 ppdev iptable_filter ip_tables x_tables parport_pc lp
parport ipv6 floppy pcspkr forcedeth ext3 jbd
Sep 11 00:17:34 joonwpark-desktop-64 kernel: Pid: 4211, comm: kclick
Not tainted 2.6.24.7-joonwpark #3
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RIP:
0010:[<ffffffff881ffd3b>]  [<ffffffff881ffd3b>]
:click:_ZN19BalancedThreadSched9run_timerEP5Timer+0x5ab/0x610
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RSP:
0018:ffff81006b475d50  EFLAGS: 00000202
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RAX: 0000000000000001
RBX: ffff81006b475de0 RCX: 0000000000000001
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RDX: ffff81006c98cd64
RSI: ffff81006c98cd58 RDI: ffff81006d1e6210
Sep 11 00:17:34 joonwpark-desktop-64 kernel: RBP: ffff81006c8f2000
R08: 0000000000000000 R09: 0000000000000001
Sep 11 00:17:34 joonwpark-desktop-64 kernel: R10: 0000000000000001
R11: 0000000000000003 R12: ffff81000107f280
Sep 11 00:17:34 joonwpark-desktop-64 kernel: R13: ffff81006c8f2000
R14: ffff81006b474000 R15: ffff8100806b4000
Sep 11 00:17:34 joonwpark-desktop-64 kernel: FS:
00002b010cf826e0(0000) GS:ffff81006f801578(0000)
knlGS:0000000000000000
Sep 11 00:17:34 joonwpark-desktop-64 kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Sep 11 00:17:34 joonwpark-desktop-64 kernel: CR2: 0000000000771d08
CR3: 00000000371c0000 CR4: 00000000000006e0
Sep 11 00:17:34 joonwpark-desktop-64 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Sep 11 00:17:34 joonwpark-desktop-64 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 11 00:17:34 joonwpark-desktop-64 kernel:
Sep 11 00:17:34 joonwpark-desktop-64 kernel: Call Trace:
Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff8816e7c3>]
:click:_Z12element_hookP5TimerPv+0x13/0x20
Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff88196928>]
:click:_ZN6Master10run_timersEv+0x178/0x320
Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff8818b093>]
:click:_ZN12RouterThread6driverEv+0x353/0x5c0
Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff88201d09>]
:click:_Z11click_schedPv+0xe9/0x250
Sep 11 00:17:34 joonwpark-desktop-64 kernel:
[floppy:_spin_unlock_irq+0x2b/0x60] _spin_unlock_irq+0x2b/0x30
Sep 11 00:17:34 joonwpark-desktop-64 kernel:
[finish_task_switch+0x57/0x94] finish_task_switch+0x57/0x94
Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [child_rip+0xa/0x12]
child_rip+0xa/0x12
Sep 11 00:17:34 joonwpark-desktop-64 kernel:
[finish_task_switch+0x57/0x94] finish_task_switch+0x57/0x94
Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [restore_args+0x0/0x30]
restore_args+0x0/0x30
Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [<ffffffff88201c20>]
:click:_Z11click_schedPv+0x0/0x250
Sep 11 00:17:34 joonwpark-desktop-64 kernel:  [child_rip+0x0/0x12]
child_rip+0x0/0x12
Sep 11 00:17:34 joonwpark-desktop-64 kernel:


Moreover, I updated my Click tree yesterday, and with the updated
source, every time move_thread is called (I guess), Click prints
'chatter: releasing someone else's lock'.
FYI, I put dump_stack() in Spinlock::release().  The resulting stack
dump is below.

Call Trace:
 [<ffffffff8818b282>] :click:_ZN12RouterThread6driverEv+0x592/0x5d0
 [<ffffffff88201d49>] :click:_Z11click_schedPv+0xe9/0x250
 [<ffffffff804e4fef>] _spin_unlock_irq+0x2b/0x30
 [<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
 [<ffffffff8020cfe8>] child_rip+0xa/0x12
 [<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
 [<ffffffff8020c6ff>] restore_args+0x0/0x30
 [<ffffffff88201c60>] :click:_Z11click_schedPv+0x0/0x250
 [<ffffffff8020cfde>] child_rip+0x0/0x12
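
To illustrate the kind of check that could produce a message like
'releasing someone else's lock', here is a minimal sketch of a spinlock
that remembers which CPU acquired it.  It is an assumption for
illustration only: the class name OwnerCheckedSpinlock, the explicit
cpu argument, and the owner() helper are hypothetical and are not taken
from Click's Spinlock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical sketch; this is NOT Click's actual Spinlock class.
// The lock records the CPU id that acquired it (-1 means free), and
// release() reports a mismatch instead of silently unlocking.
class OwnerCheckedSpinlock {
  public:
    void acquire(int cpu) {
        assert(cpu >= 0);
        int expected = -1;
        // Spin until the owner field changes from "free" to our CPU id.
        while (!_owner.compare_exchange_weak(expected, cpu,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
            expected = -1;  // a failed exchange overwrote expected; reset it
    }

    // Returns true on a normal release, false if another CPU owns the lock.
    bool release(int cpu) {
        if (_owner.load(std::memory_order_relaxed) != cpu) {
            std::fprintf(stderr, "releasing someone else's lock\n");
            return false;
        }
        _owner.store(-1, std::memory_order_release);
        return true;
    }

    // Current owner CPU id, or -1 if the lock is free.
    int owner() const { return _owner.load(std::memory_order_relaxed); }

  private:
    std::atomic<int> _owner{-1};
};
```

With a check of this shape, a thread that acquires the lock on CPU 0 and
then calls release(1) after being moved to another CPU would trigger the
message even though it is the same thread, which may be one reason a
lock would want to pin its holder to the current CPU.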


I'm still digging.  Do you have any ideas?

Thanks!
Joonwoo

2008/9/10 Eddie Kohler <kohler at cs.ucla.edu>:
> Joonwoo,
>
> Thanks very much for this patch and as usual for all your Click work!!
>
> About this patch, though.  This patch does not make sense to me.  The
> Task::add_pending() function does NOT attempt to lock the RouterThread's
> _task_lock.  It locks the Master::_task_lock.  And none of the code that
> manipulates Master::_task_lock looks like it could cause deadlock
> (Master::_task_lock is held for short periods of time with no functions).
> Task::add_pending() does call RouterThread::add_pending(), but this function
> is safe to call without RouterThread::_task_lock.
>
> So, can you explain in more detail what deadlock you thought you fixed with
> this patch?  Is my analysis wrong?  Does SpinlockIRQ need updating along the
> lines of Spinlock to pin a thread to the current CPU?
>
> Eddie
>
>
> Joonwoo Park wrote:
>>
>> Hello Eddie,
>>
>> I think Task::add_pending() should not be called while the task is
>> locked, since it might cause a deadlock.
>> This patch fixes it by unlocking before calling add_pending.
>>
>> Please consider applying this patch.
>>
>> Thanks,
>> Joonwoo
>
>
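
The two variants discussed in the quoted messages can be sketched as
follows.  This is a simplified model under stated assumptions, not
Click's code: MiniMaster, MiniTask, and their members are hypothetical
stand-ins, and std::mutex replaces the kernel spinlocks.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-ins; these are NOT Click's Master or Task classes.
struct MiniMaster {
    std::mutex task_lock;      // plays the role of Master::_task_lock
    std::vector<int> pending;  // ids of tasks with pending work

    // The lock is held only around the list update, and no other lock is
    // taken inside, so a caller may already hold a different lock.
    void add_pending(int task_id) {
        std::lock_guard<std::mutex> guard(task_lock);
        pending.push_back(task_id);
    }
};

struct MiniTask {
    int id;
    std::mutex lock;           // a per-task lock (hypothetical)

    explicit MiniTask(int i) : id(i) {}

    // Variant A: call add_pending() while still holding the task's lock.
    void reschedule_locked(MiniMaster &m) {
        std::lock_guard<std::mutex> guard(lock);
        m.add_pending(id);     // lock order: task lock, then master lock
    }

    // Variant B: drop the task's lock first, as the patch proposed.
    void reschedule_unlocked(MiniMaster &m) {
        {
            std::lock_guard<std::mutex> guard(lock);
            // ... update task state under the task lock ...
        }
        m.add_pending(id);     // only the master lock is taken here
    }
};
```

In this model both variants are deadlock-free, because add_pending()
never tries to take the task lock and so the lock order is always the
same.  Variant B would matter only if code holding the master lock could
also try to take the task lock.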

