[Click] [PATCH 2/2] Task: Kill process_pending dead lock

Joonwoo Park joonwpark81 at gmail.com
Mon Sep 15 22:16:38 EDT 2008


Hi Eddie,

I agree with blocking task execution as a solution.
However, I hit the following soft lockup with your patch.
From a quick review, it seems that block_tasks() and driver_tasks()
do not support recursive locking (please correct me if I am wrong).
So when BalancedThreadSched's run_timer tries to lock the tasks, it
appears to hang.

Here are my oops message and gdb output.  I used my patched 2.6.24
kernel; sorry about that.

Thanks,
Joonwoo

joonwpark at joonwpark-desktop-64:~/SRC5/click/linuxmodule$ BUG: soft
lockup - CPU#0 stuck for 11s! [kclick:3116]
SysRq : Changing Loglevel
Loglevel set to 9
BUG: soft lockup - CPU#0 stuck for 11s! [kclick:3116]
CPU 0:
Modules linked in: click proclikefs e1000 iptable_filter ip_tables
x_tables parport_pc lp parport ipv6 floppy pcspkr forcedeth ext3 jbd
Pid: 3116, comm: kclick Not tainted 2.6.24.7-joonwpark #3
RIP: 0010:[<ffffffff881f818a>]  [<ffffffff881f818a>]
:click:_ZN19BalancedThreadSched9run_timerEP5Timer+0x58a/0x630
RSP: 0018:ffff8100370d7d30  EFLAGS: 00000286
RAX: ffff8100370d4000 RBX: ffff8100370d7dc0 RCX: ffff810037892430
RDX: 00000000ffffffff RSI: ffff81003792fcd0 RDI: ffff81003792fc60
RBP: ffffffff806b7b10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000001
R13: ffff810080643000 R14: ffff8100370d6000 R15: 0000000000000001
FS:  00002acdb07f76e0(0000) GS:ffffffff806ae000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000007ad008 CR3: 000000006bdf2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 [<ffffffff88166803>] :click:_Z12element_hookP5TimerPv+0x13/0x20
 [<ffffffff8818ebc8>] :click:_ZN6Master10run_timersEv+0x178/0x320
 [<ffffffff88183349>] :click:_ZN12RouterThread6driverEv+0x5b9/0x6f0
 [<ffffffff881f9ffe>] :click:_Z11click_schedPv+0xfe/0x260
 [<ffffffff804e4fef>] _spin_unlock_irq+0x2b/0x30
 [<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
 [<ffffffff8020cfe8>] child_rip+0xa/0x12
 [<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
 [<ffffffff8020c6ff>] restore_args+0x0/0x30
 [<ffffffff881f9f00>] :click:_Z11click_schedPv+0x0/0x260
 [<ffffffff8020cfde>] child_rip+0x0/0x12


joonwpark at joonwpark-desktop-64:~/SRC5/click/linuxmodule$ gdb click.ko
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) info line *_ZN19BalancedThreadSched9run_timerEP5Timer+0x58a
Line 311 of "/home/joonwpark/SRC5/click/linuxmodule/../include/click/routerthread.hh"
   starts at address 0x9c1ba <_ZN19BalancedThreadSched9run_timerEP5Timer+1418>
   and ends at 0x9c1be <_ZN19BalancedThreadSched9run_timerEP5Timer+1422>.
(gdb) l "/home/joonwpark/SRC5/click/linuxmodule/../include/click/routerthread.hh:311
306         assert(!current_thread_is_running());
307         if (!scheduled)
308             ++_task_blocker_waiting;
309         while (1) {
310             int32_t blocker = _task_blocker.value();
311             if (blocker >= 0
312                 && _task_blocker.compare_and_swap(blocker, blocker + 1))
313                 break;
314             if (nice) {
315     #if CLICK_LINUXMODULE
(gdb)
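
To illustrate why the loop at lines 309-313 of the listing can spin
forever, here is a minimal user-level sketch.  It is not Click's code:
TaskBlocker, try_block(), try_driver_lock() and nested_block_succeeds()
are hypothetical names, and the "-1 means the driver holds the tasks"
encoding is an assumption (the listing only shows the blocker side, which
requires a value >= 0).  Under that assumption, a block attempt made from
inside the driver, as a timer callback would make, can never succeed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the task blocker; not Click's actual class.
struct TaskBlocker {
    // >= 0: number of active blockers.  -1: the driver holds the tasks
    // (the -1 encoding is an assumption; it is not shown in the listing).
    std::atomic<int32_t> value{0};

    // One iteration of the loop at lines 309-313 of the listing.
    bool try_block() {
        int32_t blocker = value.load();
        return blocker >= 0
            && value.compare_exchange_strong(blocker, blocker + 1);
    }

    void unblock() {
        int32_t before = value.fetch_sub(1);
        assert(before > 0);   // must have been blocked at least once
        (void) before;
    }

    // Assumed driver side: succeeds only when nobody is blocking.
    bool try_driver_lock() {
        int32_t expected = 0;
        return value.compare_exchange_strong(expected, -1);
    }

    void driver_unlock() { value.store(0); }
};

// Returns true if any of `attempts` block attempts succeeds while the
// same thread already holds the driver lock.
bool nested_block_succeeds(int attempts) {
    TaskBlocker tb;
    bool locked = tb.try_driver_lock();
    assert(locked);
    (void) locked;
    bool ok = false;
    for (int i = 0; i < attempts && !ok; ++i)
        ok = tb.try_block();  // value stays -1, so this never succeeds
    tb.driver_unlock();
    return ok;
}
```

In this sketch nested_block_succeeds() returns false however many
attempts are made.  The real loop has no attempt limit, which would be
consistent with the soft lockup reported above.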


2008/9/15 Eddie Kohler <kohler at cs.ucla.edu>:
> Joonwoo,
>
>> I took a look into this lockup issue and I think I found something.
>>
>> RouterThread::driver() calls run_tasks() with the tasks locked.
>> But after calling run_tasks(), the current processor can change, since
>> schedule() might be called (e.g. by the ScheduleLinux element).
>> So I think that's the problem.  What do you think?
>
> I totally agree that this could be a problem.
>
> It looks like EXCLUSIVE handlers never really worked before. :(
>
> So my current analysis is this.  It is not appropriate for a thread to call
> blocking functions and/or schedule() when that thread has prevented
> preemption via get_cpu().  My prior patches prevented preemption.
>
> The solution is to separate "locking the task list" from "blocking task
> execution."  Clickfs, when executing an exclusive handler, "blocks task
> execution."  A thread that wants to examine the task list "locks" the list.
>
> This commit:
> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ede0c6b0a1cface05e8d8e2e3496ee7fcd5ee143
> introduces separate APIs for locking the list and blocking task execution.
>  Exclusive handlers block task execution, but do not lock the task list.  I
> believe that task execution, in this patch, does not prevent preemption.  I
> believe the locking works out too.  User-level multithreading tests appear
> OK.
>
> Any willing stresstesters?  Pretty please? :)
>
> Eddie
>

