[Click] [PATCH 2/2] Task: Kill process_pending dead lock

Eddie Kohler kohler at cs.ucla.edu
Tue Sep 16 13:23:41 EDT 2008


And how many threads?

Eddie


Joonwoo Park wrote:
> Hi Eddie,
> 
> I guess so that you intended to they are recursive. :-)
> Here is the config can cause lock up without device elements.
> 
> ----
> s0::RatedSource(DATASIZE 128) -> EtherEncap(0x0800, FF:FF:FF:FF:FF:FF,
> FF:FF:FF:FF:FF:FF) -> Discard
> s1::InfiniteSource(DATASIZE 128) -> EtherEncap(0x0800,
> FF:FF:FF:FF:FF:FF, FF:FF:FF:FF:FF:FF) -> Discard
> 
> sched::BalancedThreadSched(100);
> ----
> 
> Thanks!
> Joonwoo
> 
> 2008/9/16 Eddie Kohler <kohler at cs.ucla.edu>:
>> Hi Joonwoo,
>>
>> I intended block_tasks() and driver_lock_tasks() to be recursive.  I could
>> certainly have failed!  Can you tell me more about the configuration you're
>> running?  Can you cause a soft lockup even without device elements (such as
>> with InfiniteSources)?
>>
>> Eddie
>>
>>
>> Joonwoo Park wrote:
>>> Hi Eddie,
>>>
>>> I agree with your blocking task execution as a solution.
>>> However I got a following soft lock up problem with your patch.
>>> With a quick review, it's seems to block_tasks() and driver_tasks()
>>> doesn't support recursive lock. (please correct me if I am wrong)
>>> So when BalancedThreadSched's run_timer try to lock the tasks, it
>>> looks like goes hang.
>>>
>>> Here is my oops message and gdb output.  I used my 2.6.24 patched
>>> kernel. I'm sorry for that.
>>>
>>> Thanks,
>>> Joonwoo
>>>
>>> joonwpark at joonwpark-desktop-64:~/SRC5/click/linuxmodule$ BUG: soft
>>> lockup - CPU#0 stuck for 11s! [kclick:3116]
>>> SysRq : Changing Loglevel
>>> Loglevel set to 9
>>> BUG: soft lockup - CPU#0 stuck for 11s! [kclick:3116]
>>> CPU 0:
>>> Modules linked in: click proclikefs e1000 iptable_filter ip_tables
>>> x_tables parport_pc lp parport ipv6 floppy pcspkr forcedeth ext3 jbd
>>> Pid: 3116, comm: kclick Not tainted 2.6.24.7-joonwpark #3
>>> RIP: 0010:[<ffffffff881f818a>]  [<ffffffff881f818a>]
>>> :click:_ZN19BalancedThreadSched9run_timerEP5Timer+0x58a/0x630
>>> RSP: 0018:ffff8100370d7d30  EFLAGS: 00000286
>>> RAX: ffff8100370d4000 RBX: ffff8100370d7dc0 RCX: ffff810037892430
>>> RDX: 00000000ffffffff RSI: ffff81003792fcd0 RDI: ffff81003792fc60
>>> RBP: ffffffff806b7b10 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000001
>>> R13: ffff810080643000 R14: ffff8100370d6000 R15: 0000000000000001
>>> FS:  00002acdb07f76e0(0000) GS:ffffffff806ae000(0000)
>>> knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> CR2: 00000000007ad008 CR3: 000000006bdf2000 CR4: 00000000000006e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>
>>> Call Trace:
>>>  [<ffffffff88166803>] :click:_Z12element_hookP5TimerPv+0x13/0x20
>>>  [<ffffffff8818ebc8>] :click:_ZN6Master10run_timersEv+0x178/0x320
>>>  [<ffffffff88183349>] :click:_ZN12RouterThread6driverEv+0x5b9/0x6f0
>>>  [<ffffffff881f9ffe>] :click:_Z11click_schedPv+0xfe/0x260
>>>  [<ffffffff804e4fef>] _spin_unlock_irq+0x2b/0x30
>>>  [<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
>>>  [<ffffffff8020cfe8>] child_rip+0xa/0x12
>>>  [<ffffffff8022e0b6>] finish_task_switch+0x57/0x94
>>>  [<ffffffff8020c6ff>] restore_args+0x0/0x30
>>>  [<ffffffff881f9f00>] :click:_Z11click_schedPv+0x0/0x260
>>>  [<ffffffff8020cfde>] child_rip+0x0/0x12
>>>
>>>
>>> joonwpark at joonwpark-desktop-64:~/SRC5/click/linuxmodule$ gdb click.ko
>>> GNU gdb 6.8-debian
>>> Copyright (C) 2008 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later
>>> <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-linux-gnu"...
>>> info line *(gdb) info line
>>> *_ZN19BalancedThreadSched9run_timerEP5Timer+0x58a
>>> Line 311 of
>>> "/home/joonwpark/SRC5/click/linuxmodule/../include/click/routerthread.hh"
>>>   starts at address 0x9c1ba
>>> <_ZN19BalancedThreadSched9run_timerEP5Timer+1418>
>>>   and ends at 0x9c1be <_ZN19BalancedThreadSched9run_timerEP5Timer+1422>.
>>> (gdb) l
>>> "/home/joonwpark/SRC5/click/linuxmodule/../include/click/routerthread.hh:311
>>> 306         assert(!current_thread_is_running());
>>> 307         if (!scheduled)
>>> 308             ++_task_blocker_waiting;
>>> 309         while (1) {
>>> 310             int32_t blocker = _task_blocker.value();
>>> 311             if (blocker >= 0
>>> 312                 && _task_blocker.compare_and_swap(blocker, blocker +
>>> 1))
>>> 313                 break;
>>> 314             if (nice) {
>>> 315     #if CLICK_LINUXMODULE
>>> (gdb)
>>>
>>>
>>> 2008/9/15 Eddie Kohler <kohler at cs.ucla.edu>:
>>>> Joonwoo,
>>>>
>>>>> I took look into this lock up issue and I think I found something.
>>>>>
>>>>> RoutherThread::driver() calls run_tasks() with locked tasks.
>>>>> But after calling run_tasks(), current processor can be changed since
>>>>> schedule() might be called (eg. ScheduleLinux element)
>>>>> So I think that's problem.  How do you think?
>>>> I totally agree that this could be a problem.
>>>>
>>>> It looks like EXCLUSIVE handlers never really worked before. :(
>>>>
>>>> So my current analysis is this.  It is not appropriate for a thread to
>>>> call
>>>> blocking functions and/or schedule() when that thread has prevented
>>>> preemption via get_cpu().  My prior patches prevented preemption.
>>>>
>>>> The solution is to separate "locking the task list" from "blocking task
>>>> execution."  Clickfs, when executing an exclusive handler, "blocks task
>>>> execution."  A thread that wants to examine the task list "locks" the
>>>> list.
>>>>
>>>> This commit:
>>>>
>>>> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=ede0c6b0a1cface05e8d8e2e3496ee7fcd5ee143
>>>> introduces separate APIs for locking the list and blocking task
>>>> execution.
>>>>  Exclusive handlers block task execution, but do not lock the task list.
>>>>  I
>>>> believe that task execution, in this patch, does not prevent preemption.
>>>>  I
>>>> believe the locking works out too.  User-level multithreading tests
>>>> appear
>>>> OK.
>>>>
>>>> Any willing stresstesters?  Pretty please? :)
>>>>
>>>> Eddie
>>>>
>>


More information about the click mailing list