[Click] Unable to handle kernel paging request at virtualaddress

Eddie Kohler kohler at cs.ucla.edu
Mon Feb 5 18:57:59 EST 2007


Hi Thomas,

Documenting the Spinlock objects may have revealed the problem.  Click 
spinlocks, which are hand built -- I don't know why, ask Benjie Chen or Robert 
Morris -- are only useful when Click is compiled with --enable-multithread. 
In particular, a Click Spinlock WILL NOT prevent interactions between a Click 
thread and a handler.

I think we should just fix this, by making Spinlock shorthand for a regular 
old linux spinlock_t of some kind.  This will probably slow down Click, but it 
seems worth it.

All: Speak now or forever hold your peace.
Eddie


Paine, Thomas Asa wrote:
>         Ok, so back to this dang problem.  I was able to cut down on these crashes, but not able to eliminate them.  I did finally notice something very strange.  The "virtual address" it is referencing in this dump is actually valid map data!!  0x40e21c89 is hex value of a network byte ordered IP address which was in the map at the time of the crash (0x40E21C89 = 1088560265 = 64.226.28.137 network byte order).  I'm assuming it came from register edx from the crash below.
>         I've provided the disassembly for _ZN7HashMapIjjE6insertERKjS2_.  These crashes are induced by pounding on Spinlocks(ReadWriteLocks) acquires and releases from different threads.  Is it possible Spinlock's assembly is causing GCC to clobber some registers?  I don't know how else map data ended up as a memory reference.  Should I be adding something to the CXXFLAGS or doing something special with gcc?  Very frustrated at this point.
> 
> gcc is version 4.1.0
> CXXFLAGS=-g
> Kernel is 2.6.16.13
> Click is CVS
> 
> ************** the crash *******************
> 
> Unable to handle kernel paging request at virtual address 40e21c89
>  printing eip:
> f88b2387
> *pde = 00000000
> Oops: 0000 [#1]
> SMP
> Modules linked in: tms click proclikefs e1000
> CPU:    0
> EIP:    0060:[<f88b2387>]    Tainted: P      VLI
> EFLAGS: 00010202   (2.6.16.13-tms #6)
> EIP is at _ZN7HashMapIjjE6insertERKjS2_+0x57/0xe2 [tms]
> eax: 0000c331   ebx: 00007fff   ecx: c19f0e80   edx: 40e21c89
> esi: f61f2480   edi: 00001b9d   ebp: f7f65cb0   esp: f7f65c4c
> ds: 007b   es: 007b   ss: 0068
> Process bash (pid: 1046, threadinfo=f7f64000 task=c19fba90)
> Stack: <0>f7f65c54 00000000 00000002 00000040 cdc20016 1600c2cd f88a2615 f61f2480
>        f7f65cb0 f7f65cd0 00000001 00000000 cdc20016 f7f64000 f7f64000 f61f3880
>        1600ffff f6819760 00000001 f7d4bd80 00000000 00000000 00000000 0000ffff
> Call Trace:
>  [<f88a2615>] _ZN16BlacklistElement14addToList_cidrER6Stringj+0x235/0x2ae [tms]
>  [<f88a1768>] _ZN16BlacklistElement13write_handlerERK6StringP7ElementPvP12ErrorHandler+0x10a/0x252 [tms]
>  [<f8c5d50a>] _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x144/0x1ca [click]
>  [<f8c67497>] _ZN11HandlerCall6assignEP7ElementRK6StringS4_iP12ErrorHandler+0x27/0x1e8 [click]
>  [<f8c68177>] _ZN11HandlerCall10call_writeEP7ElementRK6StringS4_P12ErrorHandler+0x93/0xcc [click]
>  [<f88afc74>] _ZN13TMSController9list_cidrE6StringS0_+0x1b8/0x2a8 [tms]
>  [<f88b0e90>] _ZN13TMSController13write_handlerERK6StringP7ElementPvP12ErrorHandler+0x1f4/0x458 [tms]
>  [<c0157cc8>] cache_alloc_refill+0x80/0x511
>  [<f8c3c4a2>] _ZN6String4MemoC1Eii+0x24/0x2c [click]
>  [<f8c5d50a>] _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x144/0x1ca [click]
>  [<f8c438e4>] click_lalloc+0x3c/0x3e [click]
>  [<f8c3ce8d>] _ZN6String6appendEPKci+0x2f/0x76 [click]
>  [<f8cc26e7>] handler_flush+0x577/0x57e [click]
>  [<c0159929>] filp_close+0x28/0x6d
>  [<c016cd46>] sys_dup2+0xc5/0xee
>  [<c0102c9d>] syscall_call+0x7/0xb
> Code: d2 74 17 3b 0a 75 f5 8b 4c 24 24 8b 01 89 42 04 31 c0 83 c4 08 5b 5e 5f 5d c3 8b 46 0c 3b 46 10 73 41 8b 4e 14 8b 11 85 d2 74 5a <8b> 02 89 01 89 d3 8b 45 00 89 03 8b 4c 24 24 8b 01 89 43 04 8d
> 
> 
> 
> (gdb) disassemble _ZN7HashMapIjjE6insertERKjS2_
> Dump of assembler code for function HashMap:
> 0x00000000 <hatred_of_rebecca+0>:       sub    $0x10,%esp
> 0x00000003 <hatred_of_rebecca+3>:       movl   $0x0,0xc(%esp)
> 0x0000000b <hatred_of_rebecca+11>:      lea    0xc(%esp),%eax
> 0x0000000f <hatred_of_rebecca+15>:      mov    %eax,0x8(%esp)
> 0x00000013 <hatred_of_rebecca+19>:      movl   $0x0,0x4(%esp)
> 0x0000001b <hatred_of_rebecca+27>:      mov    0x18(%esp),%eax
> 0x0000001f <hatred_of_rebecca+31>:      mov    %eax,(%esp)
> 0x00000022 <hatred_of_rebecca+34>:      call   0x23 <hatred_of_rebecca+35>
> 0x00000027 <hatred_of_rebecca+39>:      test   %al,%al
> 0x00000029 <hatred_of_rebecca+41>:      je     0x3a <__module_depends+14>
> 0x0000002b <hatred_of_rebecca+43>:      mov    0xc(%esp),%edx
> 0x0000002f <__module_depends+3>:        test   %edx,%edx
> End of assembler dump.
> (gdb)
> 
> 
> 
> 40E21C89 -> 1088560265 -> 64.226.28.137
> 
> 
> Thanks,
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    Thomas Paine (paineta at uwec.edu)
>    University of Wisconsin - Eau Claire
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> -----Original Message-----
> From: click-bounces at pdos.csail.mit.edu [mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Paine, Thomas Asa
> Sent: Thursday, October 26, 2006 12:13 PM
> To: Beyers Cronje
> Cc: click at pdos.csail.mit.edu
> Subject: Re: [Click] Unable to handle kernel paging request at virtualaddress
> 
> Beyers,
>         No, my maps are small.  I had set up some very tight testing conditions and the map sizes were very small and appear to have no bearing on the crashing.  It only appears to tank when there is SMP involved.  It runs fine without SMP or with SMP with 0 packet load.  You remember my post back in September...
> https://amsterdam.lcs.mit.edu/pipermail/click/2006-September/005176.html
> , well I don't think I'm out of the woods with that issue.  I don't think SMP, P4 Hyperthreading, and Spinlock (click/include/click/sync.hh) are working right together.  My system can run like a rock star, however when I call the write_handlers from /proc (which then causes 1000's of handler calls between elements) do I see the issues.  My hypothesis is that my sched_setaffinity() hack job handled the race condition with the click's main thread (and only one because of my prior post), however the system call into click is still making the locks vulnerable, thus the locks not working.
>         I complied click and my package with the -g compiler option (via CXXFLAGS), produced the oops, located the exact line in source code (via gdb/info line/disassemble) and was able to verify the assembly and registers are correct at the time of crashes.  It would also explain why it seems to always pick on a couple of my elements, they are the ones handling the packets and have the most lock exposure.
>         I must admit, the whole oops/gdb experience is worth its weight in gold, however, I'd much rather be improving the my software rather than wrestling with it.  These are painful times...
>         Can anyone confirm or deny any issues with Spinlock and P4 Hyperthreading w/ SMP?
> 
> Thanks,
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    Thomas Paine (paineta at uwec.edu)
>    University of Wisconsin - Eau Claire
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> 
> ________________________________
> 
> From: Beyers Cronje [mailto:bcronje at gmail.com]
> Sent: Wednesday, October 25, 2006 5:50 PM
> To: Paine, Thomas Asa
> Cc: click at pdos.csail.mit.edu
> Subject: Re: [Click] Unable to handle kernel paging request at virtual address
> 
> 
> Hi Thomas,
> 
> How big is your hashmap at the point of the crash? Hashmap only supports
> a maximum of 32k entries due to kmalloc limitation. If this is your
> issue, you can work around this by creating multiple hashmaps and hash
> inserts across them.
> 
> Maybe hashmap should also be a CLICK_LALLOC candidate.
> 
> Cheers
> 
> Beyers
> 
> 
> On 10/25/06, Paine, Thomas Asa < PAINETA at uwec.edu
> <mailto:PAINETA at uwec.edu> > wrote:
> 
>         Ok, I started a long email but soon realized it would be
> confusing as
>         hell, so let me try the quick and dirty post first :)
> 
>                 I'm getting oops, with EIP always pointing to
> HashMap.insert.
>         However, the call trace can differ (somewhat) and the top of the
> call
>         trace is not always in a method using HashMap.insert (.find
> though).
>         Eliminating calls seen in the trace seems to have no affect on
>         preventing an oops.  Are there any open issues/changes with
> HashMap?  Is
>         this a memory allocation error?  Possible incorrect kernel
> setting?
>         Anyone have troubleshooting recommendations for this error?
> I've
>         attached my current kernel config for 2.6.16.13.  I'm not 100%
> sure this
>         has anything to do specifically with hashmap, or if it is just
> hashmap
>         manifesting the error, for whatever the real reason might be.
> 
>         Looking for some guidance, thanks.
> 
> 
>         Unable to handle kernel paging request at virtual address
> f9291c89
>         printing eip:
>         e08eea67
>         *pde = 00000000
>         Oops: 0000 [#1]
>         SMP
>         Modules linked in: tms click proclikefs e1000
>         CPU:    1
>         EIP:    0060:[<e08eea67>]    Tainted: P      VLI
>         EFLAGS: 00010282   (2.6.16.13-tms #3)
>         EIP is at _ZN7HashMapIjjE6insertERKjS2_+0x57/0xe2 [tms]
>         eax: 00003fff   ebx: 00007fff   ecx: dff152c0   edx: f9291c89
>         esi: dfa23520   edi: 000066de   ebp: df703c88   esp: df703c48
>         ds: 007b   es: 007b   ss: 0068
>         Process bash (pid: 1030, threadinfo=df702000 task=c1b76a50)
>         Stack: <0>e08e135f dfa23520 dfa93a00 dfb67180 dfa939c0 df703c88
> e08e1706
>         dfa23520
>                df703c88 df703cac dfb67180 00000001 00000004 00000000
> 00000000
>         df703cb0
>                652a1c89 dfa939c0 00000000 df703cb0 dfa939c0 dfe69760
> e08e181e
>         dfa939c0
>         Call Trace:
>         [<e08e135f>] _ZN16BlacklistElement14is_whitelistedEm+0x5f/0xc6
> [tms]
>         [<e08e1706>]
> _ZN16BlacklistElement9addToListER6Stringj+0x172/0x1c8
>         [tms]
>         [<e08e181e>]
> 
> _ZN16BlacklistElement13write_handlerERK6StringP7ElementPvP12ErrorHandler
> 
>         +0xc2/0x144 [tms]
>         [<e0c5d50a>]
> 
> _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x144/0x1ca
>         [click]
>         [<e0c67497>]
> 
> _ZN11HandlerCall6assignEP7ElementRK6StringS4_iP12ErrorHandler+0x27/0x1e8
> 
>         [click]
>         [<e0c68177>]
> 
> _ZN11HandlerCall10call_writeEP7ElementRK6StringS4_P12ErrorHandler+0x93/0
>         xcc [click]
>         [<e0c68177>]
> 
> _ZN11HandlerCall10call_writeEP7ElementRK6StringS4_P12ErrorHandler+0x93/0
> 
>         xcc [click]
>         [<e08e9c48>] _ZN13TMSController4listEm6String+0x6a/0x1de [tms]
>         [<e0c438e4>] click_lalloc+0x3c/0x3e [click]
>         [<e0c3c4a2>] _ZN6String4MemoC1Eii+0x24/0x2c [click]
>         [<e08ec238>]
> _ZN13TMSController14whitelist_cidrE6String+0xa8/0xde [tms]
>         [<e08ecf40>]
> 
> _ZN13TMSController13write_handlerERK6StringP7ElementPvP12ErrorHandler+0x
>         21a/0x2e2 [tms]
>         [<e0c438e4>] click_lalloc+0x3c/0x3e [click]
>         [<e0c3c4a2>] _ZN6String4MemoC1Eii+0x24/0x2c [click]
>         [<e0c3c57a>] _ZN6String14append_garbageEi+0xa4/0x150 [click]
>         [<e0c5d50a>]
> 
> _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x144/0x1ca
>         [click]
>         [<e0c438e4>] click_lalloc+0x3c/0x3e [click]
>         [<e0c3ce8d>] _ZN6String6appendEPKci+0x2f/0x76 [click]
>         [<e0cc26e7>] handler_flush+0x577/0x57e [click]
>         [<c0159c49>] filp_close+0x28/0x6d
>         [<c016d1c6>] sys_dup2+0xc5/0xee
>         [<c0102cb9>] syscall_call+0x7/0xb
>         Code: d2 74 17 3b 0a 75 f5 8b 4c 24 24 8b 01 89 42 04 31 c0 83
> c4 08 5b
>         5e 5f 5d c3 8b 46 0c 3b 46 10 73 41 8b 4e 14 8b 11 85 d2 74 5a
> <8b> 02
>         89 01 89 d3 8b 45 00 89 03 8b 4c 24 24 8b 01 89 43 04 8d
> 
> 
> 
>         Thanks,
>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>            Thomas Paine (paineta at uwec.edu)
>            University of Wisconsin - Eau Claire
>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> 
>         _______________________________________________
>         click mailing list
>         click at amsterdam.lcs.mit.edu
>         https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> 
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click


More information about the click mailing list