[Click] Unable to handle kernel paging request at virtualaddress

Paine, Thomas Asa PAINETA at uwec.edu
Mon Nov 20 12:12:35 EST 2006


        Ok, so back to this dang problem.  I was able to cut down on these crashes, but not able to eliminate them.  I did finally notice something very strange.  The "virtual address" it is referencing in this dump is actually valid map data!!  0x40e21c89 is hex value of a network byte ordered IP address which was in the map at the time of the crash (0x40E21C89 = 1088560265 = 64.226.28.137 network byte order).  I'm assuming it came from register edx from the crash below.
        I've provided the disassembly for _ZN7HashMapIjjE6insertERKjS2_.  These crashes are induced by pounding on Spinlocks(ReadWriteLocks) acquires and releases from different threads.  Is it possible Spinlock's assembly is causing GCC to clobber some registers?  I don't know how else map data ended up as a memory reference.  Should I be adding something to the CXXFLAGS or doing something special with gcc?  Very frustrated at this point.

gcc is version 4.1.0
CXXFLAGS=-g
Kernel is 2.6.16.13
Click is CVS

************** the crash *******************

Unable to handle kernel paging request at virtual address 40e21c89
 printing eip:
f88b2387
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: tms click proclikefs e1000
CPU:    0
EIP:    0060:[<f88b2387>]    Tainted: P      VLI
EFLAGS: 00010202   (2.6.16.13-tms #6)
EIP is at _ZN7HashMapIjjE6insertERKjS2_+0x57/0xe2 [tms]
eax: 0000c331   ebx: 00007fff   ecx: c19f0e80   edx: 40e21c89
esi: f61f2480   edi: 00001b9d   ebp: f7f65cb0   esp: f7f65c4c
ds: 007b   es: 007b   ss: 0068
Process bash (pid: 1046, threadinfo=f7f64000 task=c19fba90)
Stack: <0>f7f65c54 00000000 00000002 00000040 cdc20016 1600c2cd f88a2615 f61f2480
       f7f65cb0 f7f65cd0 00000001 00000000 cdc20016 f7f64000 f7f64000 f61f3880
       1600ffff f6819760 00000001 f7d4bd80 00000000 00000000 00000000 0000ffff
Call Trace:
 [<f88a2615>] _ZN16BlacklistElement14addToList_cidrER6Stringj+0x235/0x2ae [tms]
 [<f88a1768>] _ZN16BlacklistElement13write_handlerERK6StringP7ElementPvP12ErrorHandler+0x10a/0x252 [tms]
 [<f8c5d50a>] _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x144/0x1ca [click]
 [<f8c67497>] _ZN11HandlerCall6assignEP7ElementRK6StringS4_iP12ErrorHandler+0x27/0x1e8 [click]
 [<f8c68177>] _ZN11HandlerCall10call_writeEP7ElementRK6StringS4_P12ErrorHandler+0x93/0xcc [click]
 [<f88afc74>] _ZN13TMSController9list_cidrE6StringS0_+0x1b8/0x2a8 [tms]
 [<f88b0e90>] _ZN13TMSController13write_handlerERK6StringP7ElementPvP12ErrorHandler+0x1f4/0x458 [tms]
 [<c0157cc8>] cache_alloc_refill+0x80/0x511
 [<f8c3c4a2>] _ZN6String4MemoC1Eii+0x24/0x2c [click]
 [<f8c5d50a>] _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x144/0x1ca [click]
 [<f8c438e4>] click_lalloc+0x3c/0x3e [click]
 [<f8c3ce8d>] _ZN6String6appendEPKci+0x2f/0x76 [click]
 [<f8cc26e7>] handler_flush+0x577/0x57e [click]
 [<c0159929>] filp_close+0x28/0x6d
 [<c016cd46>] sys_dup2+0xc5/0xee
 [<c0102c9d>] syscall_call+0x7/0xb
Code: d2 74 17 3b 0a 75 f5 8b 4c 24 24 8b 01 89 42 04 31 c0 83 c4 08 5b 5e 5f 5d c3 8b 46 0c 3b 46 10 73 41 8b 4e 14 8b 11 85 d2 74 5a <8b> 02 89 01 89 d3 8b 45 00 89 03 8b 4c 24 24 8b 01 89 43 04 8d



(gdb) disassemble _ZN7HashMapIjjE6insertERKjS2_
Dump of assembler code for function HashMap:
0x00000000 <hatred_of_rebecca+0>:       sub    $0x10,%esp
0x00000003 <hatred_of_rebecca+3>:       movl   $0x0,0xc(%esp)
0x0000000b <hatred_of_rebecca+11>:      lea    0xc(%esp),%eax
0x0000000f <hatred_of_rebecca+15>:      mov    %eax,0x8(%esp)
0x00000013 <hatred_of_rebecca+19>:      movl   $0x0,0x4(%esp)
0x0000001b <hatred_of_rebecca+27>:      mov    0x18(%esp),%eax
0x0000001f <hatred_of_rebecca+31>:      mov    %eax,(%esp)
0x00000022 <hatred_of_rebecca+34>:      call   0x23 <hatred_of_rebecca+35>
0x00000027 <hatred_of_rebecca+39>:      test   %al,%al
0x00000029 <hatred_of_rebecca+41>:      je     0x3a <__module_depends+14>
0x0000002b <hatred_of_rebecca+43>:      mov    0xc(%esp),%edx
0x0000002f <__module_depends+3>:        test   %edx,%edx
End of assembler dump.
(gdb)



40E21C89 -> 1088560265 -> 64.226.28.137


Thanks,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Thomas Paine (paineta at uwec.edu)
   University of Wisconsin - Eau Claire
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-----Original Message-----
From: click-bounces at pdos.csail.mit.edu [mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Paine, Thomas Asa
Sent: Thursday, October 26, 2006 12:13 PM
To: Beyers Cronje
Cc: click at pdos.csail.mit.edu
Subject: Re: [Click] Unable to handle kernel paging request at virtualaddress

Beyers,
        No, my maps are small.  I had set up some very tight testing conditions and the map sizes were very small and appear to have no bearing on the crashing.  It only appears to tank when there is SMP involved.  It runs fine without SMP or with SMP with 0 packet load.  You remember my post back in September...
https://amsterdam.lcs.mit.edu/pipermail/click/2006-September/005176.html
, well I don't think I'm out of the woods with that issue.  I don't think SMP, P4 Hyperthreading, and Spinlock (click/include/click/sync.hh) are working right together.  My system can run like a rock star, however when I call the write_handlers from /proc (which then causes 1000's of handler calls between elements) do I see the issues.  My hypothesis is that my sched_setaffinity() hack job handled the race condition with the click's main thread (and only one because of my prior post), however the system call into click is still making the locks vulnerable, thus the locks not working.
        I complied click and my package with the -g compiler option (via CXXFLAGS), produced the oops, located the exact line in source code (via gdb/info line/disassemble) and was able to verify the assembly and registers are correct at the time of crashes.  It would also explain why it seems to always pick on a couple of my elements, they are the ones handling the packets and have the most lock exposure.
        I must admit, the whole oops/gdb experience is worth its weight in gold, however, I'd much rather be improving the my software rather than wrestling with it.  These are painful times...
        Can anyone confirm or deny any issues with Spinlock and P4 Hyperthreading w/ SMP?

Thanks,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Thomas Paine (paineta at uwec.edu)
   University of Wisconsin - Eau Claire
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



________________________________

From: Beyers Cronje [mailto:bcronje at gmail.com]
Sent: Wednesday, October 25, 2006 5:50 PM
To: Paine, Thomas Asa
Cc: click at pdos.csail.mit.edu
Subject: Re: [Click] Unable to handle kernel paging request at virtual address


Hi Thomas,

How big is your hashmap at the point of the crash? Hashmap only supports
a maximum of 32k entries due to kmalloc limitation. If this is your
issue, you can work around this by creating multiple hashmaps and hash
inserts across them.

Maybe hashmap should also be a CLICK_LALLOC candidate.

Cheers

Beyers


On 10/25/06, Paine, Thomas Asa < PAINETA at uwec.edu
<mailto:PAINETA at uwec.edu> > wrote:

        Ok, I started a long email but soon realized it would be
confusing as
        hell, so let me try the quick and dirty post first :)

                I'm getting oops, with EIP always pointing to
HashMap.insert.
        However, the call trace can differ (somewhat) and the top of the
call
        trace is not always in a method using HashMap.insert (.find
though).
        Eliminating calls seen in the trace seems to have no affect on
        preventing an oops.  Are there any open issues/changes with
HashMap?  Is
        this a memory allocation error?  Possible incorrect kernel
setting?
        Anyone have troubleshooting recommendations for this error?
I've
        attached my current kernel config for 2.6.16.13.  I'm not 100%
sure this
        has anything to do specifically with hashmap, or if it is just
hashmap
        manifesting the error, for whatever the real reason might be.

        Looking for some guidance, thanks.


        Unable to handle kernel paging request at virtual address
f9291c89
        printing eip:
        e08eea67
        *pde = 00000000
        Oops: 0000 [#1]
        SMP
        Modules linked in: tms click proclikefs e1000
        CPU:    1
        EIP:    0060:[<e08eea67>]    Tainted: P      VLI
        EFLAGS: 00010282   (2.6.16.13-tms #3)
        EIP is at _ZN7HashMapIjjE6insertERKjS2_+0x57/0xe2 [tms]
        eax: 00003fff   ebx: 00007fff   ecx: dff152c0   edx: f9291c89
        esi: dfa23520   edi: 000066de   ebp: df703c88   esp: df703c48
        ds: 007b   es: 007b   ss: 0068
        Process bash (pid: 1030, threadinfo=df702000 task=c1b76a50)
        Stack: <0>e08e135f dfa23520 dfa93a00 dfb67180 dfa939c0 df703c88
e08e1706
        dfa23520
               df703c88 df703cac dfb67180 00000001 00000004 00000000
00000000
        df703cb0
               652a1c89 dfa939c0 00000000 df703cb0 dfa939c0 dfe69760
e08e181e
        dfa939c0
        Call Trace:
        [<e08e135f>] _ZN16BlacklistElement14is_whitelistedEm+0x5f/0xc6
[tms]
        [<e08e1706>]
_ZN16BlacklistElement9addToListER6Stringj+0x172/0x1c8
        [tms]
        [<e08e181e>]

_ZN16BlacklistElement13write_handlerERK6StringP7ElementPvP12ErrorHandler

        +0xc2/0x144 [tms]
        [<e0c5d50a>]

_ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x144/0x1ca
        [click]
        [<e0c67497>]

_ZN11HandlerCall6assignEP7ElementRK6StringS4_iP12ErrorHandler+0x27/0x1e8

        [click]
        [<e0c68177>]

_ZN11HandlerCall10call_writeEP7ElementRK6StringS4_P12ErrorHandler+0x93/0
        xcc [click]
        [<e0c68177>]

_ZN11HandlerCall10call_writeEP7ElementRK6StringS4_P12ErrorHandler+0x93/0

        xcc [click]
        [<e08e9c48>] _ZN13TMSController4listEm6String+0x6a/0x1de [tms]
        [<e0c438e4>] click_lalloc+0x3c/0x3e [click]
        [<e0c3c4a2>] _ZN6String4MemoC1Eii+0x24/0x2c [click]
        [<e08ec238>]
_ZN13TMSController14whitelist_cidrE6String+0xa8/0xde [tms]
        [<e08ecf40>]

_ZN13TMSController13write_handlerERK6StringP7ElementPvP12ErrorHandler+0x
        21a/0x2e2 [tms]
        [<e0c438e4>] click_lalloc+0x3c/0x3e [click]
        [<e0c3c4a2>] _ZN6String4MemoC1Eii+0x24/0x2c [click]
        [<e0c3c57a>] _ZN6String14append_garbageEi+0xa4/0x150 [click]
        [<e0c5d50a>]

_ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x144/0x1ca
        [click]
        [<e0c438e4>] click_lalloc+0x3c/0x3e [click]
        [<e0c3ce8d>] _ZN6String6appendEPKci+0x2f/0x76 [click]
        [<e0cc26e7>] handler_flush+0x577/0x57e [click]
        [<c0159c49>] filp_close+0x28/0x6d
        [<c016d1c6>] sys_dup2+0xc5/0xee
        [<c0102cb9>] syscall_call+0x7/0xb
        Code: d2 74 17 3b 0a 75 f5 8b 4c 24 24 8b 01 89 42 04 31 c0 83
c4 08 5b
        5e 5f 5d c3 8b 46 0c 3b 46 10 73 41 8b 4e 14 8b 11 85 d2 74 5a
<8b> 02
        89 01 89 d3 8b 45 00 89 03 8b 4c 24 24 8b 01 89 43 04 8d



        Thanks,
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           Thomas Paine (paineta at uwec.edu)
           University of Wisconsin - Eau Claire
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



        _______________________________________________
        click mailing list
        click at amsterdam.lcs.mit.edu
        https://amsterdam.lcs.mit.edu/mailman/listinfo/click







_______________________________________________
click mailing list
click at amsterdam.lcs.mit.edu
https://amsterdam.lcs.mit.edu/mailman/listinfo/click



More information about the click mailing list