[Click] GRO bug with Click

Abdulla Alwabel abdulla.wabel at gmail.com
Tue Sep 3 20:16:38 EDT 2013


Dear Click guys,
We are facing a problem with a node running Click that connects two
machines. The problem happens only with TCP traffic. The hardware
persistently reports a hang; see the kernel log appended at the end of
this message.

Before explaining the cause of the problem, I will give a brief summary of
our Click setup. We set up Click as a layer-2 packet forwarder. It gets a
frame from one interface and either discards it or sends it out through
another interface after holding it for some time determined by the LAN
settings.

Recent drivers and Linux kernels try to combine packets with similar
header parameters into one socket buffer before passing them up the stack.
This feature is called Generic Receive Offload (GRO). Click gets the packet
from the kernel (from the driver, actually) and sends it to the wire after
some massaging. This causes the hardware to hang because it does not know
how to segment the aggregated buffer. A solution is to prepare the buffer
by calling “skb_gso_segment”.

--- a/todevice.cc
+++ b/todevice.cc
@@ -480,6 +480,8 @@ ToDevice::queue_packet(Packet *p, struct netdev_queue *txq)
        skb_put(skb1, need_tail);
     }
 
+    if(skb_is_gso(skb1) ) skb_gso_segment(skb1,dev->features);
+
     // set the device annotation;
     // apparently some devices in Linux 2.6 require it
     skb1->dev = dev;

In our case we disable GRO before setting up Click...


This is the kernel error log.

{{{
[  273.820359] e1000e 0000:03:00.1: eth2: Detected Hardware Unit Hang:
[  273.820360]   TDH                  <c0>
[  273.820361]   TDT                  <d8>
[  273.820362]   next_to_use          <d8>
[  273.820363]   next_to_clean        <c0>
[  273.820364] buffer_info[next_to_clean]:
[  273.820364]   time_stamp           <ffffde2d>
[  273.820365]   next_to_watch        <c0>
[  273.820366]   jiffies              <ffffe66f>
[  273.820367]   next_to_watch.status <0>
[  273.820368] MAC Status             <80387>
[  273.820369] PHY Status             <792d>
[  273.820369] PHY 1000BASE-T Status  <3800>
[  273.820370] PHY Extended Status    <3000>
[  273.820371] PCI Status             <10>
[  277.820398] e1000e 0000:03:00.1: eth2: Detected Hardware Unit Hang:
[  277.820400]   TDH                  <c0>
[  277.820401]   TDT                  <d8>
[  277.820401]   next_to_use          <d8>
[  277.820402]   next_to_clean        <c0>
[  277.820403] buffer_info[next_to_clean]:
[  277.820404]   time_stamp           <ffffde2d>
[  277.820404]   next_to_watch        <c0>
[  277.820405]   jiffies              <ffffea57>
[  277.820406]   next_to_watch.status <0>
[  277.820407] MAC Status             <80387>
[  277.820408] PHY Status             <792d>
[  277.820408] PHY 1000BASE-T Status  <3800>
[  277.820409] PHY Extended Status    <3000>
[  277.820410] PCI Status             <10>
[  277.824018] ------------[ cut here ]------------
[  277.824030] WARNING: at /build/buildd/linux-3.2.0/net/sched/sch_generic.c:255 dev_watchdog+0x25a/0x270()
[  277.824033] Hardware name: PowerEdge 860
[  277.824035] NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
[  277.824036] Modules linked in: click(O) proclikefs(O) nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc radeon ttm drm_kms_helper drm dcdbas i3000_edac edac_core i2c_algo_bit shpchp psmouse serio_raw mac_hid lp parport e1000 e1000e
[  277.824062] Pid: 0, comm: swapper/0 Tainted: G    O 3.2.0-27-generic #43-Ubuntu
[  277.824065] Call Trace:
[  277.824067]  <IRQ>  [<ffffffff8106729f>] warn_slowpath_common+0x7f/0xc0
[  277.824078]  [<ffffffff81067396>] warn_slowpath_fmt+0x46/0x50
[  277.824088]  [<ffffffff81024282>] ? x86_pmu_enable+0x1f2/0x270
[  277.824091]  [<ffffffff8155f2ca>] dev_watchdog+0x25a/0x270
[  277.824096]  [<ffffffff81110cc0>] ? perf_rotate_context+0x110/0x220
[  277.824099]  [<ffffffff8155f070>] ? qdisc_reset+0x50/0x50
[  277.824101]  [<ffffffff8155f070>] ? qdisc_reset+0x50/0x50
[  277.824106]  [<ffffffff810761a6>] call_timer_fn+0x46/0x160
[  277.824109]  [<ffffffff8155f070>] ? qdisc_reset+0x50/0x50
[  277.824113]  [<ffffffff81077af2>] run_timer_softirq+0x132/0x2a0
[  277.824119]  [<ffffffff81095225>] ? ktime_get+0x65/0xe0
[  277.824125]  [<ffffffff8106ea48>] __do_softirq+0xa8/0x210
[  277.824128]  [<ffffffff8101a779>] ? read_tsc+0x9/0x20
[  277.824132]  [<ffffffff8109c1b4>] ? tick_program_event+0x24/0x30
[  277.824137]  [<ffffffff816644ec>] call_softirq+0x1c/0x30
[  277.824141]  [<ffffffff81015305>] do_softirq+0x65/0xa0
[  277.824144]  [<ffffffff8106ee2e>] irq_exit+0x8e/0xb0
[  277.824147]  [<ffffffff81664e8e>] smp_apic_timer_interrupt+0x6e/0x99
[  277.824150]  [<ffffffff81662d5e>] apic_timer_interrupt+0x6e/0x80
[  277.824152]  <EOI>  [<ffffffff8107894d>] ? get_next_timer_interrupt+0x8d/0x120
[  277.824157]  [<ffffffff8101be45>] ? mwait_idle+0x95/0x210
[  277.824160]  [<ffffffff81012236>] cpu_idle+0xd6/0x120
[  277.824164]  [<ffffffff816205fe>] rest_init+0x72/0x74
[  277.824171]  [<ffffffff81cfbc03>] start_kernel+0x3b0/0x3bd
[  277.824174]  [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
[  277.824177]  [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
[  277.824180]  [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
[  277.824182] ---[ end trace 9f25206935d2c245 ]---
}}}
Regards

