[Click] Packet loss even at low sending rate

Bingyang LIU bjornliu at gmail.com
Tue Dec 6 20:14:51 EST 2011


Hi all,

Finally I found that it was NOT PollDevice that dropped the packets; the
kernel did.
I ran "ifconfig" and saw that the dropped packets were counted against the
interface itself.
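
For reference, this is roughly how I checked the counters (a sketch;
"eth2" is one of the router's ports, and the exact ethtool statistic names
vary by driver):

  ifconfig eth2 | grep dropped                    # kernel-level interface drops
  ethtool -S eth2 | grep -i -e drop -e miss       # driver/NIC-level statistics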

Thanks very much. Does anyone know how to stop the kernel from dropping
packets?

Bingyang

On Mon, Dec 5, 2011 at 10:45 PM, Bingyang LIU <bjornliu at gmail.com> wrote:

> Hi Beyers,
>
> Is there any way to schedule PollDevice more often than the default?
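> For example, would raising its tickets with ScheduleInfo help? A minimal,
> untuned sketch (the scaling factors are guesses on my part):
>
> ScheduleInfo(pd0 4, pd1 4, pd2 4, pd3 4,   // poll 4x more often
>              td0 1, td1 1, td2 1, td3 1);  // leave ToDevice at the default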
>
> Thanks
> Bingyang
>
>
> On Mon, Dec 5, 2011 at 3:37 PM, Beyers Cronje <bcronje at gmail.com> wrote:
>
>> Hi Bingyang,
>>
>> I understood that. What I was referring to was the empty-notifier
>> implementation in ThreadSafeQueue, which basically allows ToDevice to
>> sleep when there are no packets in the queue. HOWEVER, I just had a look
>> at the ToDevice code and realized that if you are using PollDevice, then
>> ToDevice always gets scheduled and ignores the empty notifier, as
>> ToDevice is also used to clean the device transmit ring.
>>
>> I haven't used PollDevice in a long time, but if memory serves, seeing
>> drops from PollDevice means it is not getting scheduled fast enough to
>> pull packets from the device, which leads to RX ring overflows. Try
>> increasing the RX ring to 4096 to better cater for bursts, or simplify
>> your configuration to narrow down what is using too much CPU time. Also
>> try MarkIPHeader instead of CheckIPHeader, since MarkIPHeader does no
>> checksum checks, and see if that makes any difference.
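>>
>> Something like this, as a sketch ("eth2" stands in for each interface;
>> MarkIPHeader only sets the IP header annotation without validating it):
>>
>> ethtool -G eth2 rx 4096
>>
>> and, per port in the config:
>>
>> c1[2] -> Strip(14) -> MarkIPHeader() -> rt;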
>>
>> Beyers
>>
>>
>> On Mon, Dec 5, 2011 at 10:16 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>
>>> Hi Beyers,
>>>
>>> Thanks very much. But what I found is that PollDevice dropped the
>>> packets, not the Queues.
>>>
>>> The "count" handler of PollDevice reported fewer packets than were sent
>>> to the NIC.
>>> Besides, the number of output packets was equal to the "count" of
>>> PollDevice, which means that no packet was lost between PollDevice and
>>> the corresponding output ToDevice (so I guess the Queues are not the
>>> problem).
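>>>
>>> (I compared the counts through the Click proc filesystem, roughly like
>>> this; pd2 and td1 are element names from my config:)
>>>
>>> cat /click/pd2/count    # packets PollDevice pulled from eth2
>>> cat /click/td1/count    # packets ToDevice pushed out on eth1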
>>>
>>> It doesn't seem to be a multi-threading issue either, because the packet
>>> loss at PollDevice happened with the single-threaded router as well.
>>>
>>> best
>>> Bingyang
>>> On Mon, Dec 5, 2011 at 3:04 PM, Beyers Cronje <bcronje at gmail.com> wrote:
>>>
>>>> Hi Bingyang,
>>>>
>>>> Personally I would use ThreadSafeQueue, as it implements the full and
>>>> empty notifiers, which CPUQueue does not (if I remember correctly).
>>>> This should help somewhat at lower packet rates.
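>>>>
>>>> The swap is one line per queue, e.g. (a sketch against your config,
>>>> keeping the same capacity):
>>>>
>>>> q1 :: ThreadSafeQueue(10000)
>>>>    -> EtherEncap(0x0800, router1-w1, user1-w1) -> td2 :: ToDevice(eth2);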
>>>>
>>>> Beyers
>>>>
>>>> On Mon, Dec 5, 2011 at 8:57 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>>>
>>>> > Hi Beyers,
>>>> >
>>>> > Please check the script below, and thanks very much!
>>>> >
>>>> > // The packet flow in the router is as follows:
>>>> > //
>>>> > // pd2::PollDevice(eth2) -> ...processing... -> td1::ToDevice(eth1)
>>>> > // pd1::PollDevice(eth1) -> ...processing... -> td2::ToDevice(eth2)
>>>> > // pd0::PollDevice(eth0) -> ...processing... -> td0::ToDevice(eth0)
>>>> > // pd3::PollDevice(eth3) -> ...processing... -> td3::ToDevice(eth3)
>>>> > //
>>>> > // Note that this is not a standard IP router; components such as
>>>> > // DecIPTTL are omitted.
>>>> > //
>>>> >
>>>> > AddressInfo(router1-w1   10.0.1.1   00:15:17:57:bd:c6,   // eth2
>>>> >             router1-w2   10.0.2.1   00:15:17:57:bd:c5,   // eth1
>>>> >             router1-w3   10.0.3.1   00:15:17:57:bd:c4,   // eth0
>>>> >             router1-w4   10.0.4.1   00:15:17:57:bd:c7,   // eth3
>>>> >             user1-w1     10.0.1.2   00:15:17:57:c7:4e,   // eth2
>>>> >             user2-w2     10.0.2.2   00:15:17:57:c4:86,   // eth2
>>>> >             user3-w3     10.0.3.2   00:15:17:57:c6:ca,   // eth2
>>>> >             user4-w4     10.0.4.2   00:15:17:57:c4:3a);  // eth2
>>>> >
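>>>> > // Each Classifier splits ARP requests (12/0806 20/0001), ARP
>>>> > // replies (12/0806 20/0002), IPv4 (12/0800), and everything else (-).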
>>>> > c1 :: Classifier(12/0806 20/0001,
>>>> >                  12/0806 20/0002,
>>>> >                  12/0800,
>>>> >                  -);
>>>> > c2 :: Classifier(12/0806 20/0001,
>>>> >                  12/0806 20/0002,
>>>> >                  12/0800,
>>>> >                  -);
>>>> > c3 :: Classifier(12/0806 20/0001,
>>>> >                  12/0806 20/0002,
>>>> >                  12/0800,
>>>> >                  -);
>>>> > c4 :: Classifier(12/0806 20/0001,
>>>> >                  12/0806 20/0002,
>>>> >                  12/0800,
>>>> >                  -);
>>>> >
>>>> > q0 :: Discard; //ToHost;
>>>> > q1 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w1, user1-w1)
>>>> >    -> td2 :: ToDevice(eth2);
>>>> > q2 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w2, user2-w2)
>>>> >    -> td1 :: ToDevice(eth1);
>>>> > q3 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w3, user3-w3)
>>>> >    -> td0 :: ToDevice(eth0);
>>>> > q4 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w4, user4-w4)
>>>> >    -> td3 :: ToDevice(eth3);
>>>> >
>>>> > rt :: LookupIPRouteMP(10.0.1.0/32 0, 10.0.1.1/32 0, 10.0.1.255/32 0,
>>>> >                       10.0.2.0/32 0, 10.0.2.1/32 0, 10.0.2.255/32 0,
>>>> >                       10.0.3.0/32 0, 10.0.3.1/32 0, 10.0.3.255/32 0,
>>>> >                       10.0.4.0/32 0, 10.0.4.1/32 0, 10.0.4.255/32 0,
>>>> >                       10.0.1.0/24 1, 10.0.2.0/24 2, 10.0.3.0/24 3,
>>>> >                       10.0.4.0/24 4, 0.0.0.0/0 0);
>>>> > rt[0] -> Discard;
>>>> > rt[1] -> q1;
>>>> > rt[2] -> k4::Counter -> q2;
>>>> > rt[3] -> q3;
>>>> > rt[4] -> q4;
>>>> >
>>>> > pd2 :: PollDevice(eth2) -> c1;
>>>> > c1[0] -> q0;
>>>> > c1[1] -> q0;
>>>> > c1[2] -> Strip(14) -> CheckIPHeader() -> rt;
>>>> > c1[3] -> Discard;
>>>> > pd1 :: PollDevice(eth1) -> c2;
>>>> > c2[0] -> q0;
>>>> > c2[1] -> q0;
>>>> > c2[2] -> Strip(14) -> CheckIPHeader() -> rt;
>>>> > c2[3] -> Discard;
>>>> > pd0 :: PollDevice(eth0) -> c3;
>>>> > c3[0] -> q0;
>>>> > c3[1] -> q0;
>>>> > c3[2] -> Strip(14) -> CheckIPHeader() -> rt;
>>>> > c3[3] -> Discard;
>>>> > pd3 :: PollDevice(eth3) -> c4;
>>>> > c4[0] -> q0;
>>>> > c4[1] -> q0;
>>>> > c4[2] -> Strip(14) -> CheckIPHeader() -> rt;
>>>> > c4[3] -> Discard;
>>>> >
>>>> > StaticThreadSched(pd2 0, td1 0, pd1 1, td2 1, pd0 2, td3 2, pd3 3, td0 3);
>>>> >
>>>> >
>>>> > Bingyang
>>>> >
>>>> > On Mon, Dec 5, 2011 at 3:01 AM, Beyers Cronje <bcronje at gmail.com> wrote:
>>>> >
>>>> >> For interest's sake, can you post the config you are using?
>>>> >>
>>>> >> On Mon, Dec 5, 2011 at 4:37 AM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>>> >>
>>>> >> > Hi all,
>>>> >> >
>>>> >> > I need some help with PollDevice. I found that PollDevice caused
>>>> >> > some packet loss (less than 1%) even at a low input rate (50 kpps).
>>>> >> >
>>>> >> > To be precise, the switch's statistics showed that the number of
>>>> >> > packets sent out of the switch port to the machine's interface was
>>>> >> > 20,000,000, while the "count" handler of the PollDevice element
>>>> >> > reported 19,960,660.
>>>> >> >
>>>> >> > I enlarged the driver's RX ring with "ethtool -G eth0 rx 2096"
>>>> >> > (the default is 256), but nothing improved.
>>>> >> >
>>>> >> > Could anyone help me with this?
>>>> >> >
>>>> >> > Thanks very much.
>>>> >> > Bingyang
>>>> >> >
>>>> >> > On Sun, Dec 4, 2011 at 3:38 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>>> >> >
>>>> >> > > Hi~
>>>> >> > >
>>>> >> > > I used CPUQueue and found that it didn't drop packets any more.
>>>> >> > > So the only remaining problem is that PollDevice drops packets;
>>>> >> > > actually, I think it can't poll all the ready packets from the
>>>> >> > > devices.
>>>> >> > >
>>>> >> > > Does anyone have a similar problem with PollDevice, and is there
>>>> >> > > any solution or best practice?
>>>> >> > >
>>>> >> > > best
>>>> >> > > Bingyang
>>>> >> > >
>>>> >> > >
>>>> >> > > On Sun, Dec 4, 2011 at 2:10 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>>> >> > >
>>>> >> > >> Hi Cliff,
>>>> >> > >>
>>>> >> > >> I couldn't use multi-threading with FromDevice. When I used
>>>> >> > >> multi-threading and FromDevice together, the system crashed. So
>>>> >> > >> I had to use a single thread with FromDevice, and four threads
>>>> >> > >> with PollDevice.
>>>> >> > >>
>>>> >> > >> The router I tested has four gigabit interfaces, each connected
>>>> >> > >> to a host. All hosts sent packets to each other at the given rate.
>>>> >> > >> When the sending rate was 50 kpps (200 kpps in total), FromDevice
>>>> >> > >> gave an output ratio of 99.74%, while PollDevice gave 99.94%.
>>>> >> > >> When the sending rate was 200 kpps (800 kpps in total), FromDevice
>>>> >> > >> gave an output ratio of only 62.63%, while PollDevice gave 99.29%.
>>>> >> > >>
>>>> >> > >> That's why I think PollDevice works much better than FromDevice.
>>>> >> > >> Actually, both of them cause some packet loss at low input rates.
>>>> >> > >>
>>>> >> > >> And I think Click 1.8 is also mainline source code. But you are
>>>> >> > >> right, I should try 2.0. However, I'm not sure whether the same
>>>> >> > >> thing will happen with 2.0.
>>>> >> > >>
>>>> >> > >> best
>>>> >> > >> Bingyang
>>>> >> > >>
>>>> >> > >>
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> On Sun, Dec 4, 2011 at 2:29 AM, Cliff Frey <cliff at meraki.com> wrote:
>>>> >> > >>
>>>> >> > >>> What performance numbers did you see when using FromDevice
>>>> instead
>>>> >> of
>>>> >> > >>> PollDevice?
>>>> >> > >>>
>>>> >> > >>> Have you tried mainline click?
>>>> >> > >>>
>>>> >> > >>>
>>>> >> > >>> On Sat, Dec 3, 2011 at 10:57 PM, Bingyang Liu <bjornliu at gmail.com> wrote:
>>>> >> > >>>
>>>> >> > >>>> Thanks Cliff. Yes, I have tried FromDevice, and it gave worse
>>>> >> > >>>> performance.
>>>> >> > >>>>
>>>> >> > >>>> I think Queue should be a very mature element, and there should
>>>> >> > >>>> not be a bug there. But the experiment results told me that
>>>> >> > >>>> something went wrong. Should I use a thread-safe queue instead
>>>> >> > >>>> of Queue when I use multiple threads?
>>>> >> > >>>>
>>>> >> > >>>> Thanks
>>>> >> > >>>> Bingyang
>>>> >> > >>>>
>>>> >> > >>>> Sent from my iPhone
>>>> >> > >>>>
>>>> >> > >>>> On Dec 4, 2011, at 12:31 AM, Cliff Frey <cliff at meraki.com> wrote:
>>>> >> > >>>>
>>>> >> > >>>> You could try FromDevice instead of PollDevice. I'd expect that
>>>> >> > >>>> it would work fine. If it is not high-performance enough, it
>>>> >> > >>>> would be great if you could share your performance numbers, just
>>>> >> > >>>> to have another data point.
>>>> >> > >>>>
>>>> >> > >>>> I doubt that Queue has a bug; you could try the latest Click
>>>> >> > >>>> sources, though, just in case. As for finding/fixing any
>>>> >> > >>>> PollDevice issues, I don't have anything to help you there...
>>>> >> > >>>>
>>>> >> > >>>> Cliff
>>>> >> > >>>>
>>>> >> > >>>> On Sat, Dec 3, 2011 at 8:49 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>>> >> > >>>>
>>>> >> > >>>>> Hi Cliff,
>>>> >> > >>>>>
>>>> >> > >>>>> Thank you very much for your help. I followed your suggestions
>>>> >> > >>>>> and got some results.
>>>> >> > >>>>>
>>>> >> > >>>>> 1. It turned out that PollDevice failed to get all the packets
>>>> >> > >>>>> from the NIC, even though the packet sending rate was only
>>>> >> > >>>>> 200 kpps with a packet size of 64 B.
>>>> >> > >>>>> 2. I ran "grep . /click/.e/*/drops"; all of the elements
>>>> >> > >>>>> reported 0 drops.
>>>> >> > >>>>> 3. I put a counter between every two connected elements to
>>>> >> > >>>>> determine which element dropped packets. Finally I found a
>>>> >> > >>>>> queue that dropped packets, because the downstream counter
>>>> >> > >>>>> reported a smaller "count" than the upstream one. However, it
>>>> >> > >>>>> was strange that this queue still reported 0 drops. I think
>>>> >> > >>>>> there might be a bug in the element, or I misused the elements.
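>>>> >> > >>>>>
>>>> >> > >>>>> The counter sandwich looked roughly like this (a sketch; cin
>>>> >> > >>>>> and cout are names I use here only for illustration):
>>>> >> > >>>>>
>>>> >> > >>>>> rt[1] -> cin :: Counter -> q1 :: Queue(10000)
>>>> >> > >>>>>       -> cout :: Counter
>>>> >> > >>>>>       -> EtherEncap(0x0800, router1-w1, user1-w1)
>>>> >> > >>>>>       -> td2 :: ToDevice(eth2);
>>>> >> > >>>>>
>>>> >> > >>>>> // if cin's count exceeds cout's count by more than the queue
>>>> >> > >>>>> // length, the queue dropped packets despite "drops" reading 0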
>>>> >> > >>>>>
>>>> >> > >>>>> So I have two questions. First, how can I make PollDevice work
>>>> >> > >>>>> better, so that it doesn't drop packets at low rates? (Should I
>>>> >> > >>>>> use the stride scheduler?) Second, is there a bug in Queue in
>>>> >> > >>>>> Click 1.8.0 that makes it drop packets without reporting the
>>>> >> > >>>>> drops?
>>>> >> > >>>>>
>>>> >> > >>>>> My experiment environment and configuration:
>>>> >> > >>>>> * Hardware: Intel Xeon X3210 CPU (quad core at 2.13 GHz), 4 GB
>>>> >> > >>>>>   RAM (a server on DETERlab)
>>>> >> > >>>>> * Software: Ubuntu 8.04 + Click 1.8, with PollDevice and
>>>> >> > >>>>>   multi-threading enabled
>>>> >> > >>>>> * Configuration: ./configure --with-linux=/usr/src/linux-2.6.24.7
>>>> >> > >>>>>   --enable-ipsec --enable-warp9 --enable-multithread=4
>>>> >> > >>>>> * Installation: sudo click-install --thread=4 site7_router1.click
>>>> >> > >>>>>
>>>> >> > >>>>> Thanks!
>>>> >> > >>>>> best
>>>> >> > >>>>> Bingyang
>>>> >> > >>>>>
>>>> >> > >>>>> On Sat, Dec 3, 2011 at 12:42 PM, Cliff Frey <cliff at meraki.com> wrote:
>>>> >> > >>>>>
>>>> >> > >>>>
>>>> >> > >>>
>>>> >> > >>
>>>> >> > >>
>>>> >> > >>
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>



-- 
Bingyang Liu
Network Architecture Lab, Network Center, Tsinghua Univ.
Beijing, China
Home Page: http://netarchlab.tsinghua.edu.cn/~liuby

