[Click] Packet loss even at low sending rate

Beyers Cronje bcronje at gmail.com
Mon Dec 5 15:37:21 EST 2011


Hi Bingyang,

I understood that. What I was referring to was the empty-notifier
implementation in ThreadSafeQueue, which basically allows ToDevice to sleep
when there are no packets in the queue. HOWEVER, I just had a look at the
ToDevice code and realized that if you are using PollDevice, ToDevice always
stays scheduled and ignores the empty notifier, because ToDevice is also used
to clean the device transmit ring.
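To make that concrete, the swap itself is a one-line change per path in the
config you posted, something like this (untested sketch, same capacity
argument as your CPUQueue):

  q1 :: ThreadSafeQueue(10000) -> EtherEncap(0x0800, router1-w1, user1-w1) -> td2 :: ToDevice(eth2);

and likewise for q2-q4. Just keep the caveat above in mind: with PollDevice
in the picture, ToDevice stays scheduled regardless of the queue's empty
notifier, so don't expect this change alone to affect the drops you are
seeing at PollDevice.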

I haven't used PollDevice in a long time, but if memory serves, drops at
PollDevice mean it is not getting scheduled fast enough to pull packets from
the device, which leads to RX ring overflows. Try increasing the RX ring to
4096 to better absorb bursts, or simplify your configuration to narrow down
what is using too much CPU time. You could also use MarkIPHeader instead of
CheckIPHeader (MarkIPHeader does not do any checksum checks) to see if that
makes any difference.
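For example (untested, just to illustrate, and going from the element
documentation rather than anything I have benchmarked): the ring change per
interface would be along the lines of "ethtool -G eth2 rx 4096" (and the
same for eth0/eth1/eth3), and each classifier path in your config would
become

  c1[2] -> Strip(14) -> MarkIPHeader() -> rt;

and similarly for c2-c4. MarkIPHeader just sets the IP header annotation
without the validation (in particular the checksum check) that CheckIPHeader
does, so the per-packet cost is lower; the trade-off is that malformed IP
packets are no longer dropped at that point.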

Beyers

On Mon, Dec 5, 2011 at 10:16 PM, Bingyang LIU <bjornliu at gmail.com> wrote:

> Hi Beyers,
>
> Thanks very much. But what I found is that the PollDevice dropped packets,
> not the Queues.
>
> The "count" handler of PollDevice reported less packets than those sent to
> the NIC.
> Besides, the number of output packets was equal to the "Count" of the
> PollDevice, which means that no packet got lost between PollDevice and the
> corresponding output ToDevice (so I guess the Queues are not the problems).
>
> It seems nothing with multi-threading either, because the packet loss by
> PollDevice happened to the single-thread router as well.
>
> best
> Bingyang
> On Mon, Dec 5, 2011 at 3:04 PM, Beyers Cronje <bcronje at gmail.com> wrote:
>
>> Hi Bingyang,
>>
>> Personally I would use ThreadSafeQueue as it implements the full and empty
>> notifiers which CPUQueue does not (if I remember correctly). This should
>> somewhat help with lower packet rates.
>>
>> Beyers
>>
>> On Mon, Dec 5, 2011 at 8:57 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>
>> > Hi Beyers,
>> >
>> > Please check the script below, and thanks very much!
>> >
>> > // The packet flow in the router is as follows:
>> > //
>> > // pd2::PollDevice(eth2) -> ...processing... -> td1::ToDevice(eth1)
>> > // pd1::PollDevice(eth1) -> ...processing... -> td2::ToDevice(eth2)
>> > // pd0::PollDevice(eth0) -> ...processing... -> td0::ToDevice(eth0)
>> > // pd3::PollDevice(eth3) -> ...processing... -> td3::ToDevice(eth3)
>> > //
>> > // Note that it is not a standard IP router; elements such as DecIPTTL
>> > // are omitted.
>> > //
>> >
>> > AddressInfo(router1-w1       10.0.1.1        00:15:17:57:bd:c6,  //eth2
>> >             router1-w2       10.0.2.1        00:15:17:57:bd:c5,  //eth1
>> >             router1-w3       10.0.3.1        00:15:17:57:bd:c4,  //eth0
>> >             router1-w4       10.0.4.1        00:15:17:57:bd:c7,  //eth3
>> >             user1-w1         10.0.1.2        00:15:17:57:c7:4e,  //eth2
>> >             user2-w2         10.0.2.2        00:15:17:57:c4:86,  //eth2
>> >             user3-w3         10.0.3.2        00:15:17:57:c6:ca,  //eth2
>> >             user4-w4         10.0.4.2        00:15:17:57:c4:3a); //eth2
>> >
>> > c1 :: Classifier(12/0806 20/0001,
>> >                  12/0806 20/0002,
>> >                  12/0800,
>> >                  -);
>> > c2 :: Classifier(12/0806 20/0001,
>> >                  12/0806 20/0002,
>> >                  12/0800,
>> >                  -);
>> > c3 :: Classifier(12/0806 20/0001,
>> >                  12/0806 20/0002,
>> >                  12/0800,
>> >                  -);
>> > c4 :: Classifier(12/0806 20/0001,
>> >                  12/0806 20/0002,
>> >                  12/0800,
>> >                  -);
>> >
>> > q0 :: Discard; //ToHost;
>> > q1 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w1, user1-w1) -> td2 :: ToDevice(eth2);
>> > q2 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w2, user2-w2) -> td1 :: ToDevice(eth1);
>> > q3 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w3, user3-w3) -> td0 :: ToDevice(eth0);
>> > q4 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w4, user4-w4) -> td3 :: ToDevice(eth3);
>> >
>> > rt :: LookupIPRouteMP(10.0.1.0/32 0, 10.0.1.1/32 0, 10.0.1.255/32 0,
>> >                       10.0.2.0/32 0, 10.0.2.1/32 0, 10.0.2.255/32 0,
>> >                       10.0.3.0/32 0, 10.0.3.1/32 0, 10.0.3.255/32 0,
>> >                       10.0.4.0/32 0, 10.0.4.1/32 0, 10.0.4.255/32 0,
>> >                       10.0.1.0/24 1, 10.0.2.0/24 2, 10.0.3.0/24 3,
>> >                       10.0.4.0/24 4, 0.0.0.0/0 0);
>> > rt[0] -> Discard;
>> > rt[1] -> q1;
>> > rt[2] -> k4::Counter -> q2;
>> > rt[3] -> q3;
>> > rt[4] -> q4;
>> >
>> > pd2 :: PollDevice(eth2) -> c1;
>> > c1[0] -> q0;
>> > c1[1] -> q0;
>> > c1[2] -> Strip(14) -> CheckIPHeader() -> rt;
>> > c1[3] -> Discard;
>> > pd1 :: PollDevice(eth1) -> c2;
>> > c2[0] -> q0;
>> > c2[1] -> q0;
>> > c2[2] -> Strip(14) -> CheckIPHeader() -> rt;
>> > c2[3] -> Discard;
>> > pd0 :: PollDevice(eth0) -> c3;
>> > c3[0] -> q0;
>> > c3[1] -> q0;
>> > c3[2] -> Strip(14) -> CheckIPHeader() -> rt;
>> > c3[3] -> Discard;
>> > pd3 :: PollDevice(eth3) -> c4;
>> > c4[0] -> q0;
>> > c4[1] -> q0;
>> > c4[2] -> Strip(14) -> CheckIPHeader() -> rt;
>> > c4[3] -> Discard;
>> >
>> > StaticThreadSched(pd2 0, td1 0, pd1 1, td2 1, pd0 2, td3 2, pd3 3, td0 3);
>> >
>> >
>> > Bingyang
>> >
>> > On Mon, Dec 5, 2011 at 3:01 AM, Beyers Cronje <bcronje at gmail.com>
>> wrote:
>> >
>> >> For interest's sake, can you post the config you are using?
>> >>
>> >> On Mon, Dec 5, 2011 at 4:37 AM, Bingyang LIU <bjornliu at gmail.com>
>> wrote:
>> >>
>> >> > Hi all,
>> >> >
>> >> > I need some help with PollDevice. I found that PollDevice caused some
>> >> > packet loss (less than 1%) even at a low input rate (50kpps).
>> >> >
>> >> > To be precise, the switch statistics showed that the number of packets
>> >> > sent out of the switch port to the machine's interface was 20000000,
>> >> > while the "count" handler of the PollDevice element reported 19960660.
>> >> >
>> >> > I tuned the driver ring buffer with "ethtool -G eth0 rx 2096" (the
>> >> > default is 256), but nothing improved.
>> >> >
>> >> > Could anyone help me with this?
>> >> >
>> >> > Thanks very much.
>> >> > Bingyang
>> >> >
>> >> > On Sun, Dec 4, 2011 at 3:38 PM, Bingyang LIU <bjornliu at gmail.com>
>> >> wrote:
>> >> >
>> >> > > Hi~
>> >> > >
>> >> > > I used CPUQueue and found that it didn't drop packets any more. So the
>> >> > > only remaining problem is that PollDevice drops packets; actually, I
>> >> > > think it can't poll all the ready packets from the devices.
>> >> > >
>> >> > > Does anyone have a similar problem with PollDevice, and is there any
>> >> > > solution or best practice?
>> >> > >
>> >> > > best
>> >> > > Bingyang
>> >> > >
>> >> > >
>> >> > > On Sun, Dec 4, 2011 at 2:10 PM, Bingyang LIU <bjornliu at gmail.com>
>> >> wrote:
>> >> > >
>> >> > >> Hi Cliff,
>> >> > >>
>> >> > >> I couldn't use multi-threading with FromDevice. When I used
>> >> > >> multi-threading and FromDevice together, the system crashed. So I had
>> >> > >> to use a single thread with FromDevice, and 4 threads with PollDevice.
>> >> > >>
>> >> > >> The router I tested has four gigabit interfaces, each connected to a
>> >> > >> host. All hosts sent packets to each other at a given rate.
>> >> > >> When the sending rate was 50kpps (200kpps in total), FromDevice gave
>> >> > >> an output ratio of 99.74%, while PollDevice gave 99.94%.
>> >> > >> When the sending rate was 200kpps (800kpps in total), FromDevice only
>> >> > >> gave an output ratio of 62.63%, while PollDevice gave 99.29%.
>> >> > >>
>> >> > >> That's why I think PollDevice works much better than FromDevice.
>> >> > >> Actually, both of them cause some packet loss even at low input rates.
>> >> > >>
>> >> > >> And I think Click 1.8 is also mainline source code. But you are right,
>> >> > >> I should try 2.0. However, I'm not sure whether the same thing will
>> >> > >> happen with 2.0.
>> >> > >>
>> >> > >> best
>> >> > >> Bingyang
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> On Sun, Dec 4, 2011 at 2:29 AM, Cliff Frey <cliff at meraki.com>
>> wrote:
>> >> > >>
>> >> > >>> What performance numbers did you see when using FromDevice instead
>> >> > >>> of PollDevice?
>> >> > >>>
>> >> > >>> Have you tried mainline click?
>> >> > >>>
>> >> > >>>
>> >> > >>> On Sat, Dec 3, 2011 at 10:57 PM, Bingyang Liu <bjornliu at gmail.com> wrote:
>> >> > >>>
>> >> > >>>> Thanks Cliff. Yes, I have tried FromDevice, and it gave worse
>> >> > >>>> performance.
>> >> > >>>>
>> >> > >>>> I think Queue should be a very mature element, and there should not
>> >> > >>>> be a bug there. But the experiment results told me that something
>> >> > >>>> went wrong. Should I use a thread-safe queue instead of Queue when
>> >> > >>>> I use multiple threads?
>> >> > >>>>
>> >> > >>>> Thanks
>> >> > >>>> Bingyang
>> >> > >>>>
>> >> > >>>> Sent from my iPhone
>> >> > >>>>
>> >> > >>>> On Dec 4, 2011, at 12:31 AM, Cliff Frey <cliff at meraki.com>
>> wrote:
>> >> > >>>>
>> >> > >>>> You could try FromDevice instead of PollDevice.  I'd expect that it
>> >> > >>>> would work fine.  If it is not high-performance enough, it would be
>> >> > >>>> great if you could share your performance numbers just to have
>> >> > >>>> another data point.
>> >> > >>>>
>> >> > >>>> I doubt that Queue has a bug; you could try the latest Click sources
>> >> > >>>> though, just in case.  As for finding/fixing any PollDevice issues,
>> >> > >>>> I don't have anything to help you there...
>> >> > >>>>
>> >> > >>>> Cliff
>> >> > >>>>
>> >> > >>>> On Sat, Dec 3, 2011 at 8:49 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>> >> > >>>>
>> >> > >>>>> Hi Cliff,
>> >> > >>>>>
>> >> > >>>>> Thank you very much for your help. I followed your suggestion and
>> >> > >>>>> got some results.
>> >> > >>>>>
>> >> > >>>>> 1. It turned out that "PollDevice" failed to get all the packets
>> >> > >>>>> from the NIC, even though the packet sending rate was only 200kpps
>> >> > >>>>> with a packet size of 64B.
>> >> > >>>>> 2. I used "grep . /click/.e/*/drops"; all elements reported 0
>> >> > >>>>> drops.
>> >> > >>>>> 3. I put a counter between every two connected elements to
>> >> > >>>>> determine which element dropped packets. Finally I found a queue
>> >> > >>>>> that dropped packets, because the downstream counter reported a
>> >> > >>>>> lower "count" than the upstream one. However, it was strange that
>> >> > >>>>> this queue still reported 0 drops. I think there might be some bug
>> >> > >>>>> with the element, or I misused the elements.
>> >> > >>>>>
>> >> > >>>>> So I have two questions. First, how can I make PollDevice work
>> >> > >>>>> better, so that it won't drop packets at a low rate? (Should I use
>> >> > >>>>> the Stride scheduler?) Second, is there any bug with Queue in Click
>> >> > >>>>> 1.8.0, in terms of dropping packets without reporting the drops?
>> >> > >>>>>
>> >> > >>>>> My experiment environment and configuration:
>> >> > >>>>> * Hardware: Intel Xeon X3210 CPU (quad core at 2.13GHz), 4GB RAM
>> >> > >>>>>   (a server on DETERLab)
>> >> > >>>>> * Software: Ubuntu 8.04 + Click 1.8, with PollDevice and
>> >> > >>>>>   multi-threading enabled
>> >> > >>>>> * Configuration: ./configure --with-linux=/usr/src/linux-2.6.24.7
>> >> > >>>>>   --enable-ipsec --enable-warp9 --enable-multithread=4
>> >> > >>>>> * Installation: sudo click-install --thread=4 site7_router1.click
>> >> > >>>>>
>> >> > >>>>>  thanks!
>> >> > >>>>> best
>> >> > >>>>> Bingyang
>> >> > >>>>>
>> >> > >>>>> On Sat, Dec 3, 2011 at 12:42 PM, Cliff Frey <cliff at meraki.com>
>> >> > wrote:
>> >> > >>>>>
>> >> > >>>>
>> >> > >>>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >> >
>> >> >
>> >>
>> >
>> >
>> >
>> >
>>
>
>
>
> --
> Bingyang Liu
> Network Architecture Lab, Network Center,Tsinghua Univ.
> Beijing, China
> Home Page: http://netarchlab.tsinghua.edu.cn/~liuby
>

