[Click] Packet loss even at low sending rate

Bingyang LIU bjornliu at gmail.com
Mon Dec 5 22:45:39 EST 2011


Hi Beyers,

Is there a way to schedule PollDevice more often than the default?
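
For example, I am guessing at something like the following, using
ScheduleInfo to give each PollDevice task a larger share of its thread
(the weights here are made up, and I am not sure this is the right
mechanism):

  // hypothetical tuning; pd*/td* are the element names from my config
  ScheduleInfo(pd0 4, pd1 4, pd2 4, pd3 4,
               td0 1, td1 1, td2 1, td3 1);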

Thanks
Bingyang

On Mon, Dec 5, 2011 at 3:37 PM, Beyers Cronje <bcronje at gmail.com> wrote:

> Hi Bingyang,
>
> I understood that. What I was referring to was the empty-notifier
> implementation of ThreadSafeQueue, which basically allows ToDevice to sleep
> when there are no packets in the queue. HOWEVER, I just had a look at the
> ToDevice code and realized that if you are using PollDevice then ToDevice
> always gets scheduled and ignores the empty notifier, as ToDevice is used to
> clean the device transmit ring.
>
> I haven't used PollDevice in a long time, but if memory serves, drops at
> PollDevice mean it is not getting scheduled fast enough to pull packets
> from the device, which leads to RX ring overflows. Try increasing the RX
> ring to 4096 to better cater for bursts, or simplify your configuration to
> narrow down what is using too much CPU time. Also try MarkIPHeader instead
> of CheckIPHeader; it does no checksum checks, so see whether that makes any
> difference.
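>
> Concretely, something like this (a sketch; adjust the interface name per
> port, and note that MarkIPHeader only sets the IP header annotation
> without validating the header):
>
>   ethtool -G eth2 rx 4096        # grow the RX descriptor ring
>
> and in the config:
>
>   c1[2] -> Strip(14) -> MarkIPHeader() -> rt;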
>
> Beyers
>
>
> On Mon, Dec 5, 2011 at 10:16 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>
>> Hi Beyers,
>>
>> Thanks very much. But what I found is that PollDevice dropped the
>> packets, not the Queues.
>>
>> The "count" handler of PollDevice reported fewer packets than were sent
>> to the NIC.
>> Besides, the number of output packets was equal to the "count" of the
>> PollDevice, which means that no packets were lost between PollDevice and
>> the corresponding output ToDevice (so I guess the Queues are not the
>> problem).
>>
>> It doesn't seem to be a multi-threading issue either, because the packet
>> loss at PollDevice happened with the single-threaded router as well.
>>
>> best
>> Bingyang
>> On Mon, Dec 5, 2011 at 3:04 PM, Beyers Cronje <bcronje at gmail.com> wrote:
>>
>>> Hi Bingyang,
>>>
>>> Personally I would use ThreadSafeQueue, as it implements the full and
>>> empty notifiers, which CPUQueue does not (if I remember correctly). This
>>> should help somewhat at lower packet rates.
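>>>
>>> For example, a sketch based on the q1 line in your config:
>>>
>>>   q1 :: ThreadSafeQueue(10000)
>>>      -> EtherEncap(0x0800, router1-w1, user1-w1)
>>>      -> td2 :: ToDevice(eth2);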
>>>
>>> Beyers
>>>
>>> On Mon, Dec 5, 2011 at 8:57 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>>
>>> > Hi Beyers,
>>> >
>>> > Please check the script below, and thanks very much!
>>> >
>>> > // The packet flow in the router is as follows:
>>> > //
>>> > // pd2::PollDevice(eth2) -> ...processing... -> td1::ToDevice(eth1)
>>> > // pd1::PollDevice(eth1) -> ...processing... -> td2::ToDevice(eth2)
>>> > // pd0::PollDevice(eth0) -> ...processing... -> td0::ToDevice(eth0)
>>> > // pd3::PollDevice(eth3) -> ...processing... -> td3::ToDevice(eth3)
>>> > //
>>> > // Note that it is not a standard IP router; components such as
>>> > // DecIPTTL are omitted.
>>> > //
>>> >
>>> > AddressInfo(router1-w1  10.0.1.1  00:15:17:57:bd:c6,  //eth2
>>> >             router1-w2  10.0.2.1  00:15:17:57:bd:c5,  //eth1
>>> >             router1-w3  10.0.3.1  00:15:17:57:bd:c4,  //eth0
>>> >             router1-w4  10.0.4.1  00:15:17:57:bd:c7,  //eth3
>>> >             user1-w1    10.0.1.2  00:15:17:57:c7:4e,  //eth2
>>> >             user2-w2    10.0.2.2  00:15:17:57:c4:86,  //eth2
>>> >             user3-w3    10.0.3.2  00:15:17:57:c6:ca,  //eth2
>>> >             user4-w4    10.0.4.2  00:15:17:57:c4:3a); //eth2
>>> >
>>> > c1 :: Classifier(12/0806 20/0001,
>>> >                  12/0806 20/0002,
>>> >                  12/0800,
>>> >                  -);
>>> > c2 :: Classifier(12/0806 20/0001,
>>> >                  12/0806 20/0002,
>>> >                  12/0800,
>>> >                  -);
>>> > c3 :: Classifier(12/0806 20/0001,
>>> >                  12/0806 20/0002,
>>> >                  12/0800,
>>> >                  -);
>>> > c4 :: Classifier(12/0806 20/0001,
>>> >                  12/0806 20/0002,
>>> >                  12/0800,
>>> >                  -);
>>> >
>>> > q0 :: Discard; //ToHost;
>>> > q1 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w1, user1-w1)
>>> >       -> td2 :: ToDevice(eth2);
>>> > q2 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w2, user2-w2)
>>> >       -> td1 :: ToDevice(eth1);
>>> > q3 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w3, user3-w3)
>>> >       -> td0 :: ToDevice(eth0);
>>> > q4 :: CPUQueue(10000) -> EtherEncap(0x0800, router1-w4, user4-w4)
>>> >       -> td3 :: ToDevice(eth3);
>>> >
>>> > rt :: LookupIPRouteMP(10.0.1.0/32 0, 10.0.1.1/32 0, 10.0.1.255/32 0,
>>> >                       10.0.2.0/32 0, 10.0.2.1/32 0, 10.0.2.255/32 0,
>>> >                       10.0.3.0/32 0, 10.0.3.1/32 0, 10.0.3.255/32 0,
>>> >                       10.0.4.0/32 0, 10.0.4.1/32 0, 10.0.4.255/32 0,
>>> >                       10.0.1.0/24 1, 10.0.2.0/24 2, 10.0.3.0/24 3,
>>> >                       10.0.4.0/24 4, 0.0.0.0/0 0);
>>> > rt[0] -> Discard;
>>> > rt[1] -> q1;
>>> > rt[2] -> k4::Counter -> q2;
>>> > rt[3] -> q3;
>>> > rt[4] -> q4;
>>> >
>>> > pd2 :: PollDevice(eth2) -> c1;
>>> > c1[0] -> q0;
>>> > c1[1] -> q0;
>>> > c1[2] -> Strip(14) -> CheckIPHeader() -> rt;
>>> > c1[3] -> Discard;
>>> > pd1 :: PollDevice(eth1) -> c2;
>>> > c2[0] -> q0;
>>> > c2[1] -> q0;
>>> > c2[2] -> Strip(14) -> CheckIPHeader() -> rt;
>>> > c2[3] -> Discard;
>>> > pd0 :: PollDevice(eth0) -> c3;
>>> > c3[0] -> q0;
>>> > c3[1] -> q0;
>>> > c3[2] -> Strip(14) -> CheckIPHeader() -> rt;
>>> > c3[3] -> Discard;
>>> > pd3 :: PollDevice(eth3) -> c4;
>>> > c4[0] -> q0;
>>> > c4[1] -> q0;
>>> > c4[2] -> Strip(14) -> CheckIPHeader() -> rt;
>>> > c4[3] -> Discard;
>>> >
>>> > StaticThreadSched(pd2 0, td1 0, pd1 1, td2 1, pd0 2, td3 2, pd3 3, td0 3);
>>> >
>>> >
>>> > Bingyang
>>> >
>>> > On Mon, Dec 5, 2011 at 3:01 AM, Beyers Cronje <bcronje at gmail.com> wrote:
>>> >
>>> >> For interest's sake, can you post the config you are using?
>>> >>
>>> >> On Mon, Dec 5, 2011 at 4:37 AM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>> >>
>>> >> > Hi all,
>>> >> >
>>> >> > I need some help with PollDevice. I found that PollDevice caused some
>>> >> > packet loss (less than 1%) even at a low input rate (50 kpps).
>>> >> >
>>> >> > To be precise, the switch statistics showed that the number of packets
>>> >> > sent out of the switch port to the machine's interface was 20,000,000,
>>> >> > while the "count" handler of the PollDevice element reported 19,960,660.
>>> >> >
>>> >> > I tuned the driver RX ring with "ethtool -G eth0 rx 2096" (the default
>>> >> > is 256), but nothing improved.
>>> >> >
>>> >> > Could anyone help me with this?
>>> >> >
>>> >> > Thanks very much.
>>> >> > Bingyang
>>> >> >
>>> >> > On Sun, Dec 4, 2011 at 3:38 PM, Bingyang LIU <bjornliu at gmail.com>
>>> >> wrote:
>>> >> >
>>> >> > > Hi~
>>> >> > >
>>> >> > > I used CPUQueue and found that it no longer dropped packets. So the
>>> >> > > only remaining problem is that PollDevice drops packets; I think it
>>> >> > > cannot poll all the ready packets from the devices.
>>> >> > >
>>> >> > > Does anyone have a similar problem with PollDevice, and are there any
>>> >> > > solutions or best practices?
>>> >> > >
>>> >> > > best
>>> >> > > Bingyang
>>> >> > >
>>> >> > >
>>> >> > > On Sun, Dec 4, 2011 at 2:10 PM, Bingyang LIU <bjornliu at gmail.com>
>>> >> wrote:
>>> >> > >
>>> >> > >> Hi Cliff,
>>> >> > >>
>>> >> > >> I couldn't use multi-threading with FromDevice. When I used
>>> >> > >> multi-threading and FromDevice together, the system crashed. So I had
>>> >> > >> to use a single thread with FromDevice, and four threads with
>>> >> > >> PollDevice.
>>> >> > >>
>>> >> > >> The router I tested has four gigabit interfaces, each of which is
>>> >> > >> connected to a host. All hosts sent packets to each other at the
>>> >> > >> given rate.
>>> >> > >> At a sending rate of 50 kpps per host (200 kpps in total), FromDevice
>>> >> > >> gave an output ratio of 99.74%, while PollDevice gave 99.94%.
>>> >> > >> At a sending rate of 200 kpps per host (800 kpps in total), FromDevice
>>> >> > >> gave an output ratio of only 62.63%, while PollDevice gave 99.29%.
>>> >> > >>
>>> >> > >> That's why I think PollDevice works much better than FromDevice.
>>> >> > >> Actually, both of them cause some packet loss even at low input rates.
>>> >> > >>
>>> >> > >> And I think Click 1.8 is also mainline source code. But you are right,
>>> >> > >> I should try 2.0. However, I'm not sure whether the same thing will
>>> >> > >> happen with 2.0.
>>> >> > >>
>>> >> > >> best
>>> >> > >> Bingyang
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >> On Sun, Dec 4, 2011 at 2:29 AM, Cliff Frey <cliff at meraki.com> wrote:
>>> >> > >>
>>> >> > >>> What performance numbers did you see when using FromDevice
>>> instead
>>> >> of
>>> >> > >>> PollDevice?
>>> >> > >>>
>>> >> > >>> Have you tried mainline click?
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> On Sat, Dec 3, 2011 at 10:57 PM, Bingyang Liu <bjornliu at gmail.com> wrote:
>>> >> > >>>
>>> >> > >>>> Thanks Cliff. Yes, I have tried FromDevice, and it gave worse
>>> >> > >>>> performance.
>>> >> > >>>>
>>> >> > >>>> I think Queue should be a very mature element, and there should not
>>> >> > >>>> be a bug there. But the experiment results tell me something is
>>> >> > >>>> wrong. Should I use a thread-safe queue instead of Queue when I use
>>> >> > >>>> multiple threads?
>>> >> > >>>>
>>> >> > >>>> Thanks
>>> >> > >>>> Bingyang
>>> >> > >>>>
>>> >> > >>>> Sent from my iPhone
>>> >> > >>>>
>>> >> > >>>> On Dec 4, 2011, at 12:31 AM, Cliff Frey <cliff at meraki.com> wrote:
>>> >> > >>>>
>>> >> > >>>> You could try FromDevice instead of PollDevice.  I'd expect that it
>>> >> > >>>> would work fine.  If it is not high-performance enough, it would be
>>> >> > >>>> great if you could share your performance numbers, just to have
>>> >> > >>>> another data point.
>>> >> > >>>>
>>> >> > >>>> I doubt that Queue has a bug; you could try the latest Click
>>> >> > >>>> sources, though, just in case.  As for finding/fixing any PollDevice
>>> >> > >>>> issues, I don't have anything to help you with there...
>>> >> > >>>>
>>> >> > >>>> Cliff
>>> >> > >>>>
>>> >> > >>>> On Sat, Dec 3, 2011 at 8:49 PM, Bingyang LIU <bjornliu at gmail.com> wrote:
>>> >> > >>>>
>>> >> > >>>>> Hi Cliff,
>>> >> > >>>>>
>>> >> > >>>>> Thank you very much for your help. I followed your suggestions and
>>> >> > >>>>> got some results.
>>> >> > >>>>>
>>> >> > >>>>> 1. It turned out that PollDevice failed to get all the packets from
>>> >> > >>>>> the NIC, even though the packet sending rate was only 200 kpps with
>>> >> > >>>>> a packet size of 64 B.
>>> >> > >>>>> 2. I used "grep . /click/.e/*/drops"; all elements reported 0 drops.
>>> >> > >>>>> 3. I put a counter between every two connected elements to determine
>>> >> > >>>>> which element dropped packets. Finally I found a queue that dropped
>>> >> > >>>>> packets, because the downstream counter reported a lower "count"
>>> >> > >>>>> than the upstream one. However, it was strange that this queue still
>>> >> > >>>>> reported 0 drops. I think there might be a bug in the element, or I
>>> >> > >>>>> misused the elements.
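>>> >> > >>>>>
>>> >> > >>>>> The counter pair looked roughly like this (a sketch; the element
>>> >> > >>>>> names are made up):
>>> >> > >>>>>
>>> >> > >>>>>   ... -> c_up :: Counter -> q :: Queue(10000)
>>> >> > >>>>>       -> c_down :: Counter -> ...
>>> >> > >>>>>
>>> >> > >>>>> and I compared the "count" handlers of c_up and c_down with the
>>> >> > >>>>> "drops" handler of q.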
>>> >> > >>>>>
>>> >> > >>>>> So I have two questions. First, how can I make PollDevice work
>>> >> > >>>>> better, so that it won't drop packets at low rates? (Should I use
>>> >> > >>>>> the stride scheduler?) Second, is there a bug in Queue in Click
>>> >> > >>>>> 1.8.0 that makes it drop packets without reporting the drops?
>>> >> > >>>>>
>>> >> > >>>>> My experiment environment and configuration:
>>> >> > >>>>> * Hardware: Intel Xeon X3210 CPU (quad core at 2.13 GHz), 4 GB RAM
>>> >> > >>>>> (a server on DETERlab)
>>> >> > >>>>> * Software: Ubuntu 8.04 + Click 1.8, with PollDevice and
>>> >> > >>>>> multi-threading enabled
>>> >> > >>>>> * Configuration: ./configure --with-linux=/usr/src/linux-2.6.24.7
>>> >> > >>>>> --enable-ipsec --enable-warp9 --enable-multithread=4
>>> >> > >>>>> * Installation: sudo click-install --thread=4 site7_router1.click
>>> >> > >>>>>
>>> >> > >>>>>  thanks!
>>> >> > >>>>> best
>>> >> > >>>>> Bingyang
>>> >> > >>>>>
>>> >> > >>>>> On Sat, Dec 3, 2011 at 12:42 PM, Cliff Frey <cliff at meraki.com> wrote:
>>> >> > >>>>>
>>> >> > >>>>
>>> >> > >>>
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>
>


-- 
Bingyang Liu
Network Architecture Lab, Network Center,Tsinghua Univ.
Beijing, China
Home Page: http://netarchlab.tsinghua.edu.cn/~liuby

