[Click] Large latency with RouteBricks setup

Mihai Dobrescu mihai.dobrescu at gmail.com
Tue Aug 30 09:35:24 EDT 2011


Inline comments.

On Mon, Aug 29, 2011 at 7:08 PM, George Porter <gmporter at cs.ucsd.edu> wrote:
> Hi Mihai,
>
> I've attached some more latency results from RouteBricks.  I was able
> to vary the 5-tuple on the generated packets, and have latency
> measurements for RouteBricks under a few different circumstances
> (figure is attached):
>  - ixgbe configured with a batch size of 16
>  - ixgbe configured with a batch size of 4
>  - Click/minfwdtest.click with a BURST size of 16
>  - Click/minfwdtest.click with a BURST size of 4 (I also tried a BURST
> size of 1, but the latency was really high, so I didn't include it here)
>
> The latency ranges from about 14us up to 100-120us at the 99th
> percentile, for an ixgbe batch factor of 4.  Setting the batch size
> lower than 4 for some reason caused packets to no longer be delivered
> to Click.  I'm still looking into that.

I would say that the numbers don't look bad.
>
> I also noticed that when I didn't randomize the 5-tuple (indicated by
> the 'uniform' lines) the latency was much lower.  Does anyone on this
> list know why the latency would be higher with a randomized 5-tuple?  I
> would assume that it has to do with packets being spread across four
> NIC queues instead of one queue, leading to those packets being
> delivered by ixgbe up to Click 1/4th as often, leading to 4x the
> latency.

You might experience some software/hardware overhead when using
multi-queuing. You can have a look at this PRESTO paper:
http://conferences.sigcomm.org/co-next/2010/Workshops/PRESTO/PRESTO_papers/05-Manesh.pdf
>
> This represents the lowest latency I've been able to achieve so far.
> If you have any other suggestions let me know.  Thanks again for all
> your help--it has been great getting RouteBricks up and running.
>

Sending at the maximum loss-free rate might increase queuing, and thus
the latency. Have you tried forwarding at a lower rate? You could also
reduce the queue sizes (e.g., from CPUQueue(1000) to CPUQueue(500)) if
you can generate non-bursty traffic.
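
For example, a minimal sketch of what I mean (just a sketch -- the
device names and the exact element chain in your minfwdtest.click will
differ):

    // sketch only: eth0/eth1 are placeholders, adjust to your config
    PollDevice(eth0, BURST 4)
        -> CPUQueue(500)    // was 1000; a smaller queue caps queueing delay
        -> ToDevice(eth1, BURST 4);

Halving the queue roughly halves the worst-case standing-queue delay,
at the cost of drops if the input traffic is bursty.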

Mihai

>
> On Fri, Aug 26, 2011 at 8:21 AM, George Porter <gmporter at cs.ucsd.edu> wrote:
>> Thanks Mihai,
>>
>>>> Looking at the latency, it seems that for small bursts of packets
>>>> (e.g., 8, 16, 32 and thereabouts) the latency through RouteBricks is
>>>> very low--about 14us or so.
>>>
>>> That's good news :)
>>
>> Yep.  I believe that these single-packet and small-burst latency
>> results show the benefits of a polling 10G driver (since none of the
>> other RouteBricks-specific functionality is being engaged, such as
>> the multi-queue support and/or the distributed overlay forwarding
>> network).  Adding NAPI support to the ixgbe driver would be really
>> great to prevent packets from getting "stuck" in the ixgbe driver
>> (which prevents things like ssh from working, since the SYN packet
>> isn't forwarded to the destination).
>>
>>>> When I try large packet bursts (e.g., 16,000 packets back to back)
>>>> there is an interesting queueing-type behavior in which the latency
>>>> for the earlier packets is 14us, and then it linearly climbs up to
>>>> ~560us or so, and then jumps back down to 14 and repeats in a sawtooth
>>>> pattern.
>>>
>>> There are 2 forms of batching: NIC batching and Click batching. The
>>> described behavior might be related to Click batching. You could check
>>> the "BURST" parameters in PollDevice and ToDevice to adjust the Click
>>> batching.
>>
>> I will try that today.  However, the sawtooth is over 10,000+ packets,
>> so I don't think it is related to a small, constant amount of
>> batching.  Otherwise I would expect the latency to go up for 8-16
>> packets and then go back down.  What I see now is that the first
>> 10-20 packets have latencies in the 12-14us range, then it linearly
>> goes up to ~500-1000us over the next 10,000 packets, then there is a
>> discontinuous jump back to 14us and the process repeats itself.  I
>> think maybe the issue is that only one kernel thread is handling the
>> packets at this time.
>>
>>> The RSS delivery to a particular queue is done based on the
>>> (SrcIP,DstIP,SrcPort,DstPort) tuple. If you send packets using the
>>> same headers, they will all end up in the same queue.
>>
>> I'm going to vary the 5-tuple and try again, which should spread the
>> packets uniformly across the NIC queues.
>>
>> Thanks,
>> George
>>
>


