[Click] User level Click Queues vs Kernel Queues

Wed Jul 6 03:57:42 EDT 2005

Hi Michael,

On Jun 11, 2005, at 2:19 PM, Michael Sirivianos wrote:
> Hi,
>
> We are trying to set up a simple experiment with userlevel Click to
> measure the performance degradation of a router performing some crypto
> computations.
>
> We would prefer to use it in kernel space, but 2.4 kernels do not really
> work on our PCs (unless we spend all our time trying to choose the
> correct compile options), and due to gcc/glib incompatibilities we
> cannot even compile them.

I agree with Beyers that getting the kernel version working might be
worth your time; also, Click now works on 2.6 kernels (mostly).

> Our configuration is something like:
>
> fromDevice -> classifier -> Queue(200) -> Unqueue -> StripIpheader
>   -> ...OurElement -> ARPQuerier.
>
> However, we observe that no matter how expensive the computation in
> OurElement is and no matter how much the sender rate increases, the
> Queue never builds up over 1 packet. Instead we have packet losses
> that we have not been able to pinpoint. From the network monitoring
> tools we infer that it's not in the Ethernet interface, and most
> likely not in the IP input queue in the kernel.

This does not surprise me.  Userlevel Click is single-threaded.  (So  
is kernel Click without --enable-multithread.)  What is happening is  
probably something like this:

    1. FromDevice reads a single packet and pushes it to Queue.
    2. Unqueue runs, pulls that packet, and pushes it to OurElement.
       OurElement runs for a really, really long time.
       In the meantime, the kernel-to-user (k->u) queue fills up and
       eventually overflows.
    3. Repeat 1-2 indefinitely.
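
To make that sequence concrete, here is a toy model of it in Python (not Click code). Everything in it is an assumption made for illustration: the function name `simulate`, the tick-based clock, and the parameters `arrivals_per_tick`, `work_ticks`, and `kernel_cap` are all hypothetical and do not correspond to anything in Click or the kernel.

```python
from collections import deque

def simulate(ticks, arrivals_per_tick, work_ticks, kernel_cap):
    """Toy model of a single-threaded FromDevice -> Queue -> Unqueue loop.

    All parameters are hypothetical. Packets arrive in the bounded kernel
    buffer on every tick, but the single thread only gets back to
    FromDevice once OurElement has finished work_ticks of computation.
    """
    kernel = deque()       # kernel-to-user buffer (bounded by kernel_cap)
    click_queue = deque()  # stands in for the Click Queue element
    next_id = 0
    drops = 0
    processed = 0
    max_depth = 0          # largest Click Queue occupancy ever observed
    busy_until = 0         # first tick at which the thread is free again

    for now in range(ticks):
        # Arrivals happen whether or not the thread is busy.
        for _ in range(arrivals_per_tick):
            if len(kernel) < kernel_cap:
                kernel.append(next_id)
            else:
                drops += 1  # kernel buffer overflow
            next_id += 1

        if now < busy_until:
            continue        # thread is still inside OurElement

        # Step 1: FromDevice reads a single packet and enqueues it.
        if kernel:
            click_queue.append(kernel.popleft())
            max_depth = max(max_depth, len(click_queue))

        # Step 2: Unqueue dequeues it and OurElement hogs the thread.
        if click_queue:
            click_queue.popleft()
            processed += 1
            busy_until = now + work_ticks

    return max_depth, drops, processed

if __name__ == "__main__":
    depth, drops, done = simulate(ticks=1000, arrivals_per_tick=2,
                                  work_ticks=10, kernel_cap=50)
    print("max Click Queue depth:", depth)
    print("kernel drops:", drops, "processed:", done)
```

In this model the Click Queue depth never exceeds 1, because the same thread that fills the Queue immediately drains it, while the drop counter on the kernel buffer keeps growing.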

> We theorize that we may have drops at a kernel/userspace queue but we
> would like an opinion on that.

I buy it.

> Am I correct to assume that a frame is forwarded directly from the
> device to a kernel queue, and then userlevel Click, which is a single
> thread, reads the frame from this in-kernel queue?

Yes.

> Then Click proceeds
> with processing the packets at the rest of the modules including our
> Queue element? Thus, the same thread puts the packet in the Click queue
> and then processes it. This is the only explanation I could find for not
> being able to build up our Click queue over 1 packet.

Yes!

> Is it impossible to conduct this experiment at the userlevel?
> At the kernel level, would the NIC be the producer of packets that
> directly places frames in our Click Queue, and the elements Unqueue->...
> would be the consumers?

Well, not quite.  The NIC would produce packets, but a Click thread  
would take them from the NIC and push them to the Queue.

If you really want effective parallelism, there are two ways to go.

1. SMP kernel + --enable-multithread + a thread schedule that forces  
FromDevice and Unqueue to run on different CPUs.

2. (Even better:)  Rewrite your long-running element so that it
doesn't block.  For example, your element could contain a Task and
Unqueue functionality.  When it had nothing to do, it would read a
packet from upstream (like Unqueue does) and start processing that
packet.  But during a long-running computation, it would periodically
yield control, and resume where it left off when it is next
scheduled.  (I.e., event-driven style!)

I would do 2.  It is probably a bit easier than it appears.
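
A minimal sketch of the idea behind option 2, in Python rather than as a real Click element. A generator stands in for the element's saved state, and each `yield` stands in for the Task returning to the scheduler and being run again later. The names `long_computation`, `run`, and `chunks`, and the arithmetic used as "work", are hypothetical.

```python
from collections import deque

def long_computation(packet, chunks):
    """Hypothetical expensive per-packet work, split into `chunks` slices.

    Each yield hands control back to the scheduler loop; the generator
    resumes where it left off the next time it is scheduled.
    """
    state = 0
    for i in range(chunks):
        state += packet * (i + 1)  # stand-in for one slice of crypto work
        yield
    return state

def run(arrivals, chunks):
    """Alternate between a FromDevice-like reader and the yielding element."""
    pending = deque(arrivals)  # packets still waiting in the kernel buffer
    click_queue = deque()      # stands in for the Click Queue element
    current = None             # the in-progress computation, if any
    results = []
    max_depth = 0

    while pending or click_queue or current is not None:
        # Reader's turn: move one packet from the kernel into the Queue.
        if pending:
            click_queue.append(pending.popleft())
            max_depth = max(max_depth, len(click_queue))

        # Element's turn: with nothing in progress, pull like Unqueue does.
        if current is None and click_queue:
            current = long_computation(click_queue.popleft(), chunks)

        # Run one slice of the computation, then yield to the reader again.
        if current is not None:
            try:
                next(current)
            except StopIteration as done:
                results.append(done.value)
                current = None

    return results, max_depth

if __name__ == "__main__":
    results, depth = run([1, 2, 3, 4, 5], chunks=4)
    print("results:", results)
    print("max Click Queue depth:", depth)
```

Because the reader gets a turn between slices, packets accumulate in the Queue while a computation is in progress, rather than backing up in the kernel buffer.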

Eddie

>
> Thanks,
> Michael