[Click] Click on Multi-core problem

ahmed A. amego83 at gmail.com
Mon Jul 18 12:34:39 EDT 2011


Hi,

My CPU cores are still suffering from a high number of interrupts per
second generated by the transmit NICs (used only for packet transmission
by ToDevice elements). I tried to reduce the number of interrupts using
the transmit interrupt delay and interrupt throttling parameters of the
e1000 driver, but this does not help: as soon as I lower the interrupt
rate I start losing packets and performance degrades. Anyway, I was
wondering whether ToDevice really needs an interrupt when a packet is
sent. Does the TX DMA ring cleaning function e1000_tx_clean need an
interrupt to start cleaning the DMA descriptors and buffers? What other
ToDevice functions need an interrupt to be triggered? I would really
appreciate some explanation of the ToDevice element so I can deal with my
problem.
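
For reference, one way to watch the interrupt rate of a single NIC is to
sample /proc/interrupts twice and take the difference (a rough Python
sketch; the "eth22" label and the one-second window are placeholders, and
the IRQ label for your card may differ):

    # Sketch: sample /proc/interrupts twice and report interrupts/sec for
    # the IRQ line whose label contains the given NIC name.
    import time

    def irq_count(label):
        total = 0
        with open("/proc/interrupts") as f:
            for line in f:
                if label in line:
                    # first field is "NN:", then one counter column per CPU
                    for field in line.split()[1:]:
                        if field.isdigit():
                            total += int(field)
        return total

    before = irq_count("eth22")
    time.sleep(1)
    print("eth22: %d interrupts/sec" % (irq_count("eth22") - before))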

Thank you

Ahmed





On Thu, Jul 14, 2011 at 3:29 PM, ahmed A. <amego83 at gmail.com> wrote:

> Hi all,
>
> Thank you for the tips and feedback you provided. I finally managed to
> locate the source of the problem. I found out that my output cards (the
> ones attached to the ToDevice elements) generate a considerable number of
> interrupts (32,000 intr/sec), and Linux assigns the handling of those
> interrupts to specific cores, which are exactly the cores that give me
> the bad forwarding performance. As soon as I assign the interrupt
> handling to a different core, the performance of the bad core returns to
> normal (the expected performance). I still do not know why I get such a
> high number of interrupts on the transmission path. Does the Click
> ToDevice allow or require these interrupts? I am now looking for a way to
> reduce the number of interrupts, so any tips would be useful.
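>
> One way to reassign the interrupt handling is to write a CPU bitmask to
> /proc/irq/<irq>/smp_affinity (a rough sketch; IRQ 30 and core 3 are
> placeholders, the real IRQ number comes from /proc/interrupts, and the
> write needs root):
>
>     # Sketch: steer one IRQ to a chosen core by writing a hex CPU bitmask.
>     # IRQ 30 and core 3 are placeholders for the TX NIC's IRQ and target core.
>     IRQ = 30
>     CORE = 3
>     mask = 1 << CORE
>     with open("/proc/irq/%d/smp_affinity" % IRQ, "w") as f:
>         f.write("%x\n" % mask)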
>
>
> Regards,
> Ahmed
>
> On Wed, Jul 13, 2011 at 6:33 PM, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>
>> Hey Adam,
>>
>> Have you added these papers to the wiki page for such things?
>>
>> Eddie
>>
>>
>>
>> On 7/13/11 2:50 AM, Adam Greenhalgh wrote:
>>
>>> Sorry for the shameless self-plug, but a number of these papers might
>>> be of use to you; they explain some of the issues you will be seeing
>>> with multiple CPUs and Click.
>>>
>>> http://www.comp.lancs.ac.uk/~laurent/papers/egi_npc.pdf
>>>
>>> http://www.comp.lancs.ac.uk/~laurent/papers/high_perf_vrouters-CoNEXT08.pdf
>>>
>>> http://www.comp.lancs.ac.uk/~laurent/papers/fairness_vrouters-PRESTO08.pdf
>>>
>>> Adam
>>>
>>> On 12 July 2011 15:53, ahmed A. <amego83 at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am examining the forwarding performance of Click on our four-core
>>>> machine. Each time, I assign two simple forwarding paths to two
>>>> different cores and watch the forwarding rate using a Click Counter (a
>>>> sketch of reading the rates follows the configuration). My Click
>>>> configuration file is as follows:
>>>>
>>>> pd1 :: PollDevice(eth24, PROMISC true, BURST 16) -> queue1 :: Queue(10000)
>>>>     -> c1 :: Counter -> td1 :: ToDevice(eth22);
>>>> pd2 :: PollDevice(eth25, PROMISC true, BURST 16) -> queue2 :: Queue(10000)
>>>>     -> c2 :: Counter -> td2 :: ToDevice(eth23);
>>>>
>>>> Idle -> ToDevice(eth24);
>>>> Idle -> ToDevice(eth25);
>>>>
>>>> StaticThreadSched(pd1 0, td1 0, pd2 1, td2 1);
>>>>
>>>> CpuPin(0 0, 1 1);
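>>>>
>>>> A rough sketch of one way to read the counter rates while the test runs,
>>>> assuming the configuration is installed as a kernel module and clickfs
>>>> is mounted at /click:
>>>>
>>>>     # Sketch: read each Counter's "rate" handler through clickfs,
>>>>     # assumed to be mounted at /click.
>>>>     for counter in ("c1", "c2"):
>>>>         with open("/click/%s/rate" % counter) as f:
>>>>             print("%s: %s" % (counter, f.read().strip()))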
>>>>
>>>> I was expecting almost the same forwarding rate (counter rate) for both
>>>> paths whatever the assigned cores, but I actually get different results
>>>> depending on the cores I use. For example, with cores 0 and 1 I get
>>>> 1.0 million packets per second (MPPS) on core 0 and 1.42 MPPS on core 1;
>>>> with cores 1 and 2 I get 1.39 MPPS on both; and with cores 2 and 3 I get
>>>> 1.42 MPPS on core 2 and 1 MPPS on core 3. In summary, there are always
>>>> two cores with bad performance compared to the other cores.
>>>>
>>>> Checking the monitored_empty_polls_rate of the cores, I found that the
>>>> cores with bad performance have a monitored_empty_polls_rate of 0.781,
>>>> whereas the good cores have 207111. The number of dropped packets on the
>>>> NIC ports assigned to the bad cores is much larger than on the ports
>>>> assigned to the good cores. My explanation is that the PollDevice is not
>>>> getting enough CPU cycles (i.e., not scheduled often enough) to poll
>>>> packets and refill the DMA ring with skb buffers, but I have no idea why.
>>>>
>>>> Does the Linux scheduler interfere with Click? I checked the load on
>>>> each core using top, but I could not see any other processes running on
>>>> the bad cores; they are idle all the time. I would appreciate any tips
>>>> or help.
>>>>
>>>> Thank you in advance
>>>>
>>>> Ahmed
>>>>
>>>> PS: I am running a 2.6.18 kernel on a Fedora filesystem with Click 1.6
>>>> and the batched e1000 driver.
>>
>

