[Click] Userlevel performance issues
Eddie Kohler
kohler at cs.ucla.edu
Thu Feb 28 00:33:43 EST 2008
Robert, FYI, wonder if this will help.
http://read.cs.ucla.edu/gitweb?p=click;a=commit;h=b57446d37f8aa63e51259141cc969b403d7fb913
Eddie
Robert Ross wrote:
> This seems to be the opposite of our observations though. In effect,
> our FromDump appeared to dominate processing at the cost of ToDevice
> performance. The FromDump element always introduced traffic at the
> desired rate and timing. ToDevice, however, dropped to a very low pull
> rate, essentially causing queues to immediately fill up and drop.
>
> On Thu, 2008-02-07 at 10:35, Eddie Kohler wrote:
>> It would have affected kernel mode as well. But the problem was not strictly
>> a PERFORMANCE problem, Click still performed well for them, it just took 100%
>> cpu. Your FromDump -> ... might have been affected because FromDump can use
>> timers in some configs, and thus would take lower priority than busy waiting.
>>
>> Eddie
>>
>>
>> Robert Ross wrote:
>> > Do you know if this problem would affect kernel-mode performance as
>> > well, or was this isolated to userlevel only?
>> >
>> > On Tue, 2008-02-05 at 17:41, Eddie Kohler wrote:
>> >> Hi Robert,
>> >>
>> >> I wonder if your observed weirdness with LinkUnqueue was due to the
>> >> 100%-CPU-on-DelayUnqueue problem recently reported. Maybe if you tried the
>> >> configuration now?
>> >>
>> >> Eddie
>> >>
>> >>
>> >> Robert Ross wrote:
>> >> > I'm not sure what this means, but we have been able to completely avoid
>> >> > this problem by using kernel-level Click with the experimental
>> >> > FromUserDevice, and a user-level Click reading FromDump and pushing
>> >> > packets out on a custom ToRawFile element.
>> >> >
>> >> > I will gladly put together and test a simple configuration. It would be
>> >> > identical to the configuration I had attached except for switching the
>> >> > Socket() to a FromDump(). I will run some more tests and send you the
>> >> > monitor.csv output from our script elements.
>> >> >
>> >> > BTW, we used the monitor.csv output file in tandem with the Java-based
>> >> > LiveGraph to see real-time statistics on Click performance. You can
>> >> > also use LiveGraph after the fact to open up and view our monitor.csv
>> >> > file on your end once I send you output. It has been a very nice
>> >> > marriage of capabilities for real-time analysis with minimal coding.
>> >> > We've done something similar in kernel-level, but had to write a custom
>> >> > Java application to output the monitor.csv since kernel configurations
>> >> > cannot output directly to files.
>> >> >
>> >> >
>> >> > Robert Ross
>> >> > DSCI Inc.
>> >> > Office: 732.542.3113 x173
>> >> > Home: 609.702.8114
>> >> > Cell: 609.509.5139
>> >> > Fax: 253.550.6198
>> >> >
>> >> > -----Original Message-----
>> >> > From: Eddie Kohler [mailto:kohler at cs.ucla.edu]
>> >> > Sent: Tuesday, January 29, 2008 2:39 PM
>> >> > To: Robert Ross
>> >> > Cc: Beyers Cronje; click at amsterdam.lcs.mit.edu
>> >> > Subject: Re: [Click] Userlevel performance issues
>> >> >
>> >> > Hi Robert,
>> >> >
>> >> > The *job* of LinkUnqueue is specifically to throttle performance. It is
>> >> > designed to output packets at the bandwidth specified. This will cause
>> >> > a lower rate, pinned to that bandwidth!
>> >> >
>> >> > The numbers you report are kind of reasonable. Click parses bandwidths
>> >> > as powers of 10, which is the networking standard as far as I can tell.
>> >> > So 512Kbps = 512000bps = 64000Bps; 190p/s at this rate implies 336B
>> >> > packets. So 1360p/s, for your highest bandwidth LinkUnqueue, assuming
>> >> > the same packet length, is roughly half what it "should" be. That's not
>> >> > great, but it's not terrible.
>> >> >
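The arithmetic in the quoted paragraph above can be reproduced with a short script. This is only a rough sketch: the rates are the approximate maxima reported in this thread, the 336-byte packet size is the value inferred from the 512Kbps case, and the function and variable names are illustrative, not part of Click.

```python
# Observed maxima (packets/second) reported in this thread, keyed by the
# LinkUnqueue bandwidth in bits/second. Click parses "K" and "M" as powers
# of 10, so 512Kbps is 512000 bits/second.
OBSERVED_PPS = {
    512_000: 190,
    1_000_000: 290,
    2_000_000: 490,
    4_000_000: 780,
    6_000_000: 980,
    8_000_000: 1360,
}


def implied_packet_bytes(bits_per_sec, packets_per_sec):
    """Average packet size (bytes) that would exactly fill the link."""
    return bits_per_sec / 8 / packets_per_sec


def expected_pps(bits_per_sec, packet_bytes):
    """Packets/second a link of this bandwidth carries at this packet size."""
    return bits_per_sec / 8 / packet_bytes


# Assumption: every test used the packet size implied by the 512Kbps case,
# truncated to whole bytes as in the quoted message.
PACKET_BYTES = int(implied_packet_bytes(512_000, 190))

for bps, observed in OBSERVED_PPS.items():
    expected = expected_pps(bps, PACKET_BYTES)
    print(f"{bps:>9} bps: expected {expected:7.0f} p/s, "
          f"observed {observed:5d} p/s, ratio {observed / expected:.2f}")
```

Under that packet-size assumption, the 8Mbps row comes out a little under half of the expected rate, which matches the "roughly half" estimate in the quoted text.
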
>> >> > I have not run your configuration with Sockets, but I have with
>> >> > InfiniteSources, and so forth, and have observed LinkUnqueue outputting
>> >> > packets at the correct rate. In fact I checked in an update to Counter,
>> >> > to give it bit_rate and byte_rate handlers, making this easier to see.
>> >> >
>> >> > LinkUnqueue should affect the upstream Socket elements only indirectly.
>> >> > LinkUnqueue stops pulling from its input when the emulated link is full.
>> >> > This will cause an upstream Queue to fill up. Some elements might
>> >> > notice that Queue's full state and stop producing packets (since those
>> >> > packets will only be dropped). The InfiniteSource and user-level
>> >> > FromHost elements have this behavior. However, your use of
>> >> > NotifierQueue (instead of Queue) would neutralize this effect, since
>> >> > NotifierQueue doesn't provide full notification.
>> >> >
>> >> > I am unsure in the end whether you are observing a bug or correct
>> >> > behavior. Here are a couple questions to help us figure it out.
>> >> >
>> >> > - Re: FromDump and ToDevice. Can you reduce the configuration as much
>> >> > as possible, and tell us what rates ToDevice achieves without FromDump,
>> >> > and what it achieves with FromDump? Your mail isn't specific about the
>> >> > configuration or the performance numbers.
>> >> >
>> >> > - Re: LinkUnqueue. Can you send the output of your configuration (cool
>> >> > use of define and Script btw), as well as the configuration? Again,
>> >> > with InfiniteSource I see expected behavior, and I would not expect
>> >> > LinkUnqueue to throttle Socket.
>> >> >
>> >> > It may be that you are finding an unfortunate interaction between
>> >> > Click's task handlers and its file descriptor handlers -- something we
>> >> > could potentially fix. But without specific numbers it's hard to tell.
>> >> >
>> >> > Eddie
>> >> >
>> >> >
>> >> > Robert Ross wrote:
>> >> >> The only clear item that seems to have a marked difference is the
>> >> >> LinkUnqueue element. The fact that our ToDevice and FromDevice/Socket
>> >> >> performance appears to be related somehow to the configuration of a
>> >> >> LinkUnqueue element sitting in the middle of our configuration is too
>> >> >> obvious to ignore. Does LinkUnqueue perform some kind of
>> >> >> upstream/downstream notification to these elements, causing them to
>> >> >> throttle their behavior based on LinkUnqueue?
>> >> >>
>> >> >> In our tests, with all other elements remaining the same, here is what
>> >> >> we found from two independent read handler counts:
>> >> >>
>> >> >> LinkUnqueue("512Kbps") = Maximum ~190 packets/second pushed from the
>> >> >> Socket element and pulled by the ToDevice element
>> >> >> LinkUnqueue("1Mbps") = Maximum ~290 packets/second pushed from the
>> >> >> Socket element and pulled by the ToDevice element
>> >> >> LinkUnqueue("2Mbps") = Maximum ~490 packets/second pushed from the
>> >> >> Socket element and pulled by the ToDevice element
>> >> >> LinkUnqueue("4Mbps") = Maximum ~780 packets/second pushed from the
>> >> >> Socket element and pulled by the ToDevice element
>> >> >> LinkUnqueue("6Mbps") = Maximum ~980 packets/second pushed from the
>> >> >> Socket element and pulled by the ToDevice element
>> >> >> LinkUnqueue("8Mbps") = Maximum ~1360 packets/second pushed from the
>> >> >> Socket element and pulled by the ToDevice element
>> >> >>
>> >> >> It is also telling that independent handler counters corroborate
>> >> >> exactly the same maximum packets per second in two very different
>> >> >> places in the configuration. Clearly you can see that the limitation
>> >> >> on processing is completely artificial and not an actual performance
>> >> >> problem, since increasing LinkUnqueue increases the performance in a
>> >> >> very controlled and obvious manner.
>> >> >>
>> >> >> I have attached a simple configuration that examines specific handlers
>> >> >> and outputs values each second to a CSV file for analysis. The
>> >> >> configuration is scaled back to complete simplicity, yet has the same
>> >> >> performance as our actual configuration, which is much more
>> >> >> complicated. Nevertheless, the performance is identical
>> >> >> and seems to point squarely at LinkUnqueue.
>> >> >>
>> >> >> What is LinkUnqueue doing that could be causing this type of effect on
>> >> >> FromHost, Socket and ToDevice?
>> >> >>
>> >> >>
>> >> >> ________________________________
>> >> >>
>> >> >> From: Robert Ross
>> >> >> Sent: Friday, January 25, 2008 7:40 PM
>> >> >> To: 'Beyers Cronje'
>> >> >> Cc: click at pdos.csail.mit.edu
>> >> >> Subject: RE: [Click] Userlevel performance issues
>> >> >>
>> >> >>
>> >> >> Sorry, I wasn't clear that the queues are necessary for our
>> >> >> configuration. The configuration is somewhat complex. I was only
>> >> >> attempting to highlight the important parts.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> ________________________________
>> >> >>
>> >> >> From: Beyers Cronje [mailto:bcronje at gmail.com]
>> >> >> Sent: Friday, January 25, 2008 7:31 PM
>> >> >> To: Robert Ross
>> >> >> Cc: click at pdos.csail.mit.edu
>> >> >> Subject: Re: [Click] Userlevel performance issues
>> >> >>
>> >> >>
>> >> >> Hi Robert,
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> * We first found that when UserLevel Click started pulling from a
>> >> >> PCAP file, the performance of the ToDevice() appeared to drop sharply.
>> >> >> What I mean by this is that the ToDevice() pull handler reported values
>> >> >> in the range of 200 packets/second once the PCAP file started reading.
>> >> >> This resulted in the outbound queue just prior to the ToDevice() filling
>> >> >> up and eventually overflowing because the packet rate in the PCAP file
>> >> >> is far more than 200 packets/second.
>> >> >>
>> >> >>
>> >> >> You don't have to use a queue between FromDump and ToDevice, as FromDump
>> >> >> is an agnostic element. In other words, you can connect ToDevice
>> >> >> directly to FromDump, which should ensure that at least no packets are
>> >> >> dropped, and you should see the best ToDevice performance.
>> >> >>
>> >> >> Also there are a few tuning parameters. Try tuning your NIC TX ring
>> >> >> size. On the e1000 driver the default TX ring size is 256; experiment
>> >> >> with different values to see if it makes a difference. ToDevice uses a
>> >> >> packet socket for transmit, so it might be worth experimenting with
>> >> >> /proc/sys/net/core/wmem_default and /proc/sys/net/core/wmem_max.
>> >> >>
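To see the current values of the two socket-buffer settings mentioned above before experimenting, something like the following might be used. It is a minimal sketch that assumes a Linux host exposing these /proc entries; the helper name read_sysctl_int is hypothetical, and the script only reads the values, it does not change them.

```python
from pathlib import Path


def read_sysctl_int(path):
    """Return the integer stored in a /proc/sys entry, or None if the
    entry is missing, unreadable, or not a plain integer."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


# The two send-buffer limits named in the quoted message.
for name in ("wmem_default", "wmem_max"):
    value = read_sysctl_int(f"/proc/sys/net/core/{name}")
    print(name, value if value is not None else "unavailable")
```

Recording these values before and after each change makes it easier to tell whether a difference in ToDevice throughput is due to the buffer size or to something else in the configuration.
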
>> >> >>
>> >> >> Beyers
>> >> >>
>> >> >>
>> >> >>
>> >> >> ------------------------------------------------------------------------
>> >> >>
>> >> >> _______________________________________________
>> >> >> click mailing list
>> >> >> click at amsterdam.lcs.mit.edu
>> >> >> https://amsterdam.lcs.mit.edu/mailman/listinfo/click