[Click] click-fastclassifier breaks alignment
rootkit85 at yahoo.it
Mon Apr 26 06:51:50 EDT 2010
On Fri, Apr 23, 2010 at 4:54 PM, Eddie Kohler <kohler at cs.ucla.edu> wrote:
> Hi Matteo,
>
> I think there is something else going on. Click-fastclassifier is designed
> to handle alignment correctly, but this requires that the input
> configuration has been passed through click-align. Do you pass your config
> through click-align BEFORE passing it to click-fastclassifier? I think the
> answer is Yes. In that case, then, can you confirm whether the alignment
> click-align produces is correct? Can you send your configuration both
> before & after click-align?
>
> E
>
>
> On 4/23/10 5:36 AM, rootkit85 at yahoo.it wrote:
>>
>> click-fastclassifier generates code which produces _lots_ of unaligned
>> accesses on MIPS platform.
>> For example this code snippet:
>>
>> FastClassifier_a_aeth::length_unchecked_push(Packet *p)
>> {
>> const unsigned *data = (const unsigned *)(p->data() - 2);
>> step_0:
>> if ((data[3]& 0xffff) != 0x800)
>> goto step_4;
>> [...]
>>
>> is wrong even if the packet is properly aligned, as it starts reading
>> double words from a pointer which address %4 != 0
>> this could be easily fixed by using uint8 in the byte matching code.
>> As the fastclassifier tool is quite a lot of code, do you know any
>> fast way to change it in order to fix?
>>
>> Thanks a lot,
>> Matteo Croce
>> _______________________________________________
>> click mailing list
>> click at amsterdam.lcs.mit.edu
>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>
Well, I have a consideration about Click's alignment process.
click-align adds Align elements to the configuration, which call
Packet::push(), which in turn does a memmove().
On some architectures, like the one I'm using (MIPS 24K), it seems that
doing ~50 unaligned accesses is faster than copying 1500 bytes of data.
I managed to run an unaligned configuration and gained ~15 MB/sec of
real throughput.
So the best solution for unaligned accesses would be to access memory
through char*.
I get a huge improvement in both throughput and processing time with
this small patch:
--- a/elements/standard/classifier.cc 2010-04-12 18:56:27.000000000 +0200
+++ b/elements/standard/classifier.cc 2010-04-26 12:45:20.677465574 +0200
@@ -1127,7 +1127,8 @@
}
do {
- uint32_t data = *((const uint32_t *)(packet_data + ex[pos].offset));
+ const unsigned char *datac = packet_data + ex[pos].offset;
+ uint32_t data = (datac[0] << 24) | (datac[1] << 16) | (datac[2] << 8) | datac[3];
data &= ex[pos].mask.u;
pos = ex[pos].j[data == ex[pos].value.u];
} while (pos > 0);
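
To illustrate the idea behind the patch outside of Click, here is a minimal standalone sketch of a 32-bit load that is safe at any address. The helper names load_be32 and load_host32 are hypothetical and are not part of Click. The first one assembles the word byte by byte, most significant byte first, as the patch does, but casts each byte to uint32_t before shifting so that the top byte is never shifted into the sign bit of a promoted int. The second copies the bytes with memcpy and therefore keeps host byte order, like the original cast-and-dereference. The two agree only on a big-endian host, so the patch as written presumably assumes a big-endian target.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a Click API): build a 32-bit value from four
// bytes, most significant byte first. Only byte loads are issued, so the
// pointer may have any alignment.
static inline uint32_t load_be32(const unsigned char *p)
{
    return (static_cast<uint32_t>(p[0]) << 24)
         | (static_cast<uint32_t>(p[1]) << 16)
         | (static_cast<uint32_t>(p[2]) << 8)
         |  static_cast<uint32_t>(p[3]);
}

// Hypothetical helper (not a Click API): read a 32-bit value in host byte
// order from any address. memcpy has no alignment requirement, unlike
// dereferencing a misaligned uint32_t pointer.
static inline uint32_t load_host32(const unsigned char *p)
{
    uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

For example, with the bytes 08 00 45 00 stored at an address where address % 4 == 2, load_be32 returns 0x08004500 on any host, while load_host32 returns 0x08004500 on a big-endian host and 0x00450008 on a little-endian one.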