[Click] RFC: Annotations

Andrew Brampton brampton+click at gmail.com
Wed Nov 25 18:16:19 EST 2009


For those of you who weren't just at SyClick there was some discussion
on how to improve annotations within Click. The work was split into
two parts, firstly a way to add any number of arbitrary annotations to
a packet, and secondly, a way to tag each element with a list of which
annotations they require and change, and then figure out if a elements
reads a annotation which has not been previously set.

I'm interested in the former annotation change, and I would like to
continue the discussion about it on this mailing list. There was a lot
of discussion at SyClick, so not all of this is my work. I would
appreciate if others gave feedback and suggestions. Additionally I'm
writing this from a airport without wifi, so my facts may be slightly
wrong.

Description:
 Packet annotations are a way to add extra information to a packet as
it flows through Click. Examples include; a paint annotation which
virtually colours a packet, a destination annotation that allows
destination IP address to be attached to a packet, even if it doesn't
have a valid header. Currently annotations seem to be simple integers,
ranging from one to a few bytes. Each packet can also contain multiple
annotations. (I would be interested to know how many and which
annotations people are typically using?)

Problem:
 P1. A packet is limited to 48 bytes of annotations, due to a limited
available space in the linux kernel's skbuf.
 P2. Due to the limited space, many annotations accidentally write
over each other in this small 48byte annotation space.

Requirements:
R1. All existing annotations should be supported.
R2. There are people that annotate packets in high performance
systems. Therefore adding and removing annotations should as least
costly as possible.
R3. A unlimited number of annotations should be supported on each packet.
R4. Annotations should be flexible to hold any size of data, as well
as being extendable so that new types of annotations are easily added.
R5. Annotations should not override each other.
R6. When a Packet is cloned, the annotations must also be cloned.
R7. Annotations are allowed to be pointers to other objects. This
causes issues because if a packet is cloned, so must the annotations.
If the annotation is a pointer click might also have to clone the
annotation, and equally Click must destroy them when the Packet is
discarded.

SyClick Solution:
The solution invented at SyClick involved creating a new
AnnotationSpace array of type Annotation *. Each element in the array
pointed to a Annotation class. The existing 48 byte array in the
Packet had a pointer stored in the last 4 or 8 bytes, which pointed to
this the AnnotationSpace. Each index in the AnnotationSpace was
exclusively used by a different Annotation type. For example, to add a
paint annotation you would create a PaintAnnotation class which
extends Annotation. This class would, for example, store a single
integer to represent the paint colour. You would then define a unique
number for this annotation, for example, 10. To set the paint
annotation you would first find the AnnotationSpace pointer in the
Packet, then check if index 10 contain a valid pointer, if not you
would create a PaintAnnotation object. Finally you would set the value
on this instance. A added bonus was that you could still use the first
40-44 bytes in the Packet for existing annotations, allowing Click to
slowly transition to the new scheme.

I hope I have correctly remembered and represented this solution.

SyClick solution problems:
I believe that the original solution did not consider a high
performance requirements. Therefore, mallocing a AnnotationSpace, and
then another malloc for each Annotation is very costly in both CPU and
memory. Secondly if you still allow use to the 40-44 byte annotation
area, you will run the risk of violating requirement 5. It does
satisfy requirement 7 as long as each Annotaton subclass is clonable,
and when a Packet is cloned it must clone each Annotation and the
AnnotationSpace.

To get some background on the problem, and see what other people have
done when faced with a similar problem I have described how both
FreeBSD and Linux do a similar thing.

FreeBSD kernel solution:
Inside the FreeBSD kernel (not click), each packet is stored in a
mbuf. A type of annotation can be added to these mbufs called
mbuf_tags. These tags contain a integer type identifier as well as a
data associated with the tag. Each mbuf can points to a linked list of
these mbuf_tags. To add a new tag you simply add it to the end of the
linked list. To search if a tag is set you must walk the linked list.
This has pros and cons, firstly it is flexible and very extendable
without wasting memory. However, it does require a malloc for each
mbuf_tag. FreeBSD migrates this malloc cost by having preallocated
pools of tags which speeds up tag allocation.

Linux kernel solution:
Inside a non click Linux kernel, each packet is stored in a skbuf.
These skbufs have many fields, each one for a different annotation.
Each time a annotation is added or removed the skbuf struct changes.
In my opinion this solution is ugly, but it easy, work well and
maintains high performance.

A hybrid solution (aka my solution):
There are only 48 bytes of space available, so if you want an
unlimited number of annotations, you will eventually have to malloc
some addition space. Therefore I suggest a hybrid approach, where the
48byte area is used for as long as possible, and then when you need
more space you move to a new linked list area. To facilitate this I
suggest that each annotation has a unique number. The 48byte area is
then divided into multiple annotations where each annotation looks
like:
struct {
 char size; // the size of the data field
 char id;
 char data[];
}

The number of annotations within the 48byte area would depend on the
data size. For example:
Data Size (in bytes), # of annotations
1, 16
2, 12
4, 8
8, 4.8

As soon as we run out of space in the 48 byte area we malloc a new
annotation area at least as big as the inserted annotation, but most
likely a lot larger. We can then chain these 48byte area to the new
area using a special next annotations annotation.

To check if the packet has a particular annotation this structure will
have to be linearly traversed. This may not be a issue if there are
only a couple of annotations on the packet, but if there are 10 or 20
you may lose performance again (I will have to benchmark that).

Also, this scheme makes it hard to include complex annotations, such
as a pointer to a object. As cloning and deleting will cause memory
leaks.

While sat here in the airport I've come up with a number of different
ideas, here are some more:

A slow/fast annotation:
Perhaps we should split the annotation space into slow and fast areas.
The fast area will exclusively use the 48 byte area, while the slow
area will use some external malloced space. The slow area could also
allow custom clone and cleanup handlers to ensure that any pointers to
class can be dealt with correctly. The fast area will work exactly how
it works now, so we have many of the problems we have now, but with
perhaps less annotations trying to be stuffed in it (unless of cause
want a fast system).

A annotation offset table:
We use the 48byte area, however, we ensure that no more than 48 bytes
of annotation may be configured in a single click configuration. Some
code would parse all the Elements within the Click configuration and
identify which annotations are being used. Then Click determines a
different byte offset for each annotation to exclusively use within
the 48 byte area. For example, say a destination IP address and paint
annotation are being used. The first annotation (dest IP) would be
assigned zero (the beginning of the 48 byte area), and paint would be
assigned 4. Paint is assigned 4 because the first annotation takes 4
bytes, and paint is to appear after it.

When a annotation is written to a Packet, the offset would be looked
up in this table and written to the correct place. This system would
ensure that annotations do not override each other. The system should
also be fast because it is the same as existing just with one level of
indirection. However, we are still limited to 48 bytes, and we don't
easily support pointers to objects.

A more bizzare/clever solution?
I still think there might be a better way to do all this, so I'm
welcome for any suggestions.

What's next?
I hope people will read all the way down to here :) I also hope people
will give their feedback and suggestions. I might code up a couple of
the solutions and benchmark them.

thanks
Andrew



More information about the click mailing list