[Planetlab-users] Discussion on the networking-related issues in 4.2

Sapan Bhatia sapanb at CS.Princeton.EDU
Tue Aug 12 22:42:46 EDT 2008


Dear all,

This is an update on and summary of the networking issues that were
reported in the 4.2 version of PlanetLab. All but one of these have
been resolved (more on this a bit later). First, here's a summary of
what happened. Much of the functionality that is required to multiplex
and demultiplex packets between slices is implemented by the
Linux-VServer extension of Linux. Although substantial in size and
complexity, this extension is maintained by the Linux community, which
keeps it up-to-date with the latest version of the mainline kernel.
Unfortunately, Linux-VServer does not (yet) support some of the
features that have been in use on PlanetLab -- e.g. isolated raw
sockets, packet sockets and kernel-space tunneling.

In version 4.1, these functionalities were implemented in a module
called VNET, which was also substantial in size and complexity but
unfortunately not maintained by the community, making it problematic
to maintain and keep up-to-date with the mainline kernel. Since we
didn't want to maintain this component indefinitely, we chalked out a
path for convergence through which its features (e.g. raw socket
support) could progressively be merged into Linux-VServer. In doing
so, we cut down the size and complexity of VNET drastically. This
seemed to work (mostly) during our initial alpha testing. However,
more varied loads revealed problems with a class of raw sockets
called "packet sockets". Quite amusingly, there was a bug in the
first version of PlanetLab 4.2 that caused packet sockets to work
properly most of the time. But when we fixed this bug, they stopped
working altogether, and getting them to work again entailed
substantial rewriting of the relevant code.

What's special about packet sockets is that they do not support packet
filtering via Netfilter/iptables, which is what PlanetLab 4.2 uses to
manage associations between network connections and sockets on the one
hand and slices on the other. Also, the implementation of the Linux
network subsystem (deliberately) makes it hard to propagate filtering
actions from one component (TCP/IP) to the other (packet sockets)
without copying packets, potentially multiple times. We worked around
this restriction by writing some code that is (to put it gently) not
portable.
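
For readers less familiar with the distinction, here is a minimal
Python sketch of the two kinds of raw socket discussed in this
message. It is an illustration only, not PlanetLab code: the helper
names socket_params and open_raw_socket are made up for this example,
a Linux host is assumed, and actually opening either socket requires
root (or CAP_NET_RAW).

```python
import socket

ETH_P_ALL = 0x0003  # from <linux/if_ether.h>: match every link-layer protocol

# socket.AF_PACKET only exists on Linux; 17 is its numeric value there.
AF_PACKET = getattr(socket, "AF_PACKET", 17)


def socket_params(kind):
    """Return (family, type, protocol) for one kind of raw socket.

    Hypothetical helper, for illustration only.
    """
    if kind == "packet":
        # Packet socket: receives link-layer frames. This is the kind
        # that, per the discussion above, Netfilter/iptables cannot filter.
        return (AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    if kind == "inet-raw":
        # AF_INET raw socket: receives IP datagrams (TCP here) and is
        # covered by Netfilter/iptables.
        return (socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
    raise ValueError("unknown socket kind: %r" % (kind,))


def open_raw_socket(kind):
    """Open the socket described by socket_params(); needs root privileges."""
    family, sock_type, proto = socket_params(kind)
    return socket.socket(family, sock_type, proto)
```

Calling open_raw_socket("packet") or open_raw_socket("inet-raw") as
root returns a socket from which recv() yields whole frames or IP
datagrams, respectively.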

There are two good fixes for the situation. The first is to port
Netfilter to packet sockets, implementing the right hooks (this
support of course need not be comprehensive, and need only cover the
hooks needed by PlanetLab). The second is to extend BPF to be able to
filter based on sliver IDs. Both of these are non-trivial projects to
take on, in terms of the programming time required.

In the absence of these solutions, we'll be forced to keep the kludge
we have in place, which has the side effect (coming back to the
problem that has not been resolved) of intermittently reordering
incoming packets with respect to outgoing packets sent out in response
(e.g. TCP SYN-ACK packets sent out in response to TCP-SYN packets).
This happens when packet sockets are used.

There are two workarounds to this problem:
(i) The packets are timestamped, so they can be sorted upon reception
by a user-space program. For example, here's a program that seems to
do that (I've looked at the code and it seems right, but I haven't
tested it): www.life-gone-hazy.com/src/tcpdump-tools/tcpdump-reorder.c
(ii) The problem can be circumvented by using AF_INET/RAW sockets,
which support Netfilter and iptables natively. Libpcap supports this
type of socket, so recompiling it and installing it in a slice should
resolve the reordering issue.
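
Workaround (i) can be sketched in a few lines of Python. This is not
the tcpdump-reorder.c program mentioned above; it is a minimal sketch
that assumes each captured packet arrives as a (timestamp, payload)
pair and that no packet is displaced by more than `window` seconds.
Both the pair layout and the `window` parameter are assumptions made
for this example.

```python
import heapq
import itertools


def reorder_by_timestamp(packets, window=0.5):
    """Yield (timestamp, payload) pairs in timestamp order.

    packets: iterable of (timestamp, payload) pairs in arrival order.
    window:  assumed upper bound, in seconds, on how far a packet's
             arrival can lag behind packets with later timestamps.
    """
    heap = []
    counter = itertools.count()  # tie-breaker so payloads are never compared
    newest = float("-inf")       # largest timestamp seen so far
    for ts, payload in packets:
        heapq.heappush(heap, (ts, next(counter), payload))
        newest = max(newest, ts)
        # Under the assumption above, nothing stamped earlier than
        # (newest - window) can still arrive, so older entries are final.
        while heap and heap[0][0] <= newest - window:
            old_ts, _, old_payload = heapq.heappop(heap)
            yield old_ts, old_payload
    # End of capture: flush whatever is still buffered, in order.
    while heap:
        old_ts, _, old_payload = heapq.heappop(heap)
        yield old_ts, old_payload


# A SYN-ACK delivered ahead of the SYN that triggered it.
arrivals = [(1.01, "SYN-ACK"), (1.00, "SYN"), (2.00, "ACK")]
for ts, payload in reorder_by_timestamp(arrivals):
    print(ts, payload)
```

A larger `window` tolerates more displacement at the cost of holding
packets back for longer before they are released.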

If you have any thoughts about this subject, or alternative solutions
to propose, then please post them as a reply.

Thank you
Sapan




--
Sapan Bhatia
www.cs.princeton.edu/~sapanb


