[Planetlab-users] sendto: operation not permitted

Patrick Verkaik pverkaik at cs.ucsd.edu
Sun Nov 12 21:40:32 EST 2006


Hi Neil,

I believe we're doing everything according to the specs. I condensed the 
connecting side of our application into a small example program (attached) 
that reproduces the behaviour. I would appreciate it if you or someone else 
could take a quick look at it.

The program binds a socket to a local port and (through a separate raw 
socket) sends a SYN and then immediately an ACK packet to REMOTE_ADDR 
(132.239.17.226) port REMOTE_TCP_PORT (33445). For both packets, if it 
encounters EPERM, it sleeps for a second and retries (and keeps doing so 
until EPERM disappears).
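A bounded variant of that retry loop, which also records how long the EPERM condition lasted, might look like the sketch below. It is a hypothetical helper: send_retry_eperm, send_fn and fake_send are illustrative names, not part of the attached program. It assumes a send callback that follows send_raw()'s convention of returning 1 on success and 0 with errno set on failure.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under strict ISO C modes */

#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback type: returns 1 on success, or 0 on failure with
 * errno set (the convention send_raw() in the attached program follows). */
typedef int (*send_fn)(void *ctx);

/* Call fn() until it succeeds, retrying only while it fails with EPERM.
 * Stops after max_tries attempts or at the first non-EPERM error.
 * Sleeps sleep_s seconds between attempts (0 = no sleep).
 * Returns the number of attempts used on success, -1 on failure.
 * If elapsed_s is non-NULL, stores the elapsed time (CLOCK_MONOTONIC). */
int send_retry_eperm(send_fn fn, void *ctx, int max_tries,
                     unsigned sleep_s, double *elapsed_s)
{
    assert(fn != NULL && max_tries > 0);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    int attempts = 0;
    int ok = 0;
    while (attempts < max_tries) {
        attempts++;
        errno = 0;
        if (fn(ctx)) {
            ok = 1;
            break;
        }
        if (errno != EPERM)
            break;                      /* a different error: give up */
        if (sleep_s > 0 && attempts < max_tries)
            sleep(sleep_s);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    if (elapsed_s != NULL)
        *elapsed_s = (double)(end.tv_sec - start.tv_sec)
                   + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return ok ? attempts : -1;
}

/* Test stub: fails with EPERM while *ctx > 0, decrementing it each time. */
int fake_send(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        errno = EPERM;
        return 0;
    }
    return 1;
}
```

For example, with remaining = 3, send_retry_eperm(fake_send, &remaining, 10, 0, &elapsed) fails three times with EPERM and succeeds on the fourth attempt, returning 4. Capping the attempts keeps a stall like the two-minute one described below from blocking a caller indefinitely, while the elapsed time makes its length visible in logs.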

This is the tcpdump output when the remote host is allowed to send back a 
SYN/ACK:
20:53:18.068949 IP 132.239.17.224.50227 > 132.239.17.226.33445: S 10:10(0) win 20502
20:53:18.079683 IP 132.239.17.226.33445 > 132.239.17.224.50227: S 905277696:905277696(0) ack 11 win 5712 <mss 1428>
20:53:19.071081 IP 132.239.17.224.50227 > 132.239.17.226.33445: . ack 3389689600 win 20502

(The program gets one EPERM because its first attempt at sending the ACK 
precedes the SYN/ACK from the remote host.)
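Two numbers in this trace follow directly from how the attached program fills in the TCP header. tcpdump (without -S) prints ack numbers relative to the peer's initial sequence number, here 905277696; because the program sends ack_seq = 0, the ACK shows up as (0 - 905277696) mod 2^32 = 3389689600. Likewise, "win 20502" is 5712 with its two bytes swapped (0x1650 becomes 0x5016), which is what a little-endian host sends when tcphdr->window is set without htons(). The helper names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* tcpdump (without -S) shows sequence/ack numbers relative to the initial
 * sequence number (ISN) of the corresponding direction. Unsigned 32-bit
 * subtraction gives exactly that value, modulo 2^32. */
uint32_t relative_seq(uint32_t absolute, uint32_t isn)
{
    return absolute - isn;
}

/* Swap the two bytes of a 16-bit value: what the receiver sees when a
 * little-endian host stores a header field without htons(). */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* The numbers from the first trace above. */
void check_trace_numbers(void)
{
    /* ACK sent with ack_seq = 0; remote ISN from the SYN/ACK is 905277696. */
    assert(relative_seq(0u, 905277696u) == 3389689600u);
    /* tcphdr->window = 5712 (0x1650) goes out as 0x5016 = 20502. */
    assert(swap16(5712) == 20502);
}
```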

Tcpdump output when the remote host sends back RST instead of SYN/ACK:
21:05:26.991199 IP 132.239.17.224.52034 > 132.239.17.226.33445: S 10:10(0) win 20502
21:05:26.991347 IP 132.239.17.226.33445 > 132.239.17.224.52034: R 0:0(0) ack 11 win 0
21:05:26.991979 IP 132.239.17.224.52034 > 132.239.17.226.33445: . ack 0 win 20502

So far so good. However, if I suppress the SYN/ACK from the remote host, I 
get this:

21:09:03.005807 IP 132.239.17.224.52554 > 132.239.17.226.33445: S 10:10(0) win 20502
21:11:03.555572 IP 132.239.17.224.52554 > 132.239.17.226.33445: . ack 0 win 20502

For a full two minutes the program repeatedly gets EPERM as it tries to 
send the ACK. After the two minutes have passed the ACK finally goes 
through.

In the above, the program is running on planetlab1.ucsd.edu. When run on 
planet-lab2.cs.ucr.edu, it shows exactly the same behaviour.

Can you confirm that we're doing everything the way we should?

 	Thanks,
 	Patrick

On Thu, 9 Nov 2006, Neil Spring wrote:

> Patrick,
>
> PlanetLab's vnet works on the assumption that you can send TCP packets so 
> long as the source port is one your slice "owns" via opening a socket and 
> calling bind, and that you can only receive TCP packets to a destination port 
> that you "own" in the same way.
>
> We send gobs of TCP packets without getting EPERM on sendto. If it's letting a 
> packet through after 100,000 retries, that sounds like a bug.
>
> PlanetLab rate limiting has been found to delay packets by 30 seconds (I'm as 
> surprised as anyone, but I heard this from a reputable source on Sunday). 
> I'd guess that if you were really blowing the queue, you'd get ENOBUFS rather 
> than EPERM.
>
> I'm pretty sure Mark wrote a vnet FAQ, or at least the documentation should 
> get you pointed in the right direction.
>
> -neil
>
> On Nov 9, 2006, at 2:58 AM, Patrick Verkaik wrote:
>
>> 
>> Hi,
>> 
>> I have a question about the following sendto() behaviour that I don't quite 
>> understand.
>> 
>> We're sending raw TCP packets using sendto() but getting intermittent EPERM 
>> errors. We've found that repeatedly retrying the sendto() (with exactly the 
>> same packet) until it no longer gives EPERM eventually gets the packet 
>> through. During a short connection (perhaps lasting 10 seconds and sending 
>> fewer than 100 bytes of data) we sometimes see about 100,000 failed 
>> sendto() attempts (roughly 20,000 attempts when a usleep(1) separates 
>> them).
>> 
>> Another curious fact is that we only see this behaviour when we're 
>> tunneling the reverse TCP traffic into the sending host. The host therefore 
>> doesn't see e.g. SYN/ACK packets coming back in response to outgoing SYNs.
>> 
>> Can anyone explain this? Is this how PlanetLab implements rate limiting, or 
>> how it prevents SYN-flooding attacks from being launched from PlanetLab?
>> 
>> (Btw: we're running as root and using the node's IP address as source IP 
>> address.)
>>
>> 	Thanks,
>>
>> 	Patrick Verkaik
>> 	Barath Raghavan
>> 
>> _______________________________________________
>> Users mailing list: Users at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/users

-- 

 	Patrick
-------------- next part --------------
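#undef NDEBUG   /* keep assert() active; must come before <assert.h> */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>    /* memset */
#include <strings.h>   /* bzero */
#include <unistd.h>    /* close, sleep */
#include <errno.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <netinet/ip.h>
#include <netinet/tcp.h>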


// customise
#define REMOTE_TCP_PORT 33445
#define LOCAL_ADDR ("132.239.17.224")
#define REMOTE_ADDR ("132.239.17.226")

struct tcppseudo
{
  u_int32_t saddr;
  u_int32_t daddr;
  u_int16_t protocol;
  u_int16_t tcp_len; 
};

u_int16_t
ip_cksum(u_int16_t *buf, size_t count, u_int16_t *buf2, size_t count2)
{
  /* Code adapted from RFC 1071. */

  register u_int32_t sum = 0;

  while( count > 1 )  {
    /*  This is the inner loop */
    sum += *(buf++);
    count -= 2;
  }

  while( count2 > 1 )  {
    /*  This is the inner loop */
    sum += *(buf2++);
    count2 -= 2;
  }

  /*  Add left-over byte, if any */
  if( count > 0 )
    sum += * (unsigned char *) buf;

  /*  Fold 32-bit sum to 16 bits */
  while (sum>>16)
    sum = (sum & 0xffff) + (sum >> 16);

  return ~sum & (u_int32_t) 0xffff;
}

int
send_raw(int rawsock, char *ip_buf)
{
  struct iphdr *iphdr = (struct iphdr *) ip_buf;
  unsigned pktlen = ntohs (iphdr->tot_len);

  assert (iphdr->protocol == IPPROTO_TCP);
  unsigned iphdrlen = iphdr->ihl * 4;
  struct tcphdr *tcphdr = (struct tcphdr *) &ip_buf[iphdrlen];

  char *l4_buf = ip_buf + iphdr->ihl * 4;
  unsigned l4_len = ntohs (iphdr->tot_len)-iphdr->ihl * 4;

  tcphdr->check = 0;
  struct tcppseudo tcppseudo;
  tcppseudo.saddr = iphdr->saddr;
  tcppseudo.daddr = iphdr->daddr;
  tcppseudo.protocol = htons(IPPROTO_TCP);
  tcppseudo.tcp_len = htons(l4_len);

  tcphdr->check = ip_cksum((u_int16_t *) l4_buf, l4_len,
                           (u_int16_t *) &tcppseudo, sizeof(struct tcppseudo));
  iphdr->check = 0;
  iphdr->check = ip_cksum((u_int16_t *) iphdr, sizeof(struct iphdr), 0, 0);

  struct sockaddr_in dest_addr;
  socklen_t dest_addr_len = sizeof(struct sockaddr_in);
  bzero(&dest_addr, dest_addr_len);
  dest_addr.sin_family = AF_INET;
  dest_addr.sin_addr.s_addr = iphdr->daddr;
  dest_addr.sin_port = tcphdr->dest;

  ssize_t sent;
  sent = sendto(rawsock, ip_buf, pktlen, 0, (struct sockaddr *) &dest_addr,
                    dest_addr_len);

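  if (sent != (ssize_t) pktlen) {
    int saved_errno = errno;  // main() checks errno after a failure; keep it intact
    if (sent < 0)
      perror("sendto");
    else
      fprintf(stderr, "sendto returned %zd != %u bytes\n", sent, pktlen);
    errno = saved_errno;
    return 0;
  }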
  return 1;
}



int
allocate_raw_tcp_port(u_int32_t saddr)
{
  struct sockaddr_in tcp_sin;
  memset(&tcp_sin, 0, sizeof(tcp_sin));
  tcp_sin.sin_family = AF_INET;
  tcp_sin.sin_addr.s_addr = saddr;
  tcp_sin.sin_port = 0;   // 0: let the kernel choose a free port

  // discover a free port
  int tcp_sock = socket(PF_INET, SOCK_STREAM, 0);
  if (tcp_sock == -1) {
    perror("allocate_raw_tcp_port: creating TCP socket");
    exit(1);
  }

  if(bind(tcp_sock, (struct sockaddr*) &tcp_sin,
	    sizeof(struct sockaddr_in)) < 0) {
    perror("allocate_raw_tcp_port: binding TCP port");
    return -1;
  }

  struct sockaddr_in sa;
  socklen_t sa_len = sizeof(struct sockaddr_in);
  if (getsockname(tcp_sock, (struct sockaddr *) &sa, (socklen_t *) &sa_len) < 0) {
    perror("allocate_raw_tcp_port: getsockname");
    return -1;
  }

  tcp_sin = sa;
  fprintf(stderr, "allocate_raw_tcp_port: briefly bound tcp socket on %s, %u\n",
                      inet_ntoa (tcp_sin.sin_addr),
                      (unsigned) ntohs(tcp_sin.sin_port));

  // close port so that we can now bind a raw socket to it. note: same
  // weird behaviour occurs when we don't bind a raw socket and just go with
  // tcp_sock
  close(tcp_sock); // XXX may lose the port. 


  struct sockaddr_in raw_sin = tcp_sin;

  fprintf(stderr, "allocate_raw_tcp_port: binding raw on %s, %u\n",
                      inet_ntoa (raw_sin.sin_addr),
                      (unsigned) ntohs(raw_sin.sin_port));

  int tcp_raw_sock = socket(PF_INET, SOCK_RAW, IPPROTO_TCP);
  if (tcp_raw_sock == -1) {
    perror("allocate_raw_tcp_port: creating TCP raw socket");
    exit(1);
  }
  
  if (bind(tcp_raw_sock, (struct sockaddr *) &raw_sin,
      sizeof(struct sockaddr_in)) < 0) {
    perror("allocate_raw_tcp_port: binding raw socket");
    return -1;
  }

  sa_len = sizeof(struct sockaddr_in);
  if (getsockname(tcp_raw_sock, (struct sockaddr *) &sa, (socklen_t *) &sa_len) < 0) {
    perror("allocate_raw_tcp_port: getsockname");
    return -1;
  }

  raw_sin = sa;
  fprintf(stderr, "allocate_raw_tcp_port: bound tcp raw socket on %s, %u\n",
                      inet_ntoa (raw_sin.sin_addr),
                      (unsigned) ntohs(raw_sin.sin_port));
  // return the port in host byte order; tcp_raw_sock is left open
  return ntohs(raw_sin.sin_port);
}



int
main(int argc, char **argv)
{
  unsigned pkt_len = sizeof(struct iphdr) + sizeof(struct tcphdr);
  char ip_buf[pkt_len+1000];

  u_int32_t saddr;
  u_int32_t daddr;
  if (! inet_aton(LOCAL_ADDR, (struct in_addr*) &saddr)) {
    fprintf(stderr, "inet_aton error\n");
    exit(1);
  }
  if (! inet_aton(REMOTE_ADDR, (struct in_addr*) &daddr)) {
    fprintf(stderr, "inet_aton error\n");
    exit(1);
  }

  int rawsock;
  if((rawsock = socket(PF_INET, SOCK_RAW, IPPROTO_TCP)) < 0) {
    perror("socket");
    exit(1);
  }

  int tmp = 1;  // IP_HDRINCL: packets passed to sendto() carry our own IP header
  if (setsockopt(rawsock, IPPROTO_IP, IP_HDRINCL, &tmp, sizeof(tmp)) < 0) {
    perror("setsockopt");
    exit(1);
  }

  int local_tcp_port;
  if ((local_tcp_port = allocate_raw_tcp_port(saddr)) < 0)
    exit(1);
  fprintf(stderr, "allocate_raw_tcp_port returned port %d\n", local_tcp_port);

  struct iphdr *iphdr = (struct iphdr *) ip_buf;
  bzero(iphdr, sizeof(struct iphdr));
  iphdr->version = 4;
  iphdr->ihl = 5;
  iphdr->tos = 0;
  iphdr->tot_len = htons(pkt_len); 
  iphdr->id = htons(5);
  iphdr->frag_off = 0;
  iphdr->ttl = 100;
  iphdr->protocol = IPPROTO_TCP;
  iphdr->saddr = saddr;
  iphdr->daddr = daddr;

  char *l4_buf = &ip_buf[sizeof(struct iphdr)];
 
  struct tcphdr *tcphdr = (struct tcphdr *) l4_buf;
  bzero(tcphdr, sizeof(struct tcphdr));

  tcphdr->source = htons(local_tcp_port);
  tcphdr->dest = htons(REMOTE_TCP_PORT);
  tcphdr->seq = htonl(10);
  tcphdr->ack_seq = 0;
  tcphdr->doff = 5;
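  // network byte order; without htons() a little-endian host sends 5712
  // as 20502 (0x5016), which is the "win 20502" seen in the traces
  tcphdr->window = htons(5712);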
  tcphdr->urg_ptr = 0;

  tcphdr->syn = 1;

  while(1) {
    if (send_raw(rawsock, ip_buf))
      break;
    if (errno != EPERM) {
      fprintf(stderr, "dropping packet\n");
      return 1;
    }
    sleep(1);
  }
  fprintf(stderr, "sent packet\n");

  tcphdr->syn = 0;
  tcphdr->ack = 1;

  while(1) {
    if (send_raw(rawsock, ip_buf))
      break;
    if (errno != EPERM) {
      fprintf(stderr, "dropping packet\n");
      return 1;
    }
    sleep(1);
  }
  fprintf(stderr, "sent packet\n");
  return 0;
}
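The "XXX may lose the port" comment in allocate_raw_tcp_port() points at a gap: once the probe TCP socket is closed, nothing may hold the port. If vnet's ownership rule works as Neil describes it, one option is to keep a bound TCP socket open for as long as raw packets are sent from that port. The sketch below does that; claim_tcp_port is a hypothetical helper, and the assumption that ownership lasts only while a bound socket exists is not verified here. Since the same comment reports identical behaviour when tcp_sock is used directly, this is not offered as a fix for the EPERM stall.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to a kernel-chosen free port on
 * saddr (network byte order) and keep it bound. On success, stores the open
 * descriptor in *fd_out and returns the port in host byte order; returns -1
 * on error. The caller closes *fd_out once it no longer sends from that port
 * (assumption: under vnet, ownership lasts only while a bound socket exists). */
int claim_tcp_port(uint32_t saddr, int *fd_out)
{
    assert(fd_out != NULL);

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = saddr;
    sin.sin_port = 0;                       /* 0: kernel picks a free port */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *) &sin, sizeof(sin)) < 0) {
        close(fd);
        return -1;
    }

    socklen_t len = sizeof(sin);
    if (getsockname(fd, (struct sockaddr *) &sin, &len) < 0) {
        close(fd);
        return -1;
    }

    *fd_out = fd;                           /* left open on purpose */
    return ntohs(sin.sin_port);
}
```

In main(), a call like claim_tcp_port(saddr, &fd) would take the place of allocate_raw_tcp_port(saddr), with close(fd) after the last packet has been sent.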


