[PL #3916] Bug in Proper...

Mark Huang via RT devel at planet-lab.org
Tue Jan 25 10:46:24 EST 2005


Email Recipients (see http://www.planet-lab.org/Support)
       Requestor: justin at cs.arizona.edu
       Ticket Ccs: mlhuang at cs.princeton.edu, smuir at cs.princeton.edu, stork at cs.arizona.edu, vivek at cs.princeton.edu

==================================================

> When we call Proper on the production nodes sometimes there is no
> response (the call waits forever for a response that will presumably
> never come).
> 
> This is currently happening on planetlab1.cs.purdue.edu (with
> arizona_stork) in case that aids in troubleshooting...   We are happy
> to do whatever we can to help find this bug...

This may be related to the loopback TCP problem that Vivek (and others)
are seeing on nodes. I am pulling my hair out trying to figure out what
the problem could be; every lead that I've followed up on has gone cold.

In your code, can you time out the connect() request after a few (>=5)
seconds, and retry forever? If your connect() request eventually
succeeds, then you're probably seeing the same thing Vivek is seeing.
Unfortunately, that won't bring us closer to a solution, since Vivek's
test case is already easy to reproduce, but it should just be a matter
of time before we can track it down.
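
A minimal sketch of the timeout-and-retry loop described above, in
Python. The function name, its parameters, and the host/port used to
call it are hypothetical; nothing here comes from the Proper client
itself, and it assumes the client reaches the service over a plain TCP
socket.

```python
import socket
import time


def connect_with_retry(host, port, timeout=5.0, retry_delay=1.0,
                       max_attempts=None):
    """Connect to (host, port), abandoning each attempt after `timeout` seconds.

    Hypothetical helper, not part of Proper. Returns (sock, attempts).
    With max_attempts=None (the default) it retries forever.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            # create_connection applies `timeout` to the connect() call
            # and raises an OSError subclass on timeout or refusal.
            sock = socket.create_connection((host, port), timeout=timeout)
            return sock, attempts
        except OSError as exc:
            print(f"attempt {attempts} failed: {exc!r}")
            time.sleep(retry_delay)
    raise ConnectionError(f"no connection after {attempts} attempts")
```

If a call eventually returns with attempts greater than 1, the first
connect() attempts stalled or failed and a later one succeeded, which
is the pattern to look for.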

--Mark



