[Planetlab-users] recvfrom blocking on DNS (in China)

Emil Sit sit+planetlab at MIT.EDU
Mon May 9 10:40:19 EDT 2005


Has anyone observed nodes blocking in recvfrom on a DNS server
for very long periods of time? For example:

>     mit_dht at lzu2.6planetlab.edu.cn:~
>     [0] $ strace -p 20267
>     Process 20267 attached - interrupt to quit
>     recvfrom(4,  <unfinished ...>

>     mit_dht at lzu2.6planetlab.edu.cn:~
>     [0] $ lsof -p 20267
>     COMMAND   PID    USER   FD   TYPE    DEVICE    SIZE      NODE NAME
[...]
>     wget    20267 mit_dht    3w   REG     253,2      33   3361614 /tmp/sfs-0.8pre-1.i386.sum
>     wget    20267 mit_dht    4u  IPv4 573940260               UDP lzu2.6planetlab.edu.cn:38300->CS.NIC.EDU.CN:domain 

The wget manpage suggests that wget does not impose a DNS lookup
timeout of its own and relies on whatever timeout the system libraries
set; ltrace shows that it is just calling gethostbyname.  Doesn't
the system library time out DNS lookups?
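
As a workaround while this is unresolved, a caller can put its own
deadline around the lookup instead of relying on the library. The
following is only a minimal sketch in Python, not what wget or stork
actually do: the function name resolve_with_timeout, the timeout_s
parameter, and the injectable resolver argument are hypothetical names
chosen for illustration. It runs the blocking lookup in a daemon thread
and gives up waiting after the deadline.

```python
import socket
import threading


def resolve_with_timeout(hostname, timeout_s=5.0, resolver=socket.gethostbyname):
    """Return the resolved address, or None if the lookup misses the deadline.

    `resolver` defaults to socket.gethostbyname; it is a parameter only so
    that a stand-in can be substituted (hypothetical design choice).
    """
    result = {}

    def worker():
        try:
            result["addr"] = resolver(hostname)
        except OSError as exc:  # gaierror and herror are OSError subclasses
            result["error"] = exc

    # Daemon thread: a lookup that never returns will not keep the
    # process alive at exit.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)

    if t.is_alive():
        # The lookup is still blocked (e.g. in recvfrom); abandon it.
        return None
    if "error" in result:
        raise result["error"]
    return result["addr"]


if __name__ == "__main__":
    # Example hostname only; any name works here.
    print(resolve_with_timeout("example.org", timeout_s=10.0))
```

Note that the abandoned thread stays blocked in the background; this
bounds how long the caller waits, it does not cancel the query itself.
On glibc systems the resolver's own per-query limits are normally
governed by the timeout and attempts options in /etc/resolv.conf, so
those are also worth checking on an affected node.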

I've seen several other nodes in that state, including:
    pku2.6planetlab.edu.cn 
    cut1.6planetlab.edu.cn
    xmu1.6planetlab.edu.cn
    zju1.6planetlab.edu.cn
    uestc2.6planetlab.edu.cn

Some of those nodes are trying to run stork to install packages;
others are just running wget.  Logging into those nodes suggests that
DNS resolution otherwise works (for example, lsof was able to resolve
the name CS.NIC.EDU.CN in its output).

I've posted to support [PL #5672] and Mark H suggested I inquire
more generally.


