[Planetlab-users] concerning the behavior of
planetlab2.cs.virginia.edu in the last two weeks
Livia Maria Rodrigues Sampaio
livia at dsc.ufcg.edu.br
Thu Aug 24 14:49:47 EDT 2006
Adams, Robert wrote:
Thanks Robert!
I just tried to check the load average of the node planetlab2.cs.virginia.edu
with the comon tool, but no information is available for this node; there is
a message indicating "no response". The same was observed using cotop
(http://planetlab2.cs.princeton.edu:3120/cotop). However, I can log into the
node, and executing the "top" command there I got the following information:
top - 14:42:07 up 97 days, 21:36, 0 users, load average: 526.76,
504.26, 504.05
Tasks: 3 total, 1 running, 2 sleeping, 0 stopped, 0 zombie
Cpu(s): 9.4% us, 46.2% sy, 43.8% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.7% si
Mem: 1036172k total, 1022308k used, 13864k free, 5496k buffers
Swap: 1048568k total, 551604k used, 496964k free, 109124k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9634 ufcg_liv 15 0 2400 1204 1004 S 0.0 0.1 0:00.03 bash
25231 ufcg_liv 15 0 204m 9124 5204 S 0.0 0.9 0:00.35 java
14413 ufcg_liv 16 0 1892 920 760 R 0.0 0.1 0:00.00 top
Note that it indicates a high load average on this node (526.76,
504.26, 504.05). I would like to get more information about this. What
is going on with the comon/cotop tool for this node?
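
For anyone who wants to record these numbers over time, here is a minimal Python sketch that pulls the three load averages out of the summary line printed by "top". The function name parse_load_average is hypothetical, and the sketch assumes the header format shown above (a period as decimal separator, and possibly wrapped across two lines as in this message).

```python
import re
from typing import Tuple


def parse_load_average(top_header: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from top's first line."""
    # Collapse whitespace so a header wrapped by the mail client still matches.
    flat = " ".join(top_header.split())
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", flat
    )
    if match is None:
        raise ValueError("no load average found in header")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


# The header exactly as it appears in this message, including the line wrap.
header = """top - 14:42:07 up 97 days, 21:36, 0 users, load average: 526.76,
504.26, 504.05"""

one_min, five_min, fifteen_min = parse_load_average(header)
print(one_min, five_min, fifteen_min)
```

Run periodically on the node (for example from cron), something like this would give a simple history of the load, which comon is not providing for this node at the moment.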
I am looking forward to hearing from you...
Lívia
> The information about all the other slices running on the node can be collected from comon (http://comon.cs.princeton.edu/). You can get a "top" for all the slices on a node by pointing a browser at http://planetlab.host.name:3120/cotop (for instance, http://planetlab-1.cs.princeton.edu:3120/cotop for the host planetlab-1.cs.princeton.edu).
>
> There are some interfaces on the comon page to select nodes based on this data. Some people parse the cotop data to analyze the application load on nodes (I'm doing research similar to yours).
>
> -- robert adams
>
> -----Original Message-----
> From: users-bounces at planet-lab.org [mailto:users-bounces at planet-lab.org] On Behalf Of Livia Maria Rodrigues Sampaio
> Sent: Wednesday, August 16, 2006 7:17 AM
> To: users
> Subject: [Planetlab-users] concerning the behavior of planetlab2.cs.virginia.edu in the last two weeks
>
> Hello all,
>
> I have been running some experiments with a distributed application (which I implemented myself) on PlanetLab with 5 nodes, including planetlab2.cs.virginia.edu. In the last two weeks the performance of my application degraded. Since that performance depends on the workload of the node planetlab2.cs.virginia.edu, I suppose this node slowed down for some reason. I would like to have more information about this workload variability because it is very important to my performance studies. In particular, if you have also been using this node in your experiments, you can contribute to my studies.
>
> The objective of my experiments is to analyze the performance of a distributed application, using the execution time as the performance metric. In one of the experiments I used messages of 32KB. The execution time of the application was around 1 second. However, from July 29th to August 13th, the execution times increased to 180 seconds. Presently, August 16th, the execution times are again around 1 second. The variability in the performance of the application is associated with the workload of the node planetlab2.cs.virginia.edu. It seems that the node was "fast" and became "slow" for some period. This is not surprising in PlanetLab, as pointed out in the paper "Fixing the embarrassing slowness of OpenDHT on Planetlab". However, I would like to have more information about the slowness of this specific node.
>
> In order to shed some light on this topic I started investigating two possibilities: variability in the network load (to/from the node) or in the processing load (on the node). In the first case, I collected some traceroute data in the periods when the node was "fast" and "slow", trying to find some communication bottleneck; for this, I configured traceroute to send messages of 1KB. The traceroute data described the transmission of a message from each of the 5 nodes considered in my experiments to the same destination. I didn't notice anything strange in the logs. Consequently, I suppose the variability in the performance of my application was due to intensive use of the resources (e.g. CPU, memory) of planetlab2.cs.virginia.edu by a number of applications running on this node, during the observed period, concurrently with my own application. In particular, memory-bound applications can cause more swapping out of the files being used. Moreover, applications with high priority can use more CPU cycles than the others.
>
> Another possibility is intensive use of bandwidth leading to fair sharing of the available bandwidth; in this case, each slice on the same node cannot use the bandwidth limit specified for it. I suppose this becomes more critical as the number of applications running in different slices on the same node increases.
>
> I would appreciate it a lot if you could help me on this topic in the following way:
>
> 1. Has anybody run applications on planetlab2.cs.virginia.edu during the period from July 29th to August 13th? Is your application memory bound or does it require high bandwidth?
> 2. Is it possible to get information about the CPU usage or memory consumption of a PlanetLab node during some period in the past, in order to identify the behavior of the applications competing with my own for the resources of the node planetlab2.cs.virginia.edu?
>
> I am looking forward to hearing from you...
> Lívia
>
> _______________________________________________
> Users mailing list: Users at lists.planet-lab.org
> https://lists.planet-lab.org/mailman/listinfo/users
>
>