[Planetlab-users] node availability benchmarks

Larry Peterson llp at CS.Princeton.EDU
Fri Jun 18 14:13:23 EDT 2004


On Jun 18, 2004, at 1:36 PM, Neil Spring wrote:

> I'd very much like to see a number that represents "fully operational" 
> -- the number of machines that have zero problems.  In particular, 
> that DNS and ping work.  I would like sites to be encouraged not to 
> put planetlab boxes behind firewalls.

I completely agree with the goal, and that we should provide incentives to
not put PL boxes behind firewalls, but there are many usable nodes (to some
applications) that aren't pingable.

>
> I'd change:
>
> "visible": I don't understand the value in reporting a metric based on 
> being able to ssh into a non-vserver slice.  who has one of these but 
> you?

Visible is not intended to equal usable. It just says we've eliminated one
source of problems: those introduced by bad network configurations.

I recognize that the lower bound is most helpful to users, but (1) one user's
lower bound may be different from another's (hence there might be value in
defining multiple independent tests), and (2) benchmarks are useful to both
users (trying to select good nodes) and operations people (trying to isolate
and quantify the different classes of problems commonly encountered).
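
To make point (1) concrete, here is a minimal sketch of what running
multiple independent tests and reporting each count separately could look
like. Everything in it is an assumption for illustration: the function names
(tcp_port_open, dns_resolves, run_benchmarks, summarize) are hypothetical, a
TCP connect to port 22 only approximates "can ssh in", and none of this is
the tooling actually used on PlanetLab.

```python
import socket
from typing import Callable, Dict, Iterable, Set

# A check takes a hostname and returns True if the node passes.
Check = Callable[[str], bool]


def tcp_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Approximate 'reachable via ssh': can we open a TCP connection to the port?

    This does not log in, so it is weaker than the ssh-based test in the thread.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dns_resolves(host: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except OSError:
        return False


def run_benchmarks(nodes: Iterable[str],
                   checks: Dict[str, Check]) -> Dict[str, Set[str]]:
    """Run every check independently; map each check name to the nodes that pass."""
    node_list = list(nodes)
    return {name: {n for n in node_list if check(n)}
            for name, check in checks.items()}


def summarize(nodes: Iterable[str],
              results: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count passing nodes per check, plus the nodes that pass every check."""
    passing_all = set(nodes)
    for passed in results.values():
        passing_all &= passed
    counts = {name: len(passed) for name, passed in results.items()}
    counts["fully operational"] = len(passing_all)
    return counts
```

With checks such as {"visible": tcp_port_open, "dns": dns_resolves}, each
user could read off the counts that matter to their application, while the
"fully operational" entry (nodes passing every check) is the lower bound.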

Larry

>
> "usable": to me should be a number of machines I should have no 
> problems with.  If you keep a list of "usable" machines and I find one 
> to be unusable, I should be able to file a trouble ticket with high 
> confidence that it's a new issue to you.  That is, usable nodes should 
> pass *all* of Mic's tests.  Ganglia running is not evidence of a node 
> being usable -- ganglia is lightweight and designed to run on pretty 
> messed up machines.   This is where ping belongs.
>
> think lower bound.
>
> thanks,
> -neil
>
> On Jun 18, 2004, at 6:43 AM, Larry Peterson wrote:
>
>> As has recently been discussed on this list, there are a lot of different
>> numbers being used to describe how many nodes are on PlanetLab. I thought
>> it might be helpful to provide a rough breakdown. I'd also be interested
>> in any thoughts people have on my definitions.
>>
>> Total: This is the number that appears on the home page (currently 395) and
>>        corresponds to the number of machines that have been registered with
>>        the database. Some of these machines have never actually booted. We
>>        all understand that this number has more PR value than anything else.
>>
>> Visible: This corresponds to the number of nodes we can "reach" from Princeton.
>>        (Others run similar experiments from other sites.) Our working definition
>>        for a "visible" node is that we can ssh into a non-vserver account on it.
>>        Recently, we have been seeing ~325 visible nodes.
>>
>>        As an aside, an alternative definition of visible is that the machine is
>>        pingable. This set is typically smaller because many sites filter pings, so
>>        reachable via ssh seems to be a better definition of visible.
>>
>>        Of the 70-odd nodes that have been registered but are not currently visible,
>>        roughly half are in "debug" mode; i.e., have known, long-term problems.
>>        It's sometimes a hardware failure, sometimes a site that's gun-shy about a
>>        recent incident report, sometimes a change in the local net configuration,
>>        and sometimes a non-responsive contact that hasn't rebooted a hung machine.
>>        Martin monitors node availability on a daily basis, and has started to post
>>        his results at
>>        https://www.planet-lab.org/Wiki/bin/view/Planetlab/NodeInfo
>>
>>        The other half are what I would characterize as having transient problems:
>>        the machine crashes or hangs, we remotely reboot it or send email to the
>>        site's technical contact, and within a day or two, all is well. We've been
>>        seeing ~10 nodes come up and another ~10 go down on any given day.
>>
>> Usable: This corresponds to the number of nodes that an application can
>>        productively use. By its very nature, whether a node qualifies as usable
>>        is application-specific. However, it seems useful to define one or more
>>        generic benchmarks that we could run periodically.
>>
>>        The lowest threshold seems to be that a service running on the node is able
>>        to phone home. Ganglia and the NOC sensor do this, and report availability
>>        numbers in the upper 200s (~290).
>>
>>        A higher threshold would be that one could ssh into the node, and
>>        "consume" some number of resources, something representative of a minimal
>>        service.
>>
>>        Another possibility would be to create a new slice and see how many nodes it
>>        comes up on (in what timeframe). The current model/architecture does not
>>        necessarily support rapid slice creation/termination since slices are intended
>>        to be fairly long-lived, but this would be a useful metric nonetheless.
>>
>> I would be interested in hearing other suggestions for benchmarks we should run to
>> evaluate node availability. In this context, it's worth noting that we've recently
>> seen numbers posted to this mailing list ranging from around 200 to nearly 300. Why
>> the wide range? One thing that's clear from Mic's data is that slices aren't reliably
>> being created on all nodes the user selects. We expect upgrades to the node manager
>> being rolled out in coming weeks will improve this situation, but we clearly need to
>> measure our progress on this front. There are also nodes with insufficient resources;
>> again, the allocation/scheduling upgrades in the works should help.
>>
>> What I don't have a good handle on is the extent to which users are able to
>> successfully work around the churn that happens on PlanetLab. Insights on this
>> point would be helpful.
>>
>> Larry
>>
>>
>> _______________________________________________
>> Users mailing list: Users at lists.planet-lab.org
>> http://lists.planet-lab.org/mailman/listinfo/users
>>
>


