[Planetlab-users] cpu freezes question
Robert P Ricci
ricci at cs.utah.edu
Sat Aug 25 12:04:20 EDT 2007
Thus spake Roger Pack on Thu, Aug 23, 2007 at 08:49:32PM -0600:
> I have encountered a few cpu freezes on processes, and, wondering if it was
> just me, found this online
> ...
> On PlanetLab, however, high CPU load occasionally causes processes to freeze
> for several seconds, long enough for several successive pings to time out.
> Chord then incorrectly declared peers offline and potentially misrouted
> messages....
>
> My question is how long do these normally last? I've had some in the
> 10-20 second range--is that normal? Thanks!
I've definitely experienced behavior like this, though not as severe as
you seem to have. We did some experiments to measure CPU starvation on
PlanetLab, and the longest period we saw was around half a second. I
could make this code available to you if you'd like to do your own
study. (Some results from our study are in the third paper referenced
below.)
I've also experienced freezes that seemed to result from extremely long
disk I/O latency. Offhand, I would say that those freezes tend to be
longer than the periods of CPU starvation, but I haven't directly
studied them.
I know of three published papers that quantify some of the scheduling
problems and slowdowns you'll see on PlanetLab. In all three, though,
this is not the main focus of the paper; it's used as motivation for
the work:

Fixing the Embarrassing Slowness of OpenDHT on PlanetLab
    by Sean Rhea, Byung-Gon Chun, John Kubiatowicz, and Scott Shenker
    In WORLDS '05
    Has some good measurements of the amount of time taken for
    CPU-intensive tasks, plus disk read latency

Supercharging PlanetLab - a High Performance, Multi-Application, Overlay
Network Platform
    by Jon Turner et al.
    To appear in SIGCOMM '07
    In Section 5, discusses the scheduler used by PlanetLab, and explores
    scheduling behavior with the default and alternate parameters

The Flexlab Approach to Realistic Evaluation of Networked Systems
    by Robert Ricci, Jonathon Duerig, Pramod Sanaga, et al.
    In NSDI '07
    In Appendix A, uses carefully timed nanosleep()s to measure periods of
    CPU starvation

There are other papers that discuss scheduling issues (such as the VINI
paper by Bavier et al. in SIGCOMM '06), and high-level CPU availability
(such as the "Experiences Building PlanetLab" paper by Peterson et al.
in OSDI '06), but they don't directly quantify the behavior you're
seeing.
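
For anyone who wants a rough feel for the timed-sleep idea mentioned for
the Flexlab paper, here is a minimal sketch. It is not the code from
that paper. It assumes that a process which asks to sleep for a short,
fixed interval and wakes up much later than requested was probably kept
off the CPU for roughly the extra time. The names measure_starvation
and summarize, and all the default values, are made up for
illustration, and it uses Python's time.sleep() rather than calling
nanosleep() directly.

```python
import time


def measure_starvation(interval_s=0.01, duration_s=1.0, threshold_s=0.005):
    """Sleep repeatedly and record wake-ups that overshoot the request.

    Returns a list of (start_time, overshoot_seconds) tuples, one for
    each sleep that ran at least threshold_s longer than interval_s.
    All parameter names and defaults are illustrative assumptions.
    """
    gaps = []
    deadline = time.monotonic() + duration_s
    while True:
        start = time.monotonic()
        if start >= deadline:
            break
        time.sleep(interval_s)
        # Extra time beyond the requested sleep; a large value suggests
        # the process was not scheduled promptly after its timer fired.
        overshoot = (time.monotonic() - start) - interval_s
        if overshoot >= threshold_s:
            gaps.append((start, overshoot))
    return gaps


def summarize(gaps):
    """Return (number of recorded gaps, longest overshoot in seconds)."""
    if not gaps:
        return (0, 0.0)
    return (len(gaps), max(overshoot for _, overshoot in gaps))


if __name__ == "__main__":
    count, longest = summarize(measure_starvation(duration_s=5.0))
    print(f"{count} late wake-ups, longest overshoot {longest:.3f} s")
```

A late wake-up measured this way cannot by itself tell CPU starvation
apart from other causes of delay (for example, the process being stuck
behind slow disk I/O while paging), so treat the numbers as an upper
bound on scheduling delay rather than a precise measurement.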
--
/-----------------------------------------------------------
| Robert P Ricci <ricci at cs.utah.edu> | <ricci at flux.utah.edu>
| Research Associate, University of Utah Flux Group
| www.flux.utah.edu | www.emulab.net
\-----------------------------------------------------------