[Planetlab-devel] node manager

Thierry Parmentelat thierry.parmentelat at sophia.inria.fr
Tue Dec 11 09:50:47 EST 2007


I'm moving the thread that started on the cvs mailing list over here.

The issues I have with nm are as follows - from this list you'll see 
that I could use some help :-)

- when I started digging yesterday, the external symptom was that slices 
did not get created; at least I could not enter the node through ssh

- there were a lot of issues related to conf_files, so upon faiyaz's 
suggestion, and in an attempt to focus on slice creation, I have 
commented out all modules but 'sm' in nm.py

- I had made the indentation change because I was seeing this kind of 
message, which suggested there was something wrong with calling 
set_ipaddresses_config. I haven't committed that back yet, but here is 
what I am now getting with the original version:

Tue Dec 11 14:41:53 2007: operation on ts_slicetest1 failed. 
 Traceback (most recent call last):
  File "/usr/share/NodeManager/accounts.py", line 168, in _run
    cmd[0](*cmd[1:])
  File "/usr/share/NodeManager/accounts.py", line 129, in _ensure_created
    if not isinstance(self._acct, next_class): self._acct = next_class(rec)
  File "/usr/share/NodeManager/sliver_vs.py", line 65, in __init__
    self.configure(rec)
  File "/usr/share/NodeManager/sliver_vs.py", line 83, in configure
    self.set_resources()
  File "/usr/share/NodeManager/sliver_vs.py", line 175, in set_resources
    self.set_ipaddresses_config(self.rspec['ip_addresses'])
  File "/usr/lib/python2.5/site-packages/vserver.py", line 230, in set_ipaddresses_config
    self.set_ipaddresses(addresses)
  File "/usr/lib/python2.5/site-packages/vserver.py", line 219, in set_ipaddresses
    vserverimpl.netremove(self.ctx, "all")
OSError: [Errno -22] Unknown error 4294967274

and I have no clue what this -22 error actually means
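
One way to decode a value like this: 4294967274 is what -22 looks like when read as an unsigned 32-bit integer, and errno 22 is EINVAL. The sketch below assumes that the C binding is handing back a negated kernel errno, which is a guess and not something the traceback confirms; the helper name decode_errno is made up for illustration.

```python
import errno
import os


def decode_errno(value):
    """Return (signed value, symbolic name, message) for a raw errno-like value."""
    # Values of 2**31 and above are reinterpreted as negative 32-bit integers.
    if value >= 2 ** 31:
        value -= 2 ** 32
    # Assumption: a negative value is a negated errno, so look up its magnitude.
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return value, name, os.strerror(code)


signed, name, message = decode_errno(4294967274)
print("%d %s %s" % (signed, name, message))
```

If that reading is right, the kernel is rejecting one of the arguments passed down by netremove, but this only narrows the search and does not say which argument.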

- as I wrote in a commit log this morning, generally speaking I think 
sliver_vs.set_resources should be much more conservative and protect all 
calls to the underlying Vserver object with try/except clauses. In the 
example above, an error raised lower down propagates up to 
accounts._run, which breaks the logic: the thread queues don't get polled 
and everything stops working.
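
A minimal sketch of what protecting those calls could look like. This is not the actual sliver_vs code: the guarded helper, the FakeVserver stand-in and its set_other method are hypothetical, and only set_ipaddresses_config is a name taken from the traceback above.

```python
import logging
import traceback

logger = logging.getLogger("nm")


def guarded(label, func, *args, **kwargs):
    """Call func, logging any exception instead of letting it propagate.

    Returns True on success and False if the call raised.
    """
    try:
        func(*args, **kwargs)
        return True
    except Exception:
        logger.error("%s failed:\n%s", label, traceback.format_exc())
        return False


class FakeVserver(object):
    """Stand-in for the real Vserver object, for illustration only."""

    def __init__(self):
        self.calls = []

    def set_ipaddresses_config(self, addresses):
        # Simulate the failure seen in the traceback.
        raise OSError(-22, "Unknown error 4294967274")

    def set_other(self, value):
        # Hypothetical second setting that succeeds.
        self.calls.append(("other", value))


def set_resources(vs, rspec):
    """Apply each setting independently so one failure does not stop the rest."""
    results = {}
    results["ip_addresses"] = guarded(
        "set_ipaddresses_config", vs.set_ipaddresses_config, rspec["ip_addresses"])
    results["other"] = guarded("set_other", vs.set_other, rspec["other"])
    return results
```

With this shape, a failing call is logged with its full traceback and the remaining settings are still applied, so nothing propagates up to the caller's loop.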

- generally speaking, it's really hard to understand where error 
messages actually end up. I'm also trying to improve this on the fly, 
but it's far from perfect yet. The error messages printed directly from 
vserver.py somehow get lost in daemon mode. For instance, I need to run 
without the -d option to see the following kind of errors, which are not 
too useful either:
Unexpected error with getrlimit for context 501
Unexpected error with getrlimit for context 500
Unexpected error with setrlimit for running context 500
Unexpected error with getrlimit for context 500
Unexpected error with setrlimit for running context 500
...
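
One possible way to keep such printed messages in daemon mode is to replace sys.stderr (and sys.stdout) with a file-like object that forwards each line to the logging module. This is only a sketch: the LogWriter class is hypothetical, and it would catch Python-level prints only, not anything a C extension writes straight to file descriptor 2.

```python
import logging


class LogWriter(object):
    """File-like object that forwards complete lines to a logger."""

    def __init__(self, logger, level=logging.ERROR):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text):
        # Accumulate text and emit one log record per complete line.
        self._buffer += text
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if line:
                self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit whatever is left, even without a trailing newline.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""


# Possible use when daemonizing (assumes logging is already configured):
#     import sys
#     sys.stderr = LogWriter(logging.getLogger("vserver"))
```

The messages would then land wherever the daemon's log handlers point, with timestamps, instead of vanishing with the closed terminal.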

- and along the same lines, some other errors turn out to get logged ... in 
the slice's /var/log/boot.log, which took me a while to figure out:
[in slice] # cat /var/log/boot.log
Tue Dec 11 14:21:16 2007: starting the virtual server ts_slicetest1
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/vserver.py", line 447, in start
  File "/usr/lib/python2.5/site-packages/vserver.py", line 396, in __prep
TypeError: argument 1 must be string, not file


- I have messed around a lot and no longer know exactly where I stand, 
but this morning I was in a situation where authorized_keys was actually 
getting created with a correct value, but that was not visible from 
the slice; that is, /home/<slice>/.ssh/authorized_keys was OK but after 
I entered the slice I could not see any .ssh - but maybe I'm doing 
something wrong here. What's the magic, again, that is supposed to let 
the slice see a file in /home?

-- Thierry


