[Planetlab-devel] node manager
Thierry Parmentelat
thierry.parmentelat at sophia.inria.fr
Tue Dec 11 09:50:47 EST 2007
I'm moving the thread that started on the cvs ml over here.
The issues I have with nm are as follows - from this list you'll see
that i could use some help :-)
- when I started digging yesterday, the external symptom was that slices
did not get created; at least i could not enter the node through ssh
- there were a lot of issues related to conf_files, so upon faiyaz's
suggestion, and in an attempt to focus on slice creation, I have
commented out all modules but 'sm' in nm.py
- I had made the indentation change because I was seeing this kind of
message, which suggested there was something wrong with calling
set_ipaddresses_config. I haven't committed that back yet, but here is
what I am now getting with the original version
Tue Dec 11 14:41:53 2007: operation on ts_slicetest1 failed.
Traceback (most recent call last):
File "/usr/share/NodeManager/accounts.py", line 168, in _run
cmd[0](*cmd[1:])
File "/usr/share/NodeManager/accounts.py", line 129, in _ensure_created
if not isinstance(self._acct, next_class): self._acct = next_class(rec)
File "/usr/share/NodeManager/sliver_vs.py", line 65, in __init__
self.configure(rec)
File "/usr/share/NodeManager/sliver_vs.py", line 83, in configure
self.set_resources()
File "/usr/share/NodeManager/sliver_vs.py", line 175, in set_resources
self.set_ipaddresses_config(self.rspec['ip_addresses'])
File "/usr/lib/python2.5/site-packages/vserver.py", line 230, in set_ipaddresses_config
self.set_ipaddresses(addresses)
File "/usr/lib/python2.5/site-packages/vserver.py", line 219, in set_ipaddresses
vserverimpl.netremove(self.ctx, "all")
OSError: [Errno -22] Unknown error 4294967274
and I have no clue what this -22 error actually means
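One guess, assuming vserverimpl hands back the raw negated kernel return
code: errno 22 is EINVAL, and 4294967274 is just -22 read as an unsigned
32-bit integer (2^32 - 22). A small sketch to decode such values; the
decode helper is made up for illustration and is not part of vserver.py:

```python
import errno
import os

def decode(code):
    """Return (errno name, message) for a possibly negated or wrapped errno."""
    # Undo unsigned 32-bit wraparound, e.g. 4294967274 -> -22.
    if code > 0x7FFFFFFF:
        code -= 1 << 32
    # Kernel-style return values are negated errno numbers.
    code = abs(code)
    return errno.errorcode.get(code, "unknown"), os.strerror(code)

print(decode(-22))
print(decode(4294967274))
```

This only names the error; it does not say which argument of the
netremove call is being rejected.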
- as I wrote in a commit log this morning, generally speaking I think
sliver_vs.set_resources should be much more conservative and protect all
calls to the underlying Vserver object in try/except clauses. In the
example above, an error deep down propagates all the way up to
accounts._run, which breaks the logic: the thread queues don't get polled
and all is broken.
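A minimal sketch of what I mean by protecting the calls. This is not the
actual sliver_vs code: FakeVserver, other_setting, the guarded helper and
the logger name are all hypothetical, and only set_ipaddresses_config is
taken from the traceback above.

```python
import logging

log = logging.getLogger("nm-sketch")  # hypothetical logger name

def guarded(label, func, *args):
    """Run func(*args); log any exception and report success as a bool."""
    try:
        func(*args)
        return True
    except Exception:
        # Keep the full traceback in the log instead of letting it escape.
        log.exception("%s failed", label)
        return False

class FakeVserver:
    """Stand-in for the real Vserver object, for illustration only."""
    def __init__(self):
        self.applied = []

    def set_ipaddresses_config(self, addresses):
        # Mimic the failure from the traceback above.
        raise OSError(-22, "Unknown error 4294967274")

    def other_setting(self, value):
        # Hypothetical second step that succeeds.
        self.applied.append(value)

def set_resources(vs, rspec):
    """Apply each setting independently so one failure does not abort the rest."""
    return {
        "ip_addresses": guarded("set_ipaddresses_config",
                                vs.set_ipaddresses_config,
                                rspec["ip_addresses"]),
        "other": guarded("other_setting", vs.other_setting, rspec["other"]),
    }

status = set_resources(FakeVserver(), {"ip_addresses": "10.0.0.1", "other": 42})
print(status)
```

The trade-off is that swallowed exceptions are only visible in the log, so
this is only useful if the log ends up somewhere one can actually find.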
- generally speaking, it's really hard to understand where error
messages actually end up
I'm also trying to improve this on the fly, but that's really not
perfect yet
the error messages directly printed from vserver.py somehow get lost in
daemon mode.
for instance, I need to run without the -d option to see the following
kind of errors, which are not too useful either
Unexpected error with getrlimit for context 501
Unexpected error with getrlimit for context 500
Unexpected error with setrlimit for running context 500
Unexpected error with getrlimit for context 500
Unexpected error with setrlimit for running context 500
...
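One possible way to keep such messages in daemon mode, sketched under the
assumption that they are written through Python's sys.stdout/sys.stderr
(output written directly to the file descriptors from C code would not be
caught this way). StreamToLogger and the logger name are hypothetical, not
something nm provides today:

```python
import logging
import sys

class StreamToLogger:
    """File-like object that forwards complete lines to a logger."""
    def __init__(self, logger, level=logging.ERROR):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text):
        # Buffer partial writes and emit one log record per full line.
        self._buffer += text
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if line:
                self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit whatever is left, even without a trailing newline.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""

def redirect_output(logger):
    """Route Python-level prints to the logger; call once when daemonizing."""
    sys.stdout = StreamToLogger(logger, logging.INFO)
    sys.stderr = StreamToLogger(logger, logging.ERROR)
```

With something like redirect_output(logging.getLogger("nm")) called before
going into the background, the getrlimit/setrlimit messages would land in
whatever file the logger is configured to write to.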
- and on the same track, some other errors turn out to get logged ... in
the slice's /var/log/boot.log
took me a while to figure out
[in slice] # cat /var/log/boot.log
Tue Dec 11 14:21:16 2007: starting the virtual server ts_slicetest1
Traceback (most recent call last):
File "/usr/lib/python2.5/site-packages/vserver.py", line 447, in start
File "/usr/lib/python2.5/site-packages/vserver.py", line 396, in __prep
TypeError: argument 1 must be string, not file
- I have messed around a lot and no longer know very well where I stand
but this morning I was in a situation where authorized_keys was actually
getting created with some correct value, but that was not visible from
the slice; that is, /home/<slice>/.ssh/authorized_keys was OK but after
I entered the slice I could not see any .ssh - but maybe here I'm doing
something wrong. What's supposed to be the magic that allows the slice
to see a file in /home again ?
-- Thierry