[Planetlab-devel] diagnosing a sneak bug in 5.0 & impact on 4.2
Faiyaz Ahmed
faiyaza at CS.Princeton.EDU
Thu May 29 13:05:15 EDT 2008
Hi Thierry,
This sounds fishy.
> the thing is, this fails, so the actual boot-mode ssh keys don't get
> pushed onto the hard drive
> however the node proceeds and does the kexec, resulting in an
> unreachable node.
>
> ======
> . when running conf_files, the command performs an XML-RPC call to the
> API to invoke GetSlivers, and does so with a session auth method:
> the capture that I made shows that the POST data that gets into the HTTP
> session looks like this
>
> "<?xml version='1.0'?>
> <methodCall>
> <methodName>GetSlivers</methodName>
> <params><param>\n<value><struct>\n<member>\n<name>session</name>
> <value><string>eTJPKAIex5XysDpHBcZgXTVZ25OlxLH/serial8250: too much work
> for irq4^M
> 7aROqL73mKM=</string>
> </value>\n</member>\n<member>\n<name>AuthMethod</name>\n<value><string>session</string></value>\n</member>\n
>
> </struct></value>\n</param>\n</params>\n</methodCall>\n"
>
> where the actual session string has this suspicious-looking part about
> 'serial8250: too much work for irq4'
I've seen this message before, but it's usually the kernel of the machine
I'm SSH'ing from, or of the node I'm SSH'ing to, complaining about bad IRQ
settings. Syslog is likely set to echo emerg-level messages to all
terminals. Do you think this is the case?
Can you call this method from a different machine, with the same
credentials used by the node, and check the POST/GET results?
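A minimal sketch of that check in Python, for reference. The endpoint URL and all function names here are hypothetical, and the base64 test assumes a session key is a single unbroken base64 token, which the capture suggests but does not prove. The auth struct and the single-argument GetSlivers call mirror the captured request.

```python
import re
import xmlrpc.client

# Hypothetical endpoint; substitute the PLC API URL the node actually uses.
API_URL = "https://plc.example.org/PLCAPI/"

# Assumption: a session key is one base64 token with no whitespace in it.
_BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def is_plausible_session(session):
    """Return True if the string looks like one unbroken base64 token."""
    return _BASE64_TOKEN.fullmatch(session) is not None


def session_auth(session):
    """Build the auth struct seen in the capture."""
    return {"AuthMethod": "session", "session": session}


def build_request_body(session):
    """Serialize the GetSlivers call without sending it, for comparison."""
    return xmlrpc.client.dumps((session_auth(session),), methodname="GetSlivers")


def call_getslivers(session, url=API_URL):
    """Send the call; network and XML-RPC errors propagate to the caller."""
    if not is_plausible_session(session):
        raise ValueError("session string contains non-base64 characters")
    server = xmlrpc.client.ServerProxy(url, allow_none=True)
    return server.GetSlivers(session_auth(session))
```

Comparing the output of build_request_body() against the capture should show whether the extra text is already in the session string the node reads, or only gets mixed in later. A string containing the 'serial8250' message would be rejected by is_plausible_session() before any request is sent.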
> PS.
> one last thing; in this case the BootManager should basically refrain
> from reaching the kexec, as this means losing the node
> on another, totally unrelated, problem, we have a node failing to
> perform 'chkconfig ntpd on' in the chroot, and in this case rather than
> trying to proceed anyway, BM gives up and the node remains in 'dbg' mode.
> I'd rather the opposite; ntp failing does not seem like such a big deal,
> while in the conf_files case the node becomes unreachable to us.
> Did anyone have a plan to review this BM logic?
I agree. It seems counterintuitive to fail on a non-issue and continue
on a real one. I can work out the exit codes on conf_files so that BM
will put the node back into debug if it can't contact PLC.
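Roughly, the logic could look like the sketch below. This is not the existing BootManager code: the Step type, the next_boot_state() function, and the 'boot'/'dbg' return values are illustrative names, and it assumes each step is an external command that exits non-zero on failure.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    argv: tuple      # command and arguments to run
    critical: bool   # True if a failure must keep the node in debug


def next_boot_state(steps):
    """Run each step in order; return 'dbg' if a critical step fails, else 'boot'."""
    for step in steps:
        rc = subprocess.call(list(step.argv))
        if rc == 0:
            continue
        if step.critical:
            # e.g. conf_files could not reach PLC: do not kexec.
            print(f"{step.name} failed (exit {rc}); staying in debug", file=sys.stderr)
            return "dbg"
        # e.g. 'chkconfig ntpd on' failed: log it and carry on.
        print(f"{step.name} failed (exit {rc}); continuing", file=sys.stderr)
    return "boot"
```

With this split, marking conf_files as critical and the ntpd step as non-critical would give the behaviour Thierry asks for: an ntpd failure is logged and skipped, while a conf_files failure stops the node before kexec.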
Faiyaz