[Planetlab-devel] hardware / software compatibility with e1000 AMT
Stephen Soltesz
soltesz at CS.Princeton.EDU
Mon Oct 1 12:16:34 EDT 2007
Last update.
I have gotten the card to work by adding an extra init script to the boot
sequence that runs:
modprobe -r e1000
modprobe e1000
sleep 2
before the real 'network' init script is run. This 'catches' the first EEPROM
Checksum error, and allows the card to be ready when the real network script
configures the network.
From a power-off state, this sequence succeeds. Without the sleep, it does not
work; presumably the card needs the delay to re-initialize itself before the
'network' script tries to initialize it.
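For reference, a minimal sketch of what such an init script could look like. The
variable names (MODULE, DELAY, MODPROBE), the function name, and the choice to
ignore modprobe failures are my own assumptions for illustration, not the exact
script in use; only the three commands themselves come from the sequence above.

```shell
#!/bin/sh
# Hypothetical init script, meant to be ordered before the 'network' script.
# MODULE, DELAY and MODPROBE are illustrative names (assumptions).

MODULE="${MODULE:-e1000}"        # driver to reload
DELAY="${DELAY:-2}"              # seconds to let the card re-initialize
MODPROBE="${MODPROBE:-modprobe}" # overridable so the sequence can be dry-run

reload_nic() {
    # Unload the driver; ignore failure in case it is not loaded (assumption).
    "$MODPROBE" -r "$MODULE" 2>/dev/null || true
    # Load it again; this attempt is the one that may hit the EEPROM
    # checksum error, so a failure here must not abort the boot (assumption).
    "$MODPROBE" "$MODULE" || true
    # Without this delay the later 'network' script did not find the card.
    sleep "$DELAY"
}

case "${1:-}" in
    start) reload_nic ;;
esac
```

Running it with 'start' performs the unload, reload, and wait. Overriding
MODPROBE with a harmless command prints the sequence instead of touching the
driver.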
Clearly, this needs a better solution for deployment. The problem seems to be
known in the LKML community, but it has evidently been an issue since 2.6.17
and is still reported in some cases for 2.6.22 kernels, so I'm not sure it is
going to be fixed any time soon.
Alternatively, there may be a module hack that we could add to CVS that would
fix the load problem. But I'll leave that to the experts.
Stephen.
Stephen Soltesz wrote:
> Loading the module twice successively allows the module to be loaded,
> but the card is still not available to the kernel, for some reason. i.e.
>
> # ifconfig eth0
> eth0: error fetching interface information: Device not found
>
> I got this hint from:
>
> http://www.thinkwiki.org/wiki/Problem_with_e1000:_EEPROM_Checksum_Is_Not_Valid
>
>
> I'm investigating a few other possibilities.
>
> Stephen.
>
> Stephen Soltesz wrote:
>> At Marc's request I began testing an HP machine with support for
>> Intel's 'Active Management Technology' (AMT). AMT promises
>> transparent operation on an existing NIC for remote administration,
>> such as power and reboot.
>>
>> I've configured AMT on this machine, and I can get the machine to reboot.
>>
>> Unfortunately, when the node is configured for 'boot' mode, the node
>> loses network connectivity somewhere along the line during the boot up
>> sequence. Whichever component first configures the host network
>> succeeds and is able to contact MyPLC in order to get the network
>> config, boot state, etc.
>>
>> But, after a kexec to the next kernel, the init scripts fail to
>> initialize the network card again, reporting:
>>
>> 'e1000 device eth0 does not seem to be present, delaying
>> initialization.'
>>
>> Then I have neither console nor ssh access to investigate further.
>> The bootcd password only appears to work at the console in debug
>> mode. This may be by design.
>>
>> When the node is configured for 'debug' mode, the node does not lose
>> network connectivity. Instead, I can log in via ssh, as well as
>> operate the AMT interface to reboot the machine.
>>
>> My best guess is that there is either some interaction between the new
>> kernel and the card, or the action of first configuring the card and
>> then reconfiguring it. The e1000 driver appears to have knowledge of
>> AMT devices, in general.
>>
>> When in debug mode, I can log in at the console, and bring the
>> interface down.
>>
>> # ifconfig eth0 down
>> e1000: eth0: e1000_reset: Hardware Error
>>
>> When attempting to bring the interface up again it works (I can log
>> in) and I see:
>>
>> # ifconfig eth0 up
>> e1000: eth0: e1000_request_irq: Unable to allocate MSI interrupt
>> Error: -22
>> ADDRCONF(NETDEV_UP): eth0: link is not ready
>> e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
>> e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
>> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> eth0: no IPv6 routers present
>>
>> If I bring the interface down, and unload the e1000 module, I see:
>>
>> # ifconfig eth0 down
>> # rmmod e1000
>> ACPI: PCI interrupt for device 000:00:19.0 disabled
>>
>> Upon trying to reload the module, I see:
>>
>> # modprobe e1000
>> ACPI: PCI Interrupt 000:00:19.0[B] -> GSI 19 (level, low) -> IRQ 17
>> e1000: 0000:00:19.0 e1000_probe: The EEPROM Checksum Is Not Valid
>> ACPI: PCI interrupt for device 0000:00:19.0 disabled
>> e1000: probe of 0000:00:19.0 failed with error -5
>>
>> What would cause an EEPROM Checksum error?
>>
>> Stephen.
>>
>> _______________________________________________
>> Devel mailing list
>> Devel at lists.planet-lab.org
>> https://lists.planet-lab.org/mailman/listinfo/devel