[Users] The SPM host node is in unresponsive mode
shuming at linux.vnet.ibm.com
Tue May 15 06:14:40 UTC 2012
There are some errors in the service status output. Is engine-notifierd critical to VDSM? Why
does it say "pgrep: invalid user name: engine"?
[root@ovirt-node1 ~]# service --status-all
/etc/init.d/ceph: ceph conf /etc/ceph/ceph.conf not found; system is not
# Generated by ebtables-save v1.0 on Tue May 15 14:08:06 CST 2012
pgrep: invalid user name: engine
/etc/init.d/engine-notifierd is stopped
JAVA_EXECUTABLE or HSQLDB_JAR_PATH in '/etc/sysconfig/hsqldb' is set to
No active sessions
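One possible reading of the pgrep message, which is an assumption and not something the output above confirms, is that the engine-notifierd status script looks for processes owned by a local account named "engine", and no such account exists on this node. pgrep prints "invalid user name" when the name given to its -u option cannot be resolved. A minimal sketch to check whether the account exists (the helper name local_user_exists is hypothetical):

```python
import pwd


def local_user_exists(name):
    """Return True if the system account database has an entry for `name`."""
    try:
        pwd.getpwnam(name)  # raises KeyError when the account is unknown
        return True
    except KeyError:
        return False
```

Calling local_user_exists("engine") on the node would show whether the account is present. If it returns False, the message most likely just means the engine package is not fully set up on this machine.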
On 2012-5-15 12:19, Haim Ateya wrote:
> ----- Original Message -----
>> From: "Shu Ming"<shuming at linux.vnet.ibm.com>
>> To: "users at oVirt.org"<users at ovirt.org>
>> Sent: Tuesday, May 15, 2012 4:56:36 AM
>> Subject: [Users] The SPM host node is in unresponsive mode
>> I attached one host node to my engine. Because it is the only
>> node, it is automatically the SPM node, and it used to run well in the
>> engine. Yesterday, some network errors occurred on the
>> node, which made it become "unresponsive" in the engine. I am
>> sure the network errors are fixed, and I want to bring the node back to
>> life now. However, I found that this single node could not be
>> confirmed with "confirm as host been rebooted" and could not be put into
>> maintenance mode. The reason given is that there is no active host in the
>> datacenter and the SPM cannot enter maintenance mode. It seems
>> to be stuck in a logic loop here. Losing the network can be quite common in a
>> development environment, and even in a production environment. I think we
>> should have a way to repair a host node
>> that has had its network down for a while.
> Hi Shu,
> First, for the manual fence to work ("confirm host has been rebooted") you will need
> another host in the cluster, which will be used as a proxy to send the actual manual fence command.
> Second, you are absolutely right: loss of network is a common scenario, and we should be able
> to recover. But let's try to understand why your host remains unresponsive after the network returned.
> Please ssh to the host and try the following:
> - vdsClient -s 0 getVdsCaps (a sanity check that the vdsm service is up and running and communicating over its network socket from localhost)
> - ping between the host and the engine
> - make sure there is no firewall blocking TCP 54321 (on both host and engine)
> Also, please provide vdsm.log (from the time the network issues began) and spm-lock.log (both located in /var/log/vdsm/).
> As for a mitigation, we can always manipulate the db and set it correctly, but first, let's try the above.
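The firewall check in the list above can be approximated from the engine side with a plain TCP connection attempt. This is only a rough sketch: the helper name tcp_port_open and the 3-second timeout are assumptions for illustration, and only port 54321 comes from the reply. A successful connect shows the port is reachable; it does not prove that vdsm itself is healthy.

```python
import socket

VDSM_PORT = 54321  # TCP port named in the quoted reply


def tcp_port_open(host, port=VDSM_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, tcp_port_open("ovirt-node1") run on the engine machine would try the node named in the shell prompt earlier in this message; substitute the node's real address. A False result is consistent with a firewall drop, a stopped service, or a routing problem, so it narrows the search but does not pinpoint the cause.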
>> Shu Ming<shuming at linux.vnet.ibm.com>
>> IBM China Systems and Technology Laboratory
Shu Ming<shuming at linux.vnet.ibm.com>
IBM China Systems and Technology Laboratory