[Users] SPM host in unknown status

Shu Ming shuming at linux.vnet.ibm.com
Mon May 28 03:09:06 UTC 2012


What does /var/log/vdsm.log show on the two nodes? It seems that VDSM 
has run into some problems.
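
As a rough way to pull out the interesting lines before posting the log, here is a minimal Python sketch. It assumes that problem lines contain the words ERROR, CRITICAL or Traceback; that pattern and the function names are only my illustration, not part of VDSM. The log path may differ on your installation, so it is passed as an argument.

```python
import re
import sys

# Assumption: lines worth reading mention one of these markers.
MARKERS = re.compile(r"ERROR|CRITICAL|Traceback")


def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines matching MARKERS."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKERS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path, limit=50):
    """Read a log file and return at most the last `limit` matching lines."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle)[-limit:]


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path comes from the command line because it varies by install.
    for number, text in scan_log(sys.argv[1]):
        print(f"{number}: {text}")
```

Run it on each node with the log path as the only argument, then compare what the two hosts report around 02:43 UTC, when vdsmd was last started on node1.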

On 2012-5-28 11:04, T-Sinjon wrote:
> 1. On node1, vdsm looks strange; it is sleeping:
> [root@ovirt-node-1 ~]# systemctl status vdsmd.service
> vdsmd.service - Virtual Desktop Server Manager
>  Loaded: loaded (/lib/systemd/system/vdsmd.service; enabled)
>  Active: active (running) since Mon, 28 May 2012 02:43:22 +0000; 9min ago
> Process: 1157 ExecStart=/lib/systemd/systemd-vdsmd start (code=exited, 
> status=0/SUCCESS)
> Main PID: 2228 (respawn)
>  CGroup: name=systemd:/system/vdsmd.service
>  ├ 2228 /bin/bash -e /usr/share/vdsm/respawn --minlifetime...
>  └ 3573 sleep 900
> 2. No firewall is blocking traffic.
> 3. The network is OK; I can ssh into node1 from the engine.
>
> I have used the fence option (confirm host has been rebooted), but the 
> SPM role did not move to the other node. Below is the engine.log from 
> when I performed this action:
>
> 2012-05-28 10:49:51,846 INFO 
>  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] 
> (pool-5-thread-49) [72d88732] Lock Acquired to object EngineLock 
> [exclusiveLocks= key: 
> org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand value: 
> ae567034-5d8e-11e1-bdc9-a7168ad4d39f
> , sharedLocks= ]
> 2012-05-28 10:49:51,847 INFO 
>  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] 
> (pool-5-thread-49) [72d88732] Running command: FenceVdsManualyCommand 
> internal: false. Entities affected :  ID: 
> ae567034-5d8e-11e1-bdc9-a7168ad4d39f Type: VDS
> 2012-05-28 10:49:51,927 INFO 
>  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] 
> (pool-5-thread-49) [72d88732] Trying to fence spm ovirt-node-1.local 
> via vds ovirt-node-2.local
> 2012-05-28 10:49:51,933 INFO 
>  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceSpmStorageVDSCommand] 
> (pool-5-thread-49) [72d88732] START, FenceSpmStorageVDSCommand(vdsId = 
> a522a6a6-a72e-11e1-baa3-bba876a88ef4, storagePoolId = 
> 524a7003-edec-4f52-a38e-b15cadfbe3ef, prevId=1, prevLVER=17), log id: 
> 530cb694
> 2012-05-28 10:49:51,965 INFO 
>  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] 
> (pool-5-thread-49) [72d88732] Command 
> org.ovirt.engine.core.vdsbroker.vdsbroker.FenceSpmStorageVDSCommand 
> return value
>  Class Name: 
> org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
> mStatus                       Class Name: 
> org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
> mCode                         654
> mMessage                      Not SPM
>
>
> 2012-05-28 10:49:51,966 INFO 
>  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] 
> (pool-5-thread-49) [72d88732] Vds: ovirt-node-2.local
> 2012-05-28 10:49:51,966 ERROR 
> [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-5-thread-49) 
> [72d88732] Command FenceSpmStorageVDS execution failed. Exception: 
> IRSNonOperationalException: IRSGenericException: IRSErrorException: 
> IRSNonOperationalException: Not SPM
> 2012-05-28 10:49:51,966 INFO 
>  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceSpmStorageVDSCommand] 
> (pool-5-thread-49) [72d88732] FINISH, FenceSpmStorageVDSCommand, log 
> id: 530cb694
> 2012-05-28 10:49:51,967 WARN 
>  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] 
> (pool-5-thread-49) [72d88732] Could not fence spm on vds 
> ovirt-node-2.local
> 2012-05-28 10:49:51,971 ERROR 
> [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] 
> (pool-5-thread-49) [72d88732] Transaction rolled-back for command: 
> org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand.
> 2012-05-28 10:49:51,971 INFO 
>  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] 
> (pool-5-thread-49) [72d88732] Lock freed to object EngineLock 
> [exclusiveLocks= key: 
> org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand value: 
> ae567034-5d8e-11e1-bdc9-a7168ad4d39f
> , sharedLocks= ]
> 2012-05-28 10:49:57,457 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-79) hostFromVds::selectedVds - 
> ovirt-node-2.local, spmStatus Free, storage pool BLC
> 2012-05-28 10:49:57,461 ERROR 
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-79) SPM Init: could not find reported vds or 
> not up - pool:BLC vds_spm_id: 1
> 2012-05-28 10:49:57,466 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-79) SPM selection - vds seems as spm 
> ovirt-node-1.local
> 2012-05-28 10:49:57,466 WARN 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-79) spm vds is non responsive, stopping spm 
> selection.
> 2012-05-28 10:50:00,002 INFO 
>  [org.ovirt.engine.core.bll.AutoRecoveryManager] 
> (QuartzScheduler_Worker-87) Checking autorecoverable hosts
> 2012-05-28 10:50:00,004 INFO 
>  [org.ovirt.engine.core.bll.AutoRecoveryManager] 
> (QuartzScheduler_Worker-87) Checking autorecoverable hosts done
> 2012-05-28 10:50:00,004 INFO 
>  [org.ovirt.engine.core.bll.AutoRecoveryManager] 
> (QuartzScheduler_Worker-87) Checking autorecoverable storage domains
> 2012-05-28 10:50:00,006 INFO 
>  [org.ovirt.engine.core.bll.AutoRecoveryManager] 
> (QuartzScheduler_Worker-87) Checking autorecoverable storage domains done
> 2012-05-28 10:50:07,502 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-93) hostFromVds::selectedVds - 
> ovirt-node-2.local, spmStatus Free, storage pool BLC
> 2012-05-28 10:50:07,505 ERROR 
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-93) SPM Init: could not find reported vds or 
> not up - pool:BLC vds_spm_id: 1
> 2012-05-28 10:50:07,510 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-93) SPM selection - vds seems as spm 
> ovirt-node-1.local
> 2012-05-28 10:50:07,510 WARN 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-93) spm vds is non responsive, stopping spm 
> selection.
> 2012-05-28 10:50:17,551 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-34) hostFromVds::selectedVds - 
> ovirt-node-2.local, spmStatus Free, storage pool BLC
> 2012-05-28 10:50:17,554 ERROR 
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-34) SPM Init: could not find reported vds or 
> not up - pool:BLC vds_spm_id: 1
> 2012-05-28 10:50:17,559 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-34) SPM selection - vds seems as spm 
> ovirt-node-1.local
> 2012-05-28 10:50:17,559 WARN 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-34) spm vds is non responsive, stopping spm 
> selection.
> 2012-05-28 10:50:27,609 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-92) hostFromVds::selectedVds - 
> ovirt-node-2.local, spmStatus Free, storage pool BLC
> 2012-05-28 10:50:27,612 ERROR 
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-92) SPM Init: could not find reported vds or 
> not up - pool:BLC vds_spm_id: 1
> 2012-05-28 10:50:27,617 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-92) SPM selection - vds seems as spm 
> ovirt-node-1.local
> 2012-05-28 10:50:27,618 WARN 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-92) spm vds is non responsive, stopping spm 
> selection.
> 2012-05-28 10:50:37,652 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-67) hostFromVds::selectedVds - 
> ovirt-node-2.local, spmStatus Free, storage pool BLC
> 2012-05-28 10:50:37,656 ERROR 
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-67) SPM Init: could not find reported vds or 
> not up - pool:BLC vds_spm_id: 1
> 2012-05-28 10:50:37,661 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-67) SPM selection - vds seems as spm 
> ovirt-node-1.local
> 2012-05-28 10:50:37,662 WARN 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-67) spm vds is non responsive, stopping spm 
> selection.
> 2012-05-28 10:50:47,709 INFO 
>  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-34) hostFromVds::selectedVds - 
> ovirt-node-2.local, spmStatus Free, storage pool BLC
> 2012-05-28 10:50:47,712 ERROR 
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (QuartzScheduler_Worker-34) SPM Init: could not find reported vds or 
> not up - pool:BLC vds_spm_id: 1
>
> On 28 May, 2012, at 12:08 AM, Haim Ateya wrote:
>
>> Hi, the first question that comes to mind is why the host is in a 
>> non-responsive state.
>> Please check the following:
>> 1. vdsmd service is running on host side
>> 2. No firewall is blocking comm. in and out
>> 3. No network issue between host and manager
>>
>> Now, for your question: you can use the manual fence option (confirm 
>> host has been rebooted), which will free the SPM role from the faulty 
>> host, and the engine will elect a new SPM.
>>
>> Haim
>>
>> On May 27, 2012, at 18:32, T-Sinjon <tscbj1989 at gmail.com> wrote:
>>
>>> Description of problem:
>>>
>>> I have 2 nodes:
>>> ovirt-node1.local    Non Responsive    SPM
>>> ovirt-node2.local    Up                None
>>>
>>> The SPM node is stuck in Non Responsive status and cannot be activated.
>>> All VMs on the node went into Unknown status, and the master VM 
>>> domain became inactive.
>>>
>>> When I try the "Maintenance" action on node1, it says:
>>> Error: Cannot switch Host to Maintenance mode.
>>> Host still has running VMs on it and is in Non-Responsive state.
>>>
>>> But there are no VMs running on node1; it only has 2 VMs in Unknown 
>>> status.
>>>
>>> Because I can't activate the SPM host, I can't activate the VM 
>>> storage domain.
>>>
>>> 1. How can I migrate the SPM role to another host in my data center, 
>>> such as node2?
>>> 2. How can I bring node1 back to Up status? (I have done the 'confirm 
>>> the host has been rebooted' action and rebooted node1, but it had 
>>> no effect.)
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>

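Haim's checks 2 and 3 can be partly scripted from the engine machine. Being able to ssh into node1 only shows that port 22 is reachable; the engine talks to VDSM on a separate TCP port. The sketch below assumes the usual default VDSM port, 54321 (please verify it against your own configuration), and the function name is only an illustration.

```python
import socket
import sys

# Assumption: VDSM listens on TCP 54321 (the usual default); adjust if yours differs.
VDSM_PORT = 54321


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Host names are given on the command line, one check per host.
    for host in sys.argv[1:]:
        state = "reachable" if port_open(host, VDSM_PORT) else "NOT reachable"
        print(f"{host}:{VDSM_PORT} is {state}")
```

Run it on the engine with both node names as arguments. If node2 is reachable and node1 is not, that would point at vdsmd or a firewall on node1 rather than at the network in general.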

-- 
Shu Ming<shuming at linux.vnet.ibm.com>
IBM China Systems and Technology Laboratory
