
On 1/26/2014 4:00 PM, Itamar Heim wrote:
On 01/26/2014 10:51 PM, Ted Miller wrote:
On 1/26/2014 3:10 PM, Itamar Heim wrote:
On 01/26/2014 10:08 PM, Ted Miller wrote:
My Data Center is down, and won't come back up.
Data Center Status on the GUI flips between "Non Responsive" and "Contending"
Also noted: the host is sometimes seen flipping between "Low" and "Contending" in the SPM column. Storage domain VM2 "Data (Master)" shows "Cross Data-Center Status" = Unknown, yet VM2 is "up" under the "Volumes" tab.
Created another volume for VM storage. It shows up in the "Volumes" tab, but when I try to add a "New Domain" in the Storage tab, it says "There are No Data Centers to which the Storage Domain can be attached".
Setup: 2 hosts w/ glusterfs storage, 1 engine, all 3 computers CentOS 6.5, just updated
ovirt-engine 3.3.0.1-1.el6
ovirt-engine-lib 3.3.2-1.el6
ovirt-host-deploy.noarch 1.1.3-1.el6
glusterfs.x86_64 3.4.2-1.el6
This loop seems to repeat in the ovirt-engine log (grep of the log showing only the DefaultQuartzScheduler_Worker-79 thread):
2014-01-26 14:44:58,416 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) Irs placed on server 9a591103-83be-4ca9-b207-06929223b541 failed. Proceed Failover
2014-01-26 14:44:58,511 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) hostFromVds::selectedVds - office4a, spmStatus Free, storage pool mill
2014-01-26 14:44:58,550 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) SpmStatus on vds 127ed939-34af-41a8-87a0-e2f6174b1877: Free
2014-01-26 14:44:58,571 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) starting spm on vds office4a, storage pool mill, prevId 2, LVER 15
2014-01-26 14:44:58,579 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) START, SpmStartVDSCommand(HostName = office4a, HostId = 127ed939-34af-41a8-87a0-e2f6174b1877, storagePoolId = 536a864d-83aa-473a-a675-e38aafdd9071, prevId=2, prevLVER=15, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log id: 74c38eb7
2014-01-26 14:44:58,617 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) spmStart polling started: taskId = e8986753-fc80-4b11-a11d-6d3470b1728c
2014-01-26 14:45:00,662 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-79) Failed in HSMGetTaskStatusVDS method
2014-01-26 14:45:00,664 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-79) Error code AcquireHostIdFailure and error message VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-01-26 14:45:00,665 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) spmStart polling ended: taskId = e8986753-fc80-4b11-a11d-6d3470b1728c task status = finished
2014-01-26 14:45:00,666 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-01-26 14:45:00,695 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) spmStart polling ended, spm status: Free
2014-01-26 14:45:00,702 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-79) START, HSMClearTaskVDSCommand(HostName = office4a, HostId = 127ed939-34af-41a8-87a0-e2f6174b1877, taskId=e8986753-fc80-4b11-a11d-6d3470b1728c), log id: 336ec5a6
2014-01-26 14:45:00,722 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-79) FINISH, HSMClearTaskVDSCommand, log id: 336ec5a6
2014-01-26 14:45:00,724 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@13652652, log id: 74c38eb7
2014-01-26 14:45:00,733 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (DefaultQuartzScheduler_Worker-79) Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: 536a864d-83aa-473a-a675-e38aafdd9071 Type: StoragePool
2014-01-26 14:45:00,778 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
Ted Miller Elkhart, IN, USA
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
is this gluster storage (guessing since you mentioned a 'volume')?

yes (mentioned under "setup" above)

does it have a quorum?

Volume Name: VM2
Type: Replicate
Volume ID: 7bea8d3b-ec2a-4939-8da8-a82e6bda841e
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.41.65.2:/bricks/01/VM2
Brick2: 10.41.65.4:/bricks/01/VM2
Brick3: 10.41.65.4:/bricks/101/VM2
Options Reconfigured:
cluster.server-quorum-type: server
storage.owner-gid: 36
storage.owner-uid: 36
auth.allow: *
user.cifs: off
nfs.disa

(there were reports of split brain on the domain metadata before when no quorum exists for gluster)

after full heal:
[root@office4a ~]$ gluster volume heal VM2 info
Gathering Heal info on volume VM2 has been successful

Brick 10.41.65.2:/bricks/01/VM2
Number of entries: 0

Brick 10.41.65.4:/bricks/01/VM2
Number of entries: 0

Brick 10.41.65.4:/bricks/101/VM2
Number of entries: 0

[root@office4a ~]$ gluster volume heal VM2 info split-brain
Gathering Heal info on volume VM2 has been successful

Brick 10.41.65.2:/bricks/01/VM2
Number of entries: 0

Brick 10.41.65.4:/bricks/01/VM2
Number of entries: 0

Brick 10.41.65.4:/bricks/101/VM2
Number of entries: 0
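One detail worth noting in the volume listing: Brick2 and Brick3 both live on 10.41.65.4, so losing that single host removes two of the three replicas, and quorum with it. A quick way to review the quorum configuration (a sketch, assuming the gluster CLI is available on either host; the `volume set` line is illustrative, not a step taken in this thread):

```shell
# Show which quorum-related options are set on the volume
# (anything reconfigured appears under "Options Reconfigured").
gluster volume info VM2 | grep -i quorum

# Server-side quorum is already enabled above; client-side quorum is not.
# For a 1x3 replica, 'auto' client quorum is the commonly suggested setting:
gluster volume set VM2 cluster.quorum-type auto
```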
Noticed this in the host's /var/log/messages (while looking for something else). The loop seems to repeat over and over:
Jan 26 15:35:52 office4a sanlock[3763]: 2014-01-26 15:35:52-0500 14678 [30419]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 26 15:35:53 office4a sanlock[3763]: 2014-01-26 15:35:53-0500 14679 [3771]: s1997 add_lockspace fail result -90
Jan 26 15:35:58 office4a vdsm TaskManager.Task ERROR Task=`89885661-88eb-4ea3-8793-00438735e4ab`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2111, in getAllTasksStatuses
    allTasksStatus = sp.getAllTasksStatuses()
  File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper
    raise SecureError()
SecureError
Jan 26 15:35:59 office4a sanlock[3763]: 2014-01-26 15:35:59-0500 14686 [30495]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 26 15:36:00 office4a sanlock[3763]: 2014-01-26 15:36:00-0500 14687 [3772]: s1998 add_lockspace fail result -90
Jan 26 15:36:00 office4a vdsm TaskManager.Task ERROR Task=`8db9ff1a-2894-407a-915a-279f6a7eb205`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm
    self.masterDomain.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
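The sanlock failure can be decoded from the return value alone: rv -90 is errno 90, which on Linux is EMSGSIZE ("Message too long"), the same code carried in the vdsm SanlockException. A quick check (a sketch; it assumes python3, so on a CentOS 6 host substitute `python` and print-statement syntax):

```shell
# Decode errno 90 to confirm it matches the 'Message too long' in the
# SanlockException above.
python3 -c 'import errno, os; print(errno.errorcode[90], os.strerror(90))'
# Expected: EMSGSIZE Message too long
```

With read_sectors failing this way, inspecting the lockspace file itself on the gluster mount (e.g. `ls -l .../dom_md/ids` to check its size and ownership) seems a reasonable next step; a truncated or unreadable ids file is one commonly reported cause on gluster-backed domains, though that is an assumption, not something confirmed in this thread.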
Ted Miller Elkhart, IN, USA
this is the new storage domain? what about the previous volume for the first SD?
The default/default data center/cluster had to be abandoned because of a split-brain that could not be healed. I can't remove the old storage from the database, and I can't get the data center up because of the corrupt storage; it ends up a circular argument. I started over with the same hosts and totally new storage in a new data center. This mill/one data center/cluster was working fine with the VM2 storage, then died. Ted Miller