
On 01/27/2014 12:00 AM, Ted Miller wrote:
On 1/26/2014 4:00 PM, Itamar Heim wrote:
On 01/26/2014 10:51 PM, Ted Miller wrote:
On 1/26/2014 3:10 PM, Itamar Heim wrote:
On 01/26/2014 10:08 PM, Ted Miller wrote:
My Data Center is down, and won't come back up.
Data Center Status on the GUI flips between "Non Responsive" and "Contending"
Also noted: the host is sometimes seen flipping between "Low" and "Contending" in the SPM column. Storage domain VM2 "Data (Master)" shows "Cross Data-Center Status" = Unknown. VM2 is "up" under the "Volumes" tab.
Created another volume for VM storage. It shows up in the "Volumes" tab, but when I try to add a "New Domain" in the Storage tab, it says "There are No Data Centers to which the Storage Domain can be attached".
Setup:
2 hosts w/ glusterfs storage
1 engine
all 3 computers CentOS 6.5, just updated
ovirt-engine 3.3.0.1-1.el6
ovirt-engine-lib 3.3.2-1.el6
ovirt-host-deploy.noarch 1.1.3-1.el6
glusterfs.x86_64 3.4.2-1.el6
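In case it helps to see what the hosts themselves report while the SPM flips between "Contending" and "Low", the SPM state can also be queried directly on each host. A minimal sketch, assuming vdsClient is installed alongside vdsm on the hosts; the pool UUID is the one that appears in the engine log below:

# list the storage pools this host is connected to
vdsClient -s 0 getConnectedStoragePoolsList
# ask the host for its own SPM status in the pool seen in the engine log
vdsClient -s 0 getSpmStatus 536a864d-83aa-473a-a675-e38aafdd9071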
This loop seems to repeat in the ovirt-engine log (grep of the log showing only the DefaultQuartzScheduler_Worker-79 thread):
2014-01-26 14:44:58,416 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) Irs placed on server 9a591103-83be-4ca9-b207-06929223b541 failed. Proceed Failover
2014-01-26 14:44:58,511 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) hostFromVds::selectedVds - office4a, spmStatus Free, storage pool mill
2014-01-26 14:44:58,550 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) SpmStatus on vds 127ed939-34af-41a8-87a0-e2f6174b1877: Free
2014-01-26 14:44:58,571 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) starting spm on vds office4a, storage pool mill, prevId 2, LVER 15
2014-01-26 14:44:58,579 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) START, SpmStartVDSCommand(HostName = office4a, HostId = 127ed939-34af-41a8-87a0-e2f6174b1877, storagePoolId = 536a864d-83aa-473a-a675-e38aafdd9071, prevId=2, prevLVER=15, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log id: 74c38eb7
2014-01-26 14:44:58,617 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) spmStart polling started: taskId = e8986753-fc80-4b11-a11d-6d3470b1728c
2014-01-26 14:45:00,662 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-79) Failed in HSMGetTaskStatusVDS method
2014-01-26 14:45:00,664 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-79) Error code AcquireHostIdFailure and error message VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-01-26 14:45:00,665 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) spmStart polling ended: taskId = e8986753-fc80-4b11-a11d-6d3470b1728c task status = finished
2014-01-26 14:45:00,666 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-01-26 14:45:00,695 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) spmStart polling ended, spm status: Free
2014-01-26 14:45:00,702 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-79) START, HSMClearTaskVDSCommand(HostName = office4a, HostId = 127ed939-34af-41a8-87a0-e2f6174b1877, taskId=e8986753-fc80-4b11-a11d-6d3470b1728c), log id: 336ec5a6
2014-01-26 14:45:00,722 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-79) FINISH, HSMClearTaskVDSCommand, log id: 336ec5a6
2014-01-26 14:45:00,724 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-79) FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@13652652, log id: 74c38eb7
2014-01-26 14:45:00,733 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (DefaultQuartzScheduler_Worker-79) Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: 536a864d-83aa-473a-a675-e38aafdd9071 Type: StoragePool
2014-01-26 14:45:00,778 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-79) IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
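The repeated failure in that loop is "Cannot acquire host id", i.e. sanlock on the would-be SPM host cannot join the lockspace of the master storage domain. A minimal sketch for looking at sanlock directly on the host (standard sanlock client commands, run as root on office4a):

# show sanlock's view of its lockspaces and any add_lockspace attempts in flight
sanlock client status
# dump sanlock's internal log to see why add_lockspace keeps failing
sanlock client log_dump
# sanlock's persistent log, if present
tail -n 100 /var/log/sanlock.log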
Ted Miller Elkhart, IN, USA
is this gluster storage (guessing since you mentioned a 'volume')?

yes (mentioned under "setup" above)

does it have a quorum?

Volume Name: VM2
Type: Replicate
Volume ID: 7bea8d3b-ec2a-4939-8da8-a82e6bda841e
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.41.65.2:/bricks/01/VM2
Brick2: 10.41.65.4:/bricks/01/VM2
Brick3: 10.41.65.4:/bricks/101/VM2
Options Reconfigured:
cluster.server-quorum-type: server
storage.owner-gid: 36
storage.owner-uid: 36
auth.allow: *
user.cifs: off
nfs.disa

(there were reports of split brain on the domain metadata before when no quorum exists for gluster)

after full heal (a quorum-check sketch follows this output):
[root@office4a ~]$ gluster volume heal VM2 info
Gathering Heal info on volume VM2 has been successful
Brick 10.41.65.2:/bricks/01/VM2 Number of entries: 0
Brick 10.41.65.4:/bricks/01/VM2 Number of entries: 0
Brick 10.41.65.4:/bricks/101/VM2 Number of entries: 0

[root@office4a ~]$ gluster volume heal VM2 info split-brain
Gathering Heal info on volume VM2 has been successful
Brick 10.41.65.2:/bricks/01/VM2 Number of entries: 0
Brick 10.41.65.4:/bricks/01/VM2 Number of entries: 0
Brick 10.41.65.4:/bricks/101/VM2 Number of entries: 0
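One thing worth noting from the volume info above: Brick2 and Brick3 are both on 10.41.65.4, so a single host holds two of the three replicas, and only server-side quorum (cluster.server-quorum-type: server) is configured. A minimal sketch for checking the volume and, if desired, also enabling client-side quorum; plain gluster 3.4 CLI, volume name taken from the listing above:

# current options and brick layout
gluster volume info VM2
# brick and self-heal daemon status
gluster volume status VM2
# require a majority of replica bricks before allowing writes (client-side quorum)
gluster volume set VM2 cluster.quorum-type auto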
I noticed this in the host's /var/log/messages (while looking for something else). The loop seems to repeat over and over.
Jan 26 15:35:52 office4a sanlock[3763]: 2014-01-26 15:35:52-0500 14678 [30419]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 26 15:35:53 office4a sanlock[3763]: 2014-01-26 15:35:53-0500 14679 [3771]: s1997 add_lockspace fail result -90
Jan 26 15:35:58 office4a vdsm TaskManager.Task ERROR Task=`89885661-88eb-4ea3-8793-00438735e4ab`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012 res = f(*args, **kwargs)#012 File "/usr/share/vdsm/storage/hsm.py", line 2111, in getAllTasksStatuses#012 allTasksStatus = sp.getAllTasksStatuses()#012 File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012 raise SecureError()#012SecureError
Jan 26 15:35:59 office4a sanlock[3763]: 2014-01-26 15:35:59-0500 14686 [30495]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 26 15:36:00 office4a sanlock[3763]: 2014-01-26 15:36:00-0500 14687 [3772]: s1998 add_lockspace fail result -90
Jan 26 15:36:00 office4a vdsm TaskManager.Task ERROR Task=`8db9ff1a-2894-407a-915a-279f6a7eb205`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/storage/task.py", line 318, in run#012 return self.cmd(*self.argslist, **self.argsdict)#012 File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012 self.masterDomain.acquireHostId(self.id)#012 File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012 self._clusterLock.acquireHostId(hostId, async)#012 File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId#012 raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
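The "read_sectors delta_leader ... rv -90" / "Message too long" pattern on dom_md/ids can indicate that sanlock is not getting a full 512-byte leader record back from the ids file, for example because the file is empty or truncated as seen through the gluster mount. A minimal, read-only check sketch, assuming the mount path from the sanlock messages and the brick paths from the volume info above:

# size of the ids file as seen through the gluster mount (it should not be zero-length)
ls -l /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
# size as stored on each brick (run on each gluster host; brick paths from the volume info)
ls -l /bricks/01/VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
# recent sanlock activity for this lockspace
grep 0322a407 /var/log/sanlock.log | tail -n 20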
fede - thoughts on the above? (vojtech reported something similar, but it sorted itself out for him after some retries)
Ted Miller Elkhart, IN, USA
is this the new storage domain? what about the previous volume for the first SD?
The default/default data center/cluster had to be abandoned because of a split-brain that could not be healed. I can't remove the old storage from the database, and I can't get the data center up because of the corrupt storage, so it ends up being a circular argument.
I started over with the same hosts but totally new storage in a new data center. This mill/one data center/cluster was working fine with the VM2 storage, and then it died.
Ted Miller