
On Wed, Oct 2, 2013 at 12:07 AM, Jason Brooks wrote:
I'm having this issue on my ovirt 3.3 setup (two node, one is AIO, GlusterFS storage, both on F19) as well.
Jason
Me too, with an oVirt 3.3 setup and a GlusterFS DataCenter: one dedicated engine + 2 vdsm hosts, all Fedora 19 + ovirt stable. The VM I'm trying to migrate is CentOS 6.4, fully updated.

After trying to migrate it in the webadmin GUI I get:

VM c6s is down. Exit message: 'iface'.

For a few moments the VM appears as down in the GUI, but an ssh session I had open before is still alive, and the qemu process on the source host is still there. After a little while the VM shows as "Up" again on the source node in the GUI. Actually it never stopped:

[root@c6s ~]# uptime
 16:36:56 up 19 min, 1 user, load average: 0.00, 0.00, 0.00

At the target there are many errors such as:

Thread-8609::ERROR::2013-10-03 16:17:13,086::task::850::TaskManager.Task::(_setError) Task=`a102541e-fbe5-46a3-958c-e5f4026cac8c`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2123, in getAllTasksStatuses
    allTasksStatus = sp.getAllTasksStatuses()
  File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper
    raise SecureError()
SecureError

But perhaps more relevant is what happens around the time of the migration (16:33-16:34):

Thread-9968::DEBUG::2013-10-03 16:33:38,811::task::1168::TaskManager.Task::(prepare) Task=`6c1d3161-edcd-4344-a32a-4a18f75f5ba3`::finished: {'taskStatus': {'code': 0, 'message': 'Task is initializing', 'taskState': 'running', 'taskResult': '', 'taskID': '0eaac2f3-3d25-4c8c-9738-708aba290404'}}
Thread-9968::DEBUG::2013-10-03 16:33:38,811::task::579::TaskManager.Task::(_updateState) Task=`6c1d3161-edcd-4344-a32a-4a18f75f5ba3`::moving from state preparing -> state finished
Thread-9968::DEBUG::2013-10-03 16:33:38,811::resourceManager::939::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-9968::DEBUG::2013-10-03 16:33:38,812::resourceManager::976::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-9968::DEBUG::2013-10-03 16:33:38,812::task::974::TaskManager.Task::(_decref) Task=`6c1d3161-edcd-4344-a32a-4a18f75f5ba3`::ref 0 aborting False
0eaac2f3-3d25-4c8c-9738-708aba290404::ERROR::2013-10-03 16:33:38,847::task::850::TaskManager.Task::(_setError) Task=`0eaac2f3-3d25-4c8c-9738-708aba290404`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 272, in startSpm
    self.masterDomain.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291', SanlockException(5, 'Sanlock lockspace add failure', 'Input/output error'))
0eaac2f3-3d25-4c8c-9738-708aba290404::DEBUG::2013-10-03 16:33:38,847::task::869::TaskManager.Task::(_run) Task=`0eaac2f3-3d25-4c8c-9738-708aba290404`::Task._run: 0eaac2f3-3d25-4c8c-9738-708aba290404 () {} failed - stopping task
0eaac2f3-3d25-4c8c-9738-708aba290404::DEBUG::2013-10-03 16:33:38,847::task::1194::TaskManager.Task::(stop) Task=`0eaac2f3-3d25-4c8c-9738-708aba290404`::stopping in state running (force False)
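Side note: since the SanlockException reports an Input/output error while adding the lockspace, I suppose the next thing I could check on the target host (f18ovn01, going by the df output below) is whether sanlock can actually read the domain's ids file (that is where the lockspace lives on a file-based domain) and what /var/log/sanlock.log says around 16:33. Just a sketch of what I plan to try, with the paths taken from the vdsm log above:

[root@f18ovn01 ~]# tail -n 50 /var/log/sanlock.log
[root@f18ovn01 ~]# sanlock client status
[root@f18ovn01 ~]# dd iflag=direct if=/rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids bs=4096 count=1 of=/dev/null

If that dd read also fails with an I/O error, my guess (not verified) is that the problem is on the gluster mount rather than in sanlock itself.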
Instead at the source:

Thread-10402::ERROR::2013-10-03 16:35:03,713::vm::244::vm.Vm::(_recover) vmId=`4147e0d3-19a7-447b-9d88-2ff19365bec0`::migration destination error: Error creating the requested VM
Thread-10402::ERROR::2013-10-03 16:35:03,740::vm::324::vm.Vm::(run) vmId=`4147e0d3-19a7-447b-9d88-2ff19365bec0`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 311, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/vm.py", line 347, in _startUnderlyingMigration
    response['status']['message'])
RuntimeError: migration destination error: Error creating the requested VM
Thread-1161::DEBUG::2013-10-03 16:35:04,243::fileSD::238::Storage.Misc.excCmd::(getReadDelay) '/usr/bin/dd iflag=direct if=/rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/metadata bs=4096 count=1' (cwd None)
Thread-1161::DEBUG::2013-10-03 16:35:04,262::fileSD::238::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n512 bytes (512 B) copied, 0.0015976 s, 320 kB/s\n'; <rc> = 0
Thread-1161::INFO::2013-10-03 16:35:04,269::clusterlock::174::SANLock::(acquireHostId) Acquiring host id for domain d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291 (id: 2)
Thread-1161::DEBUG::2013-10-03 16:35:04,270::clusterlock::192::SANLock::(acquireHostId) Host id for domain d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291 successfully acquired (id: 2)

Full logs here:
https://docs.google.com/file/d/0BwoPbcrMv8mvNVFBaGhDeUdMOFE/edit?usp=sharing

In the zip file:
engine.log
vdsm_source.log
vdsm_target.log
ovirt gluster 3.3 datacenter status.png (note that many times, from a GUI point of view, the DC seems to go non responsive...?)

One strange thing: in Clusters --> Gluster01 --> Volumes I see my gluster volume (gvdata) as up, while in DC --> Gluster (my DC name) --> Storage --> gvdata (my storage domain name) it is marked as down, yet the related C6 VM is always up and running???

source:
[root@f18ovn03 vdsm]# df -h | grep rhev
f18ovn01.mydomain:gvdata                   30G  4.7G   26G  16%  /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata
f18engine.mydomain:/var/lib/exports/iso    35G  9.4G   24G  29%  /rhev/data-center/mnt/f18engine.mydomain:_var_lib_exports_iso

target:
[root@f18ovn01 vdsm]# df -h | grep rhev
f18ovn01.mydomain:gvdata                   30G  4.7G   26G  16%  /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata
f18engine.mydomain:/var/lib/exports/iso    35G  9.4G   24G  29%  /rhev/data-center/mnt/f18engine.mydomain:_var_lib_exports_iso

I didn't try to activate it again because I didn't want to make things worse: it seems strange to me that both hosts are marked as up, but the only existing storage domain (marked as "Data (Master)") in the GUI is down.....

Gianluca
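P.S. About the volume showing as up under Clusters while the storage domain is down: I guess I should also compare what gluster itself reports on both hosts with what vdsm sees of the mount. A rough sketch of what I plan to look at (gvdata is my volume name; the heal info command only makes sense if the volume is replicated):

[root@f18ovn01 ~]# gluster volume status gvdata
[root@f18ovn01 ~]# gluster volume heal gvdata info
[root@f18ovn01 ~]# dd iflag=direct if=/rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/metadata bs=4096 count=1 of=/dev/null

The dd is the same read that vdsm's getReadDelay runs in the log above, so I would repeat it on both hosts, and also look at the glusterfs client log for that mount under /var/log/glusterfs/ to see whether there are brick or quorum errors around 16:33. Not sure it's relevant, just what I intend to check next.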