
On Wed, Oct 2, 2013 at 12:07 AM, Jason Brooks wrote:
I'm having this issue on my ovirt 3.3 setup (two node, one is AIO, GlusterFS storage, both on F19) as well.
Jason
Me too, with an oVirt 3.3 setup and a GlusterFS DataCenter: one dedicated engine + 2 vdsm hosts, all Fedora 19 + ovirt stable. The VM I'm trying to migrate is CentOS 6.4, fully updated.

After trying to migrate it in the webadmin GUI I get:

VM c6s is down. Exit message: 'iface'.

For a few moments the VM appears as down in the GUI, but an ssh session I had open before is still alive, and the qemu process on the source host is still there. After a little while the VM shows as "Up" again on the source node in the GUI. Actually it never stopped:

[root@c6s ~]# uptime
 16:36:56 up 19 min, 1 user, load average: 0.00, 0.00, 0.00

At the target there are many errors such as:

Thread-8609::ERROR::2013-10-03 16:17:13,086::task::850::TaskManager.Task::(_setError) Task=`a102541e-fbe5-46a3-958c-e5f4026cac8c`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2123, in getAllTasksStatuses
    allTasksStatus = sp.getAllTasksStatuses()
  File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper
    raise SecureError()
SecureError

But perhaps more relevant is what happens around the time of the migration (16:33-16:34):

Thread-9968::DEBUG::2013-10-03 16:33:38,811::task::1168::TaskManager.Task::(prepare) Task=`6c1d3161-edcd-4344-a32a-4a18f75f5ba3`::finished: {'taskStatus': {'code': 0, 'message': 'Task is initializing', 'taskState': 'running', 'taskResult': '', 'taskID': '0eaac2f3-3d25-4c8c-9738-708aba290404'}}
Thread-9968::DEBUG::2013-10-03 16:33:38,811::task::579::TaskManager.Task::(_updateState) Task=`6c1d3161-edcd-4344-a32a-4a18f75f5ba3`::moving from state preparing -> state finished
Thread-9968::DEBUG::2013-10-03 16:33:38,811::resourceManager::939::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-9968::DEBUG::2013-10-03 16:33:38,812::resourceManager::976::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-9968::DEBUG::2013-10-03 16:33:38,812::task::974::TaskManager.Task::(_decref) Task=`6c1d3161-edcd-4344-a32a-4a18f75f5ba3`::ref 0 aborting False
0eaac2f3-3d25-4c8c-9738-708aba290404::ERROR::2013-10-03 16:33:38,847::task::850::TaskManager.Task::(_setError) Task=`0eaac2f3-3d25-4c8c-9738-708aba290404`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 272, in startSpm
    self.masterDomain.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291', SanlockException(5, 'Sanlock lockspace add failure', 'Input/output error'))
0eaac2f3-3d25-4c8c-9738-708aba290404::DEBUG::2013-10-03 16:33:38,847::task::869::TaskManager.Task::(_run) Task=`0eaac2f3-3d25-4c8c-9738-708aba290404`::Task._run: 0eaac2f3-3d25-4c8c-9738-708aba290404 () {} failed - stopping task
0eaac2f3-3d25-4c8c-9738-708aba290404::DEBUG::2013-10-03 16:33:38,847::task::1194::TaskManager.Task::(stop) Task=`0eaac2f3-3d25-4c8c-9738-708aba290404`::stopping in state running (force False)
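Side note: since the SanlockException reports an Input/output error while adding the lockspace, I suppose the next thing I could check on the target host (f18ovn01, going by the df output below) is whether sanlock can actually read the domain's ids file (that is where the lockspace lives on a file-based domain) and what /var/log/sanlock.log says around 16:33. Just a sketch of what I plan to try, with the paths taken from the vdsm log above:

[root@f18ovn01 ~]# tail -n 50 /var/log/sanlock.log
[root@f18ovn01 ~]# sanlock client status
[root@f18ovn01 ~]# dd iflag=direct if=/rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids bs=4096 count=1 of=/dev/null

If that dd read also fails with an I/O error, my guess (not verified) is that the problem is on the gluster mount rather than in sanlock itself.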
Instead at the source:

Thread-10402::ERROR::2013-10-03 16:35:03,713::vm::244::vm.Vm::(_recover) vmId=`4147e0d3-19a7-447b-9d88-2ff19365bec0`::migration destination error: Error creating the requested VM
Thread-10402::ERROR::2013-10-03 16:35:03,740::vm::324::vm.Vm::(run) vmId=`4147e0d3-19a7-447b-9d88-2ff19365bec0`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 311, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/vm.py", line 347, in _startUnderlyingMigration
    response['status']['message'])
RuntimeError: migration destination error: Error creating the requested VM
Thread-1161::DEBUG::2013-10-03 16:35:04,243::fileSD::238::Storage.Misc.excCmd::(getReadDelay) '/usr/bin/dd iflag=direct if=/rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/metadata bs=4096 count=1' (cwd None)
Thread-1161::DEBUG::2013-10-03 16:35:04,262::fileSD::238::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n512 bytes (512 B) copied, 0.0015976 s, 320 kB/s\n'; <rc> = 0
Thread-1161::INFO::2013-10-03 16:35:04,269::clusterlock::174::SANLock::(acquireHostId) Acquiring host id for domain d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291 (id: 2)
Thread-1161::DEBUG::2013-10-03 16:35:04,270::clusterlock::192::SANLock::(acquireHostId) Host id for domain d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291 successfully acquired (id: 2)

Full logs here:
https://docs.google.com/file/d/0BwoPbcrMv8mvNVFBaGhDeUdMOFE/edit?usp=sharing

In the zip file:
engine.log
vdsm_source.log
vdsm_target.log
ovirt gluster 3.3 datacenter status.png (note that many times, from a GUI point of view, the DC seems to go non responsive...?)

One strange thing: in Clusters --> Gluster01 --> Volumes I see my gluster volume (gvdata) as up, while in DC --> Gluster (my DC name) --> Storage --> gvdata (my storage domain name) it is marked as down, yet the related C6 VM is always up and running???

source:
[root@f18ovn03 vdsm]# df -h | grep rhev
f18ovn01.mydomain:gvdata                   30G  4.7G   26G  16%  /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata
f18engine.mydomain:/var/lib/exports/iso    35G  9.4G   24G  29%  /rhev/data-center/mnt/f18engine.mydomain:_var_lib_exports_iso

target:
[root@f18ovn01 vdsm]# df -h | grep rhev
f18ovn01.mydomain:gvdata                   30G  4.7G   26G  16%  /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata
f18engine.mydomain:/var/lib/exports/iso    35G  9.4G   24G  29%  /rhev/data-center/mnt/f18engine.mydomain:_var_lib_exports_iso

I didn't try to activate it again because I didn't want to make things worse: it seems strange to me that both hosts are marked as up, but the only existing storage domain (marked as "Data (Master)") in the GUI is down.....

Gianluca
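P.S. About the volume showing as up under Clusters while the storage domain is down: I guess I should also compare what gluster itself reports on both hosts with what vdsm sees of the mount. A rough sketch of what I plan to look at (gvdata is my volume name; the heal info command only makes sense if the volume is replicated):

[root@f18ovn01 ~]# gluster volume status gvdata
[root@f18ovn01 ~]# gluster volume heal gvdata info
[root@f18ovn01 ~]# dd iflag=direct if=/rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/metadata bs=4096 count=1 of=/dev/null

The dd is the same read that vdsm's getReadDelay runs in the log above, so I would repeat it on both hosts, and also look at the glusterfs client log for that mount under /var/log/glusterfs/ to see whether there are brick or quorum errors around 16:33. Not sure it's relevant, just what I intend to check next.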