On 26/02/15 17:31, Soeren Malchow wrote:
Hi,
we tried this (Roy's mail below), and yes, shutdown always works.
But now this problem comes up with regular machines as well.
The environment is set up like this:
Engine: oVirt 3.5.1.1-1.el6 on CentOS 6
Storage backend: Gluster 3.6.2-1.el7 on CentOS 7
Compute hosts: libvirt-1.2.9.1-2.fc20, kvm 2.1.2-7.fc20, vdsm-4.16.10-8.gitc937927.fc20
All compute servers are 100% identical.
The storage cluster was tested manually and works just fine.
The network interfaces are not fully utilized (more like 15%).
Log output is below. The thing in it I do not understand is this:
“2015-02-26T14:50:26.650595Z qemu-system-x86_64: load of migration
failed: Input/output error”
From the qemu log.
Also if I shut down machines, put their host into maintenance and start
them somewhere else, everything works just fine.
Can someone help with this? Any idea where to look?
Regards
Soeren
*From VDSM Log*
I just tried to migrate a machine, this here happens on the source
<--snip-->
vdsm.log:Thread-49548::DEBUG::2015-02-26
15:42:26,692::__init__::469::jsonrpc.JsonRpcServer::(_serveRequest)
Calling 'VM.migrate' in bridge with {u'params': {u'tunneled':
u'false',
u'dstqemu': u'172.19.2.31', u'src': u'compute04',
u'dst':
u'compute01:54321', u'vmId':
u'b75823d1-00f0-457e-a692-8b95f73907db',
u'abortOnError': u'true', u'method': u'online'},
u'vmID':
u'b75823d1-00f0-457e-a692-8b95f73907db'}
vdsm.log:Thread-49548::DEBUG::2015-02-26
15:42:26,694::API::510::vds::(migrate) {u'tunneled': u'false',
u'dstqemu': u'IPADDR', u'src': u'compute04',
u'dst': u'compute01:54321',
u'vmId': u'b75823d1-00f0-457e-a692-8b95f73907db',
u'abortOnError':
u'true', u'method': u'online'}
vdsm.log:Thread-49549::DEBUG::2015-02-26
15:42:26,699::migration::103::vm.Vm::(_setupVdsConnection)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Destination server is:
compute01:54321
vdsm.log:Thread-49549::DEBUG::2015-02-26
15:42:26,702::migration::105::vm.Vm::(_setupVdsConnection)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Initiating connection with
destination
vdsm.log:Thread-49549::DEBUG::2015-02-26
15:42:26,733::migration::155::vm.Vm::(_prepareGuest)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration started
vdsm.log:Thread-49549::DEBUG::2015-02-26
15:42:26,755::migration::238::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration semaphore
acquired after 0 seconds
vdsm.log:Thread-49549::DEBUG::2015-02-26
15:42:27,211::migration::298::vm.Vm::(_startUnderlyingMigration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::starting migration to
qemu+tls://compute01/system with miguri tcp://IPADDR
vdsm.log:Thread-49550::DEBUG::2015-02-26
15:42:27,213::migration::361::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration downtime thread
started
vdsm.log:Thread-49551::DEBUG::2015-02-26
15:42:27,216::migration::410::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::starting migration monitor
thread
vdsm.log:Thread-49550::DEBUG::2015-02-26
15:43:42,218::migration::370::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
to 50
vdsm.log:Thread-49550::DEBUG::2015-02-26
15:44:57,222::migration::370::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
to 100
vdsm.log:Thread-49550::DEBUG::2015-02-26
15:46:12,227::migration::370::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
to 150
vdsm.log:Thread-49551::WARNING::2015-02-26
15:47:07,279::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (1791MiB) > lowmark (203MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:47:17,281::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (1398MiB) > lowmark (203MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49550::DEBUG::2015-02-26
15:47:27,233::migration::370::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
to 200
vdsm.log:Thread-49551::WARNING::2015-02-26
15:47:27,283::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (1066MiB) > lowmark (203MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:47:37,285::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (701MiB) > lowmark (203MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:47:47,287::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (361MiB) > lowmark (203MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:48:07,291::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (683MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:48:17,292::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (350MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:48:27,294::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (18MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:48:37,296::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (646MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49550::DEBUG::2015-02-26
15:48:42,238::migration::370::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
to 250
vdsm.log:Thread-49551::WARNING::2015-02-26
15:48:47,334::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (317MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:48:57,342::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (1018MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:49:07,344::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (679MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:49:17,346::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (357MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:49:27,348::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (31MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:49:37,349::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (854MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:49:47,351::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (525MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49550::DEBUG::2015-02-26
15:49:57,242::migration::370::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
to 300
vdsm.log:Thread-49551::WARNING::2015-02-26
15:49:57,353::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (183MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:50:07,355::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (785MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:50:17,357::migration::458::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
remaining (457MiB) > lowmark (13MiB). Refer to RHBZ#919201.
vdsm.log:Thread-49551::WARNING::2015-02-26
15:50:27,359::migration::445::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration is stuck: Hasn't
progressed in 150.069458961 seconds. Aborting.
vdsm.log:Thread-49551::DEBUG::2015-02-26
15:50:27,362::migration::470::vm.Vm::(stop)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::stopping migration monitor
thread
vdsm.log:Thread-49549::DEBUG::2015-02-26
15:50:27,852::migration::376::vm.Vm::(cancel)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::canceling migration
downtime thread
vdsm.log:Thread-49549::DEBUG::2015-02-26
15:50:27,852::migration::470::vm.Vm::(stop)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::stopping migration monitor
thread
vdsm.log:Thread-49550::DEBUG::2015-02-26
15:50:27,853::migration::373::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration downtime thread
exiting
vdsm.log:Thread-49549::ERROR::2015-02-26
15:50:27,855::migration::161::vm.Vm::(_recover)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::operation aborted:
migration job: canceled by client
vdsm.log:Thread-49549::ERROR::2015-02-26
15:50:28,049::migration::260::vm.Vm::(run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Failed to migrate
vdsm.log:Thread-49690::DEBUG::2015-02-26
15:50:33,284::__init__::469::jsonrpc.JsonRpcServer::(_serveRequest)
Calling 'VM.getMigrationStatus' in bridge with {u'vmID':
u'b75823d1-00f0-457e-a692-8b95f73907db'}
<--snip-->
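The stalling warnings above can be quantified with a small parsing sketch (it assumes the vdsm log line format from this excerpt, and the helper names are made up for illustration): pull out the "remaining" values and check whether they trend downward. In this trace they bounce back up (701 MiB, 361 MiB, then 683 MiB), which is the signature of a non-converging migration.

```python
import re

# Matches vdsm's stalling warning as it appears in the excerpt above.
STALL_RE = re.compile(r"remaining \((\d+)MiB\) > lowmark \((\d+)MiB\)")

def remaining_series(log_lines):
    """Return the 'remaining' memory values (MiB), in log order."""
    return [int(m.group(1)) for line in log_lines
            if (m := STALL_RE.search(line))]

def is_converging(series):
    """Crude check: a converging migration trends monotonically down.
    Values that bounce back up mean the guest dirties pages faster
    than qemu copies them."""
    return all(b <= a for a, b in zip(series, series[1:]))
```

Running `is_converging` on the remaining values from this thread's log returns False, matching vdsm's "Migration is stuck" verdict.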
At the same time on the destination server
<--snip-->
vdsm.log:Thread-350501::DEBUG::2015-02-26
15:42:25,923::BindingXMLRPC::1133::vds::(wrapper) client
[172.19.2.34]::call vmGetStats with
('b75823d1-00f0-457e-a692-8b95f73907db',) {}
vdsm.log:Thread-350502::DEBUG::2015-02-26
15:42:26,018::BindingXMLRPC::1133::vds::(wrapper) client
[172.19.2.34]::call vmMigrationCreate with ALL THE MACHINE START
PARAMETERS here
vdsm.log:Thread-350502::INFO::2015-02-26
15:42:26,019::clientIF::394::vds::(createVm) vmContainerLock acquired by
vm b75823d1-00f0-457e-a692-8b95f73907db
vdsm.log:Thread-350502::DEBUG::2015-02-26
15:42:26,033::clientIF::407::vds::(createVm) Total desktops after
creation of b75823d1-00f0-457e-a692-8b95f73907db is 3
vdsm.log:Thread-350503::DEBUG::2015-02-26
15:42:26,033::vm::2264::vm.Vm::(_startUnderlyingVm)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Start
vdsm.log:Thread-350502::DEBUG::2015-02-26
15:42:26,036::vm::5658::vm.Vm::(waitForMigrationDestinationPrepare)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration destination:
waiting for VM creation
vdsm.log:Thread-350503::DEBUG::2015-02-26
15:42:26,036::vm::2268::vm.Vm::(_startUnderlyingVm)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::_ongoingCreations acquired
vdsm.log:Thread-350502::DEBUG::2015-02-26
15:42:26,038::vm::5663::vm.Vm::(waitForMigrationDestinationPrepare)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration destination:
waiting 48s for path preparation
vdsm.log:Thread-350503::INFO::2015-02-26
15:42:26,038::vm::3261::vm.Vm::(_run)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::VM wrapper has started
vdsm.log:Thread-350503::WARNING::2015-02-26
15:42:26,041::vm::2056::vm.Vm::(buildConfDevices)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Unknown type found, device:
'{'device': 'unix', 'alias': 'channel0',
'type': 'channel', 'address':
{'bus': '0', 'controller': '0', 'type':
'virtio-serial', 'port': '1'}}'
found
vdsm.log:Thread-350503::WARNING::2015-02-26
15:42:26,041::vm::2056::vm.Vm::(buildConfDevices)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Unknown type found, device:
'{'device': 'unix', 'alias': 'channel1',
'type': 'channel', 'address':
{'bus': '0', 'controller': '0', 'type':
'virtio-serial', 'port': '2'}}'
found
vdsm.log:Thread-350503::DEBUG::2015-02-26
15:42:26,349::vm::1058::vm.Vm::(__init__)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Ignoring param (target,
10485760) in BalloonDevice
vdsm.log:Thread-350503::DEBUG::2015-02-26
15:42:26,350::vm::2294::vm.Vm::(_startUnderlyingVm)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::_ongoingCreations released
vdsm.log:Thread-350502::ERROR::2015-02-26
15:42:26,351::vm::5638::vm.Vm::(_updateDevicesDomxmlCache)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Alias not found for device
type graphics during migration at destination host
vdsm.log:Thread-350503::DEBUG::2015-02-26
15:42:26,353::vm::4128::vm.Vm::(_waitForUnderlyingMigration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Waiting 21600 seconds for
end of migration
vdsm.log:Thread-350502::DEBUG::2015-02-26
15:42:26,377::BindingXMLRPC::1140::vds::(wrapper) return
vmMigrationCreate with {ALL THE MACHINE PARAMETERS HERE
vdsm.log:libvirtEventLoop::DEBUG::2015-02-26
15:42:27,195::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::event Started detail 1
opaque None
vdsm.log:libvirtEventLoop::DEBUG::2015-02-26
15:50:27,037::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::event Stopped detail 5
opaque None
vdsm.log:libvirtEventLoop::INFO::2015-02-26
15:50:27,038::vm::2366::vm.Vm::(_onQemuDeath)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::underlying process disconnected
vdsm.log:libvirtEventLoop::INFO::2015-02-26
15:50:27,038::vm::4952::vm.Vm::(releaseVm)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Release VM resources
vdsm.log:Thread-350503::DEBUG::2015-02-26
15:50:27,044::libvirtconnection::143::root::(wrapper) Unknown
libvirterror: ecode: 42 edom: 10 level: 2 message: Domain not found: no
domain with matching uuid 'b75823d1-00f0-457e-a692-8b95f73907db'
vdsm.log:Thread-350503::ERROR::2015-02-26
15:50:27,047::vm::2325::vm.Vm::(_startUnderlyingVm)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Failed to start a migration
destination vm
vdsm.log:MigrationError: Domain not found: no domain with matching uuid
'b75823d1-00f0-457e-a692-8b95f73907db'
vdsm.log:Thread-350503::DEBUG::2015-02-26
15:50:27,058::vm::2786::vm.Vm::(setDownStatus)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Changed state to Down: VM
failed to migrate (code=8)
vdsm.log:Thread-351517::DEBUG::2015-02-26
15:50:27,089::BindingXMLRPC::1133::vds::(wrapper) client
[172.19.2.34]::call vmDestroy with
('b75823d1-00f0-457e-a692-8b95f73907db',) {}
vdsm.log:Thread-351517::INFO::2015-02-26
15:50:27,092::API::332::vds::(destroy) vmContainerLock acquired by vm
b75823d1-00f0-457e-a692-8b95f73907db
vdsm.log:Thread-351517::DEBUG::2015-02-26
15:50:27,154::vm::5026::vm.Vm::(destroy)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::destroy Called
vdsm.log:libvirtEventLoop::WARNING::2015-02-26
15:50:27,162::utils::129::root::(rmFile) File:
/var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.com.redhat.rhevm.vdsm
already removed
vdsm.log:libvirtEventLoop::WARNING::2015-02-26
15:50:27,163::utils::129::root::(rmFile) File:
/var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.org.qemu.guest_agent.0
already removed
vdsm.log:libvirtEventLoop::INFO::2015-02-26
15:50:27,164::logUtils::44::dispatcher::(wrapper) Run and protect:
inappropriateDevices(thiefId='b75823d1-00f0-457e-a692-8b95f73907db')
vdsm.log:Thread-351517::DEBUG::2015-02-26
15:50:27,170::vm::5020::vm.Vm::(deleteVm)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Total desktops after
destroy of b75823d1-00f0-457e-a692-8b95f73907db is 2
vdsm.log:libvirtEventLoop::WARNING::2015-02-26
15:50:27,170::vm::1953::vm.Vm::(_set_lastStatus)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::trying to set state to Down
when already Down
<--snip-->
And I can see the error “Domain not found: no domain with matching uuid
'b75823d1-00f0-457e-a692-8b95f73907db'” in the middle, but I am not
entirely sure what this means.
In the qemu log I can see this
<--snip-->
*From QEMU LOG*
2015-02-26 14:42:26.859+0000: starting up
Domain id=8 is tainted: hook-script
2015-02-26T14:50:26.650595Z qemu-system-x86_64: load of migration
failed: Input/output error
2015-02-26 14:50:26.684+0000: shutting down
<--snip-->
The time here seems to be logged in GMT, while in the snippets above it
is CET. Might this be the problem?
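A quick check suggests the offset is only a logging convention, not a clock problem: qemu timestamps in UTC, vdsm in local time, and converting qemu's failure time to CET lands in the same minute as vdsm's abort. A minimal sketch, assuming those two zones (CET is UTC+1 in February, no DST):

```python
from datetime import datetime, timedelta, timezone

# qemu logged the load failure at 14:50:26 in UTC.
qemu_utc = datetime(2015, 2, 26, 14, 50, 26, tzinfo=timezone.utc)
cet = timezone(timedelta(hours=1))  # CET in winter

# Converting to CET gives 15:50:26 -- the same minute in which vdsm
# aborted the migration, so the two logs agree on when qemu died.
vdsm_local = qemu_utc.astimezone(cet)
print(vdsm_local.strftime("%H:%M:%S"))  # 15:50:26
```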
Here is the qemu command line (it was removed from the cleaned snippets above):
<--snip-->
/usr/bin/qemu-kvm
-name SERVERNAME
-S
-machine pc-1.0,accel=kvm,usb=off
-cpu SandyBridge
-m 10240
-realtime mlock=off
-smp 4,maxcpus=64,sockets=16,cores=4,threads=1
-uuid b75823d1-00f0-457e-a692-8b95f73907db
-smbios type=1,manufacturer=oVirt,product=oVirt
Node,version=20-3,serial=4C4C4544-0038-3210-8048-B7C04F303232,uuid=b75823d1-00f0-457e-a692-8b95f73907db
-no-user-config
-nodefaults
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/al-exchange-01.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control
-rtc base=2015-02-26T15:42:26,driftfix=slew
-global kvm-pit.lost_tick_policy=discard
-no-hpet
-no-shutdown
-boot strict=on
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
-device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5
-drive
file=/rhev/data-center/mnt/XXXX:_data_export_iso/85821683-bc7c-429b-bff2-ce25f56fc16f/images/11111111-1111-1111-1111-111111111111/XXXX.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial=
-device
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2
-drive
file=/rhev/data-center/0f954891-b1cd-4f09-99ae-75d404d95f9d/276e9ba7-e19a-49c5-8ad7-26711934d5e4/images/c37bfa94-718c-4125-9202-bc299535eca5/299bd6d1-f4a0-4296-9944-176b2252a886,if=none,id=drive-virtio-disk0,format=raw,serial=c37bfa94-718c-4125-9202-bc299535eca5,cache=none,werror=stop,rerror=stop,aio=threads
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive
file=/rhev/data-center/0f954891-b1cd-4f09-99ae-75d404d95f9d/276e9ba7-e19a-49c5-8ad7-26711934d5e4/images/75134ccc-b74e-4955-90a5-95d4ceff403b/1e5b5b86-c31d-476c-acb2-a5bd6a65490b,if=none,id=drive-virtio-disk1,format=raw,serial=75134ccc-b74e-4955-90a5-95d4ceff403b,cache=none,werror=stop,rerror=stop,aio=threads
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
-netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31
-device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:c7:72:0a,bus=pci.0,addr=0x3
-chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.com.redhat.rhevm.vdsm,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.org.qemu.guest_agent.0,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
-device usb-tablet,id=input0
-vnc IP-ADDRESS:2,password
-k de
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2
-incoming tcp:[::]:49152
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
-msg timestamp=on
<--snip-->
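The command line shows qemu listening for the incoming migration on tcp:[::]:49152 (libvirt's default migration port range starts at 49152). One thing worth ruling out for the "load of migration failed: Input/output error" is a firewall or conntrack problem cutting the stream mid-transfer; a minimal reachability sketch to run from the source host (the helper name is made up):

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Try a plain TCP connect to the qemu migration port. A refused or
    filtered connection from the source host would point at a firewall
    rather than at storage or qemu itself."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_reachable("compute01", 49152)` while a migration is being set up; this only tests the initial connect, not a drop later in the transfer.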
Here are the contents of the .meta files in the directories of the two
images above:
<--snip-->
DOMAIN=276e9ba7-e19a-49c5-8ad7-26711934d5e4
VOLTYPE=LEAF
CTIME=1424445378
FORMAT=RAW
IMAGE=c37bfa94-718c-4125-9202-bc299535eca5
DISKTYPE=2
PUUID=00000000-0000-0000-0000-000000000000
LEGALITY=LEGAL
MTIME=0
POOL_UUID=
DESCRIPTION={"DiskAlias":"al-exchange-01_Disk1","DiskDescription":"System"}
TYPE=SPARSE
SIZE=146800640
EOF
<--snip-->
<--snip-->
DOMAIN=276e9ba7-e19a-49c5-8ad7-26711934d5e4
CTIME=1424785898
FORMAT=RAW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=419430400
VOLTYPE=LEAF
DESCRIPTION={"DiskAlias":"al-exchange-01_Disk2","DiskDescription":"Data"}
IMAGE=75134ccc-b74e-4955-90a5-95d4ceff403b
PUUID=00000000-0000-0000-0000-000000000000
MTIME=0
POOL_UUID=
TYPE=SPARSE
EOF
<--snip-->
*From:*Roy Golan [mailto:rgolan@redhat.com]
*Sent:* Wednesday, February 18, 2015 12:12 PM
*To:* Soeren Malchow; users(a)ovirt.org
*Subject:* Re: [ovirt-users] Hosted Engine Migration fails
On 02/16/2015 04:55 AM, Soeren Malchow wrote:
Dear all,
we have a setup with several hosts running Fedora 20 with the
virt-preview packages installed (for snapshot live merge) and a
hosted engine running CentOS 6.6.
We are experiencing a problem with the Live Migration of the Hosted
Engine, in the case of setting the host for the Engine into
maintenance as well as a manual migration.
I tried this on the “ovirtmgmt” network and when that failed I did
some research and tried to use another network interface (separate
from ovirtmgmt on layer 2), this also fails.
It looks as if the migration is still going through the ovirtmgmt
interface, at least judging from the network traffic, and I think
the error that I found (RHBZ#919201) is actually the right one.
--
vdsm.log:Thread-6745::WARNING::2015-02-16
03:22:25,743::migration::458::vm.Vm::(monitor_migration)
vmId=`XXX`::Migration stalling: remaining (35MiB) > lowmark (15MiB).
Refer to RHBZ#919201.
vdsm.log:Thread-6745::WARNING::2015-02-16
03:22:35,745::migration::458::vm.Vm::(monitor_migration)
vmId=`XXX`::Migration stalling: remaining (129MiB) > lowmark
(15MiB). Refer to RHBZ#919201.
vdsm.log:Thread-6745::WARNING::2015-02-16
03:22:45,747::migration::458::vm.Vm::(monitor_migration)
vmId=`XXX`::Migration stalling: remaining (42MiB) > lowmark (15MiB).
Refer to RHBZ#919201.
vdsm.log:Thread-6745::WARNING::2015-02-16
03:22:55,749::migration::458::vm.Vm::(monitor_migration)
vmId=`XXX`::Migration stalling: remaining (88MiB) > lowmark (15MiB).
Refer to RHBZ#919201.
--
The ovirtmgmt interface is 2 x 1Gbit (LACP connected to Dell
Switches with MLAG) and by far not fully utilized.
Can anyone help me with where to go from here?
We still need the whole log. The guest (the engine) is probably doing a
lot of memory I/O if you have a fair number of running VMs and hosts;
that stalls the migration because
the guest pages are dirtied faster than qemu can copy them.
You have two options:
1. Try several more times.
2. Shut down the engine VM; it should start on another host.
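The "pages dirtied faster than qemu can copy" condition can be sketched numerically. This is only an illustrative model, not measured from this cluster (the function names and numbers are made up): precopy migration converges only while copy bandwidth exceeds the dirty rate, and each round shrinks the residue by the ratio of the two.

```python
def migration_converges(bandwidth_mib_s, dirty_rate_mib_s):
    """Live precopy migration converges only when pages are copied
    faster than the guest dirties them; otherwise 'remaining' never
    reaches zero and the monitor eventually aborts."""
    return bandwidth_mib_s > dirty_rate_mib_s

def rounds_to_finish(ram_mib, bandwidth_mib_s, dirty_rate_mib_s, downtime_mib):
    """Rough iteration count: while one pass copies the remaining pages,
    the guest dirties remaining * (dirty_rate / bandwidth) MiB of new
    ones. Stop when the residue fits the allowed downtime window.
    Returns None if the migration cannot converge."""
    if not migration_converges(bandwidth_mib_s, dirty_rate_mib_s):
        return None
    remaining, rounds = float(ram_mib), 0
    while remaining > downtime_mib:
        remaining = remaining * dirty_rate_mib_s / bandwidth_mib_s
        rounds += 1
    return rounds
```

With a 10 GiB guest, 100 MiB/s of effective bandwidth and a 50 MiB/s dirty rate, the residue halves each round and the migration finishes; at a dirty rate at or above the bandwidth it never does, which is what the stalling warnings in this thread look like.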
Regards
Soeren
Hi Soeren,
The key issue seems to be this one:
"
vdsm.log:Thread-49551::WARNING::2015-02-26
15:50:27,359::migration::445::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration is stuck: Hasn't
progressed in 150.069458961 seconds. Aborting.
"
Basically it means that the migration process is not converging since
moving the data from one host to the other is too slow.
The reason for this could be:
1. Extremely slow or busy network.
2. Something inside your VM is changing memory very fast (faster than
the copying rate).
In order to rule out (2), you can start an empty VM with nothing inside
but a minimal OS. If you're not sure, you can use Tiny Core Linux [1].
Such a minimal VM should have no problem with migrating from one machine
to the other. If it has an issue, it means that you have a problem in
your network that causes the migration process to take longer than it
should.
Give it a try and let us know how it goes.
Doron
[1]
http://distro.ibiblio.org/tinycorelinux/