[ovirt-users] Hosted Engine Migration fails

Doron Fediuck dfediuck at redhat.com
Thu Feb 26 16:45:49 UTC 2015



On 26/02/15 17:31, Soeren Malchow wrote:
> Hi,
> 
>  
> 
> we tried this (Roy's mail below), and yes, shutdown always works.
> 
>  
> 
> But now this problem comes up with regular machines as well.
> 
>  
> 
> The environment is setup like this
> 
>  
> 
> Engine: oVirt 3.5.1.1-1.el6 on CentOS 6
> 
>  
> 
> The storage backend is Gluster 3.6.2-1.el7 on CentOS 7
> 
>  
> 
> Compute hosts: libvirt version libvirt-1.2.9.1-2.fc20, kvm 2.1.2-7.fc20,
> vdsm vdsm-4.16.10-8.gitc937927.fc20
> 
>  
> 
> All compute Servers are 100% identical.
> 
>  
> 
> The storage cluster was tested manually and works just fine.
> 
>  
> 
> The network interfaces are not fully utilized, more like 15%.
> 
>  
> 
> Log output is as below. The thing in the log output I do not understand
> is this:
> 
>  
> 
> “2015-02-26T14:50:26.650595Z qemu-system-x86_64: load of migration
> failed: Input/output error”
> 
>  
> 
> From the qemu log.
> 
>  
> 
> Also if I shut down machines, put their host into maintenance and start
> them somewhere else, everything works just fine.
> 
>  
> 
> Can someone help with this? Any idea where to look?
> 
>  
> 
> Regards
> 
> Soeren
> 
>  
> 
>  
> 
>  
> 
> *From VDSM Log*
> 
>  
> 
> I just tried to migrate a machine; this is what happens on the source:
> 
>  
> 
> <--snip-->
> 
>  
> 
> vdsm.log:Thread-49548::DEBUG::2015-02-26
> 15:42:26,692::__init__::469::jsonrpc.JsonRpcServer::(_serveRequest)
> Calling 'VM.migrate' in bridge with {u'params': {u'tunneled': u'false',
> u'dstqemu': u'172.19.2.31', u'src': u'compute04', u'dst':
> u'compute01:54321', u'vmId': u'b75823d1-00f0-457e-a692-8b95f73907db',
> u'abortOnError': u'true', u'method': u'online'}, u'vmID':
> u'b75823d1-00f0-457e-a692-8b95f73907db'}
> 
> vdsm.log:Thread-49548::DEBUG::2015-02-26
> 15:42:26,694::API::510::vds::(migrate) {u'tunneled': u'false',
> u'dstqemu': u'IPADDR', u'src': u'compute04', u'dst': u'compute01:54321',
> u'vmId': u'b75823d1-00f0-457e-a692-8b95f73907db', u'abortOnError':
> u'true', u'method': u'online'}
> 
> vdsm.log:Thread-49549::DEBUG::2015-02-26
> 15:42:26,699::migration::103::vm.Vm::(_setupVdsConnection)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Destination server is:
> compute01:54321
> 
> vdsm.log:Thread-49549::DEBUG::2015-02-26
> 15:42:26,702::migration::105::vm.Vm::(_setupVdsConnection)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Initiating connection with
> destination
> 
> vdsm.log:Thread-49549::DEBUG::2015-02-26
> 15:42:26,733::migration::155::vm.Vm::(_prepareGuest)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration started
> 
> vdsm.log:Thread-49549::DEBUG::2015-02-26
> 15:42:26,755::migration::238::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration semaphore
> acquired after 0 seconds
> 
> vdsm.log:Thread-49549::DEBUG::2015-02-26
> 15:42:27,211::migration::298::vm.Vm::(_startUnderlyingMigration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::starting migration to
> qemu+tls://compute01/system with miguri tcp://IPADDR
> 
> vdsm.log:Thread-49550::DEBUG::2015-02-26
> 15:42:27,213::migration::361::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration downtime thread
> started
> 
> vdsm.log:Thread-49551::DEBUG::2015-02-26
> 15:42:27,216::migration::410::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::starting migration monitor
> thread
> 
> vdsm.log:Thread-49550::DEBUG::2015-02-26
> 15:43:42,218::migration::370::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
> to 50
> 
> vdsm.log:Thread-49550::DEBUG::2015-02-26
> 15:44:57,222::migration::370::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
> to 100
> 
> vdsm.log:Thread-49550::DEBUG::2015-02-26
> 15:46:12,227::migration::370::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
> to 150
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:47:07,279::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (1791MiB) > lowmark (203MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:47:17,281::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (1398MiB) > lowmark (203MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49550::DEBUG::2015-02-26
> 15:47:27,233::migration::370::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
> to 200
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:47:27,283::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (1066MiB) > lowmark (203MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:47:37,285::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (701MiB) > lowmark (203MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:47:47,287::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (361MiB) > lowmark (203MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:48:07,291::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (683MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:48:17,292::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (350MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:48:27,294::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (18MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:48:37,296::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (646MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49550::DEBUG::2015-02-26
> 15:48:42,238::migration::370::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
> to 250
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:48:47,334::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (317MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:48:57,342::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (1018MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:49:07,344::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (679MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:49:17,346::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (357MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:49:27,348::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (31MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:49:37,349::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (854MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:49:47,351::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (525MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49550::DEBUG::2015-02-26
> 15:49:57,242::migration::370::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration downtime
> to 300
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:49:57,353::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (183MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:50:07,355::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (785MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:50:17,357::migration::458::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:
> remaining (457MiB) > lowmark (13MiB). Refer to RHBZ#919201.
> 
> vdsm.log:Thread-49551::WARNING::2015-02-26
> 15:50:27,359::migration::445::vm.Vm::(monitor_migration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration is stuck: Hasn't
> progressed in 150.069458961 seconds. Aborting.
> 
> vdsm.log:Thread-49551::DEBUG::2015-02-26
> 15:50:27,362::migration::470::vm.Vm::(stop)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::stopping migration monitor
> thread
> 
> vdsm.log:Thread-49549::DEBUG::2015-02-26
> 15:50:27,852::migration::376::vm.Vm::(cancel)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::canceling migration
> downtime thread
> 
> vdsm.log:Thread-49549::DEBUG::2015-02-26
> 15:50:27,852::migration::470::vm.Vm::(stop)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::stopping migration monitor
> thread
> 
> vdsm.log:Thread-49550::DEBUG::2015-02-26
> 15:50:27,853::migration::373::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration downtime thread
> exiting
> 
> vdsm.log:Thread-49549::ERROR::2015-02-26
> 15:50:27,855::migration::161::vm.Vm::(_recover)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::operation aborted:
> migration job: canceled by client
> 
> vdsm.log:Thread-49549::ERROR::2015-02-26
> 15:50:28,049::migration::260::vm.Vm::(run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Failed to migrate
> 
> vdsm.log:Thread-49690::DEBUG::2015-02-26
> 15:50:33,284::__init__::469::jsonrpc.JsonRpcServer::(_serveRequest)
> Calling 'VM.getMigrationStatus' in bridge with {u'vmID':
> u'b75823d1-00f0-457e-a692-8b95f73907db'}
> 
>  
> 
> <--snip-->
> 
>  
> 
>  
> 
> At the same time, on the destination server:
> 
>  
> 
> <--snip-->
> 
>  
> 
> vdsm.log:Thread-350501::DEBUG::2015-02-26
> 15:42:25,923::BindingXMLRPC::1133::vds::(wrapper) client
> [172.19.2.34]::call vmGetStats with
> ('b75823d1-00f0-457e-a692-8b95f73907db',) {}
> 
> vdsm.log:Thread-350502::DEBUG::2015-02-26
> 15:42:26,018::BindingXMLRPC::1133::vds::(wrapper) client
> [172.19.2.34]::call vmMigrationCreate with ALL THE MACHINE START
> PARAMETERS here
> 
> vdsm.log:Thread-350502::INFO::2015-02-26
> 15:42:26,019::clientIF::394::vds::(createVm) vmContainerLock acquired by
> vm b75823d1-00f0-457e-a692-8b95f73907db
> 
> vdsm.log:Thread-350502::DEBUG::2015-02-26
> 15:42:26,033::clientIF::407::vds::(createVm) Total desktops after
> creation of b75823d1-00f0-457e-a692-8b95f73907db is 3
> 
> vdsm.log:Thread-350503::DEBUG::2015-02-26
> 15:42:26,033::vm::2264::vm.Vm::(_startUnderlyingVm)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Start
> 
> vdsm.log:Thread-350502::DEBUG::2015-02-26
> 15:42:26,036::vm::5658::vm.Vm::(waitForMigrationDestinationPrepare)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration destination:
> waiting for VM creation
> 
> vdsm.log:Thread-350503::DEBUG::2015-02-26
> 15:42:26,036::vm::2268::vm.Vm::(_startUnderlyingVm)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::_ongoingCreations acquired
> 
> vdsm.log:Thread-350502::DEBUG::2015-02-26
> 15:42:26,038::vm::5663::vm.Vm::(waitForMigrationDestinationPrepare)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration destination:
> waiting 48s for path preparation
> 
> vdsm.log:Thread-350503::INFO::2015-02-26
> 15:42:26,038::vm::3261::vm.Vm::(_run)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::VM wrapper has started
> 
> vdsm.log:Thread-350503::WARNING::2015-02-26
> 15:42:26,041::vm::2056::vm.Vm::(buildConfDevices)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Unknown type found, device:
> '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address':
> {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}'
> found
> 
> vdsm.log:Thread-350503::WARNING::2015-02-26
> 15:42:26,041::vm::2056::vm.Vm::(buildConfDevices)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Unknown type found, device:
> '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address':
> {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}'
> found
> 
> vdsm.log:Thread-350503::DEBUG::2015-02-26
> 15:42:26,349::vm::1058::vm.Vm::(__init__)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Ignoring param (target,
> 10485760) in BalloonDevice
> 
> vdsm.log:Thread-350503::DEBUG::2015-02-26
> 15:42:26,350::vm::2294::vm.Vm::(_startUnderlyingVm)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::_ongoingCreations released
> 
> vdsm.log:Thread-350502::ERROR::2015-02-26
> 15:42:26,351::vm::5638::vm.Vm::(_updateDevicesDomxmlCache)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Alias not found for device
> type graphics during migration at destination host
> 
> vdsm.log:Thread-350503::DEBUG::2015-02-26
> 15:42:26,353::vm::4128::vm.Vm::(_waitForUnderlyingMigration)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Waiting 21600 seconds for
> end of migration
> 
> vdsm.log:Thread-350502::DEBUG::2015-02-26
> 15:42:26,377::BindingXMLRPC::1140::vds::(wrapper) return
> vmMigrationCreate with {ALL THE MACHINE PARAMETERS HERE
> 
> vdsm.log:libvirtEventLoop::DEBUG::2015-02-26
> 15:42:27,195::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::event Started detail 1
> opaque None
> 
> vdsm.log:libvirtEventLoop::DEBUG::2015-02-26
> 15:50:27,037::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::event Stopped detail 5
> opaque None
> 
> vdsm.log:libvirtEventLoop::INFO::2015-02-26
> 15:50:27,038::vm::2366::vm.Vm::(_onQemuDeath)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::underlying process disconnected
> 
> vdsm.log:libvirtEventLoop::INFO::2015-02-26
> 15:50:27,038::vm::4952::vm.Vm::(releaseVm)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Release VM resources
> 
> vdsm.log:Thread-350503::DEBUG::2015-02-26
> 15:50:27,044::libvirtconnection::143::root::(wrapper) Unknown
> libvirterror: ecode: 42 edom: 10 level: 2 message: Domain not found: no
> domain with matching uuid 'b75823d1-00f0-457e-a692-8b95f73907db'
> 
> vdsm.log:Thread-350503::ERROR::2015-02-26
> 15:50:27,047::vm::2325::vm.Vm::(_startUnderlyingVm)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Failed to start a migration
> destination vm
> 
> vdsm.log:MigrationError: Domain not found: no domain with matching uuid
> 'b75823d1-00f0-457e-a692-8b95f73907db'
> 
> vdsm.log:Thread-350503::DEBUG::2015-02-26
> 15:50:27,058::vm::2786::vm.Vm::(setDownStatus)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Changed state to Down: VM
> failed to migrate (code=8)
> 
> vdsm.log:Thread-351517::DEBUG::2015-02-26
> 15:50:27,089::BindingXMLRPC::1133::vds::(wrapper) client
> [172.19.2.34]::call vmDestroy with
> ('b75823d1-00f0-457e-a692-8b95f73907db',) {}
> 
> vdsm.log:Thread-351517::INFO::2015-02-26
> 15:50:27,092::API::332::vds::(destroy) vmContainerLock acquired by vm
> b75823d1-00f0-457e-a692-8b95f73907db
> 
> vdsm.log:Thread-351517::DEBUG::2015-02-26
> 15:50:27,154::vm::5026::vm.Vm::(destroy)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::destroy Called
> 
> vdsm.log:libvirtEventLoop::WARNING::2015-02-26
> 15:50:27,162::utils::129::root::(rmFile) File:
> /var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.com.redhat.rhevm.vdsm
> already removed
> 
> vdsm.log:libvirtEventLoop::WARNING::2015-02-26
> 15:50:27,163::utils::129::root::(rmFile) File:
> /var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.org.qemu.guest_agent.0
> already removed
> 
> vdsm.log:libvirtEventLoop::INFO::2015-02-26
> 15:50:27,164::logUtils::44::dispatcher::(wrapper) Run and protect:
> inappropriateDevices(thiefId='b75823d1-00f0-457e-a692-8b95f73907db')
> 
> vdsm.log:Thread-351517::DEBUG::2015-02-26
> 15:50:27,170::vm::5020::vm.Vm::(deleteVm)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Total desktops after
> destroy of b75823d1-00f0-457e-a692-8b95f73907db is 2
> 
> vdsm.log:libvirtEventLoop::WARNING::2015-02-26
> 15:50:27,170::vm::1953::vm.Vm::(_set_lastStatus)
> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::trying to set state to Down
> when already Down
> 
>  
> 
> <--snip-->
> 
>  
> 
> And I can see the error “Domain not found: no domain with matching uuid
> 'b75823d1-00f0-457e-a692-8b95f73907db'” in the middle, but I am not
> entirely sure what this means.
> 
>  
> 
> In the qemu log I can see this:
> 
>  
> 
> <--snip-->
> 
>  
> 
>  
> 
> *From QEMU LOG*
> 
>  
> 
> 2015-02-26 14:42:26.859+0000: starting up
> 
> Domain id=8 is tainted: hook-script
> 
> 2015-02-26T14:50:26.650595Z qemu-system-x86_64: load of migration
> failed: Input/output error
> 
> 2015-02-26 14:50:26.684+0000: shutting down
> 
>  
> 
> <--snip-->
> 
>  
> 
> The time here seems to be logged in GMT, while in the snippets above it
> is CET. Might this be the problem?
> 
>  
> 
> Here is the qemu command line (that was removed above), in cleaned form:
> 
>  
> 
> <--snip-->
> 
> /usr/bin/qemu-kvm
> 
> -name SERVERNAME
> 
> -S
> 
> -machine pc-1.0,accel=kvm,usb=off
> 
> -cpu SandyBridge
> 
> -m 10240
> 
> -realtime mlock=off
> 
> -smp 4,maxcpus=64,sockets=16,cores=4,threads=1
> 
> -uuid b75823d1-00f0-457e-a692-8b95f73907db
> 
> -smbios type=1,manufacturer=oVirt,product=oVirt
> Node,version=20-3,serial=4C4C4544-0038-3210-8048-B7C04F303232,uuid=b75823d1-00f0-457e-a692-8b95f73907db
> 
> 
> -no-user-config
> 
> -nodefaults
> 
> -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/al-exchange-01.monitor,server,nowait
> 
> 
> -mon chardev=charmonitor,id=monitor,mode=control
> 
> -rtc base=2015-02-26T15:42:26,driftfix=slew
> 
> -global kvm-pit.lost_tick_policy=discard
> 
> -no-hpet
> 
> -no-shutdown
> 
> -boot strict=on
> 
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
> 
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
> 
> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5
> 
> -drive
> file=/rhev/data-center/mnt/XXXX:_data_export_iso/85821683-bc7c-429b-bff2-ce25f56fc16f/images/11111111-1111-1111-1111-111111111111/XXXX.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial=
> -device
> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2
> 
> -drive
> file=/rhev/data-center/0f954891-b1cd-4f09-99ae-75d404d95f9d/276e9ba7-e19a-49c5-8ad7-26711934d5e4/images/c37bfa94-718c-4125-9202-bc299535eca5/299bd6d1-f4a0-4296-9944-176b2252a886,if=none,id=drive-virtio-disk0,format=raw,serial=c37bfa94-718c-4125-9202-bc299535eca5,cache=none,werror=stop,rerror=stop,aio=threads
> 
> 
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> 
> 
> -drive
> file=/rhev/data-center/0f954891-b1cd-4f09-99ae-75d404d95f9d/276e9ba7-e19a-49c5-8ad7-26711934d5e4/images/75134ccc-b74e-4955-90a5-95d4ceff403b/1e5b5b86-c31d-476c-acb2-a5bd6a65490b,if=none,id=drive-virtio-disk1,format=raw,serial=75134ccc-b74e-4955-90a5-95d4ceff403b,cache=none,werror=stop,rerror=stop,aio=threads
> 
> 
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
> 
> 
> -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31
> 
> -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:c7:72:0a,bus=pci.0,addr=0x3
> 
> 
> -chardev
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.com.redhat.rhevm.vdsm,server,nowait
> 
> 
> -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
> 
> 
> -chardev
> socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.org.qemu.guest_agent.0,server,nowait
> 
> 
> -device
> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
> 
> 
> -device usb-tablet,id=input0
> 
> -vnc IP-ADDRESS:2,password
> 
> -k de
> 
> -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
> 
> -incoming tcp:[::]:49152
> 
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
> 
> -msg timestamp=on
> 
> <--snip-->
> 
>  
> 
> Here is the content of the .meta files in the directories of the two
> images above:
> 
>  
> 
> <--snip-->
> 
> DOMAIN=276e9ba7-e19a-49c5-8ad7-26711934d5e4
> 
> VOLTYPE=LEAF
> 
> CTIME=1424445378
> 
> FORMAT=RAW
> 
> IMAGE=c37bfa94-718c-4125-9202-bc299535eca5
> 
> DISKTYPE=2
> 
> PUUID=00000000-0000-0000-0000-000000000000
> 
> LEGALITY=LEGAL
> 
> MTIME=0
> 
> POOL_UUID=
> 
> DESCRIPTION={"DiskAlias":"al-exchange-01_Disk1","DiskDescription":"System"}
> 
> TYPE=SPARSE
> 
> SIZE=146800640
> 
> EOF
> 
> <--snip-->
> 
>  
> 
> <--snip-->
> 
> DOMAIN=276e9ba7-e19a-49c5-8ad7-26711934d5e4
> 
> CTIME=1424785898
> 
> FORMAT=RAW
> 
> DISKTYPE=2
> 
> LEGALITY=LEGAL
> 
> SIZE=419430400
> 
> VOLTYPE=LEAF
> 
> DESCRIPTION={"DiskAlias":"al-exchange-01_Disk2","DiskDescription":"Data"}
> 
> IMAGE=75134ccc-b74e-4955-90a5-95d4ceff403b
> 
> PUUID=00000000-0000-0000-0000-000000000000
> 
> MTIME=0
> 
> POOL_UUID=
> 
> TYPE=SPARSE
> 
> EOF
> 
> <--snip-->
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
> *From:* Roy Golan [mailto:rgolan at redhat.com]
> *Sent:* Wednesday, February 18, 2015 12:12 PM
> *To:* Soeren Malchow; users at ovirt.org
> *Subject:* Re: [ovirt-users] Hosted Engine Migration fails
> 
>  
> 
> On 02/16/2015 04:55 AM, Soeren Malchow wrote:
> 
>     Dear all,
> 
>      
> 
>     we have a setup with several hosts running Fedora 20 with the
>     virt-preview packages installed (for snapshot live merge) and a
>     hosted engine running CentOS 6.6.
> 
>      
> 
>     We are experiencing a problem with the Live Migration of the Hosted
>     Engine, in the case of setting the host for the Engine into
>     maintenance as well as a manual migration.
> 
>     I tried this on the “ovirtmgmt” network and when that failed I did
>     some research and tried to use another network interface (separate
>     from ovirtmgmt on layer 2), this also fails.
> 
>      
> 
>     It looks as if the migration is still going through the ovirtmgmt
>     interface, at least judging from the network traffic, and I think
>     the error that I found (RHBZ#919201) is actually the right one.
> 
>      --
> 
>     vdsm.log:Thread-6745::WARNING::2015-02-16
>     03:22:25,743::migration::458::vm.Vm::(monitor_migration)
>     vmId=`XXX`::Migration stalling: remaining (35MiB) > lowmark (15MiB).
>     Refer to RHBZ#919201.
> 
>     vdsm.log:Thread-6745::WARNING::2015-02-16
>     03:22:35,745::migration::458::vm.Vm::(monitor_migration)
>     vmId=`XXX`::Migration stalling: remaining (129MiB) > lowmark
>     (15MiB). Refer to RHBZ#919201.
> 
>     vdsm.log:Thread-6745::WARNING::2015-02-16
>     03:22:45,747::migration::458::vm.Vm::(monitor_migration)
>     vmId=`XXX`::Migration stalling: remaining (42MiB) > lowmark (15MiB).
>     Refer to RHBZ#919201.
> 
>     vdsm.log:Thread-6745::WARNING::2015-02-16
>     03:22:55,749::migration::458::vm.Vm::(monitor_migration)
>     vmId=`XXX`::Migration stalling: remaining (88MiB) > lowmark (15MiB).
>     Refer to RHBZ#919201.
> 
>     --
> 
>     The ovirtmgmt interface is 2 x 1 Gbit (LACP, connected to Dell
>     switches with MLAG) and far from fully utilized.
> 
>     Can anyone help me with where to go from here?
> 
> 
> We still need the whole log. Probably the guest (engine) is doing lots of
> memory I/O if you have a fair amount of running VMs and hosts. That will
> stall the migration because
> the guest pages are getting dirty faster than qemu can copy them.
> 
> 
> You have two options:
> 
> 1. Try several more times.
> 
> 2. Shut down the engine VM; it should start on another host.
> 
> 
>     Regards
> 
>     Soeren
> 
> 

Hi Soeren,

the key issue seems to be this one:

"
vdsm.log:Thread-49551::WARNING::2015-02-26
15:50:27,359::migration::445::vm.Vm::(monitor_migration)
vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration is stuck: Hasn't
progressed in 150.069458961 seconds. Aborting.
"

Basically, that message means that the migration process is not converging
because moving the data from one host to the other is too slow.
The reason for this could be one of the following (see the quick sketch
after this list):
1. An extremely slow or busy network.
2. Something inside your VM is changing memory very fast (faster than
the copying rate).
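
As a quick back-of-the-envelope check for (2): pre-copy migration only
converges while the link can push pages faster than the guest rewrites
them. For example (the bandwidth and dirty-rate figures below are invented;
only the 10 GiB matches the VM above):

# Rough convergence check for pre-copy live migration (illustrative numbers).
guest_ram_mib    = 10 * 1024   # the VM above was started with -m 10240
link_rate_mib_s  = 110.0       # ~1 Gbit/s of usable bandwidth, assumed
dirty_rate_mib_s = 150.0       # how fast the guest rewrites its memory, assumed

if dirty_rate_mib_s >= link_rate_mib_s:
    print("Will not converge: pages get dirty faster than they can be copied.")
else:
    seconds = guest_ram_mib / (link_rate_mib_s - dirty_rate_mib_s)
    print("Converges in roughly %.0f seconds." % seconds)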

In order to rule out (2), you can start an empty VM with nothing inside
but a minimal OS. If you're not sure which to use, try Tiny Core Linux [1].
Such a minimal VM should have no problem migrating from one machine
to the other. If it does have an issue, it means that you have a problem in
your network that causes the migration process to take longer than it
should.
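
If you also want to measure the raw link between the two hosts without
qemu in the picture, iperf between them will do, or even a crude socket
test like the sketch below (plain Python; the port and the host argument
are placeholders, nothing oVirt-specific). Start it with 'server' on the
destination host first, then run it with 'client <destination-host>' on
the source host:

#!/usr/bin/env python
# Crude point-to-point TCP throughput check (illustrative only).
import socket
import sys
import time

PORT = 5201                  # arbitrary free port, placeholder
CHUNK = 1024 * 1024          # 1 MiB per send
TOTAL_MIB = 1024             # send 1 GiB in total

def receiver():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    received, start = 0, time.time()
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        received += len(data)
    elapsed = max(time.time() - start, 1e-6)
    print("received %.0f MiB in %.1f s -> %.1f MiB/s"
          % (received / float(CHUNK), elapsed, received / float(CHUNK) / elapsed))

def sender(host):
    conn = socket.create_connection((host, PORT))
    payload = b"\0" * CHUNK
    start = time.time()
    for _ in range(TOTAL_MIB):
        conn.sendall(payload)
    conn.close()
    print("sent %d MiB in %.1f s" % (TOTAL_MIB, time.time() - start))

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: nettest.py server | client <destination-host>")
    if sys.argv[1] == "server":
        receiver()
    else:
        sender(sys.argv[2])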

Give it a try and let us know how it goes.
Doron

[1] http://distro.ibiblio.org/tinycorelinux/







