[ovirt-users] Hosted Engine Migration fails

Soeren Malchow soeren.malchow at mcon.net
Thu Feb 26 16:23:47 EST 2015


Dear Doron,



Thanks for the answer, I checked again:



Network on the source host before migration



[screenshot: image002.jpg]



Network on the target host before migration



[screenshot: image004.jpg]





The same hosts look like this during the migration:



Source



[screenshot: image008.jpg]



Target



[screenshot: image010.jpg]





The migration does not even come close to full network speed; it uses only about 25-30% of the available bandwidth.
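
(One thing I still want to rule out on our side: if I read the VDSM sample config correctly, VDSM itself caps the bandwidth of each migration, and the default cap is well below a 1 Gbit link, which would roughly match the 25-30% utilisation above. The option name and default below are taken from the VDSM 4.16 sample config as I remember them, so please correct me if they are wrong.)

<--snip-->
# check for an explicit migration bandwidth cap on both hosts
grep -i migration_max_bandwidth /etc/vdsm/vdsm.conf

# if nothing is set, the built-in default (reportedly around 32 MiB/s
# per migration in this VDSM version) applies; to raise it, add e.g.
#
#   [vars]
#   migration_max_bandwidth = 100    # MiB/s, example value only
#
# to /etc/vdsm/vdsm.conf and restart vdsmd on the host.
<--snip-->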



I tried with two of our test machines: one worked (a CentOS machine doing nothing), while the other, a Windows machine that was also doing nothing, did not.



Here is a screenshot of an SCP copy between the two machines, source -> target:

[screenshot: image015.png]



One thing I can see, however, is that at some point the network traffic drops back to a few KB/s even though the migration is not finished, and shortly after that the migration fails.



The ovirtmgmt network hosts only the engine and is used for migration; the storage and VM networks are completely separate. The environment is not in production yet.
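
For completeness, I can also measure the raw TCP throughput between the two hosts on the migration network, independent of SCP (which is limited by its own encryption overhead). Roughly like this, assuming iperf3 is installed on both hosts:

<--snip-->
# on the target host (compute01):
iperf3 -s

# on the source host (compute04), towards the target's migration address
# (172.19.2.31 from the vdsm.log below), for 30 seconds:
iperf3 -c 172.19.2.31 -t 30
<--snip-->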



Regards

Soeren





-----Original Message-----
From: Doron Fediuck [mailto:dfediuck at redhat.com]
Sent: Thursday, February 26, 2015 5:46 PM
To: Soeren Malchow; Roy Golan; users at ovirt.org
Cc: Dan Kenigsberg; Omer Frenkel
Subject: Re: [ovirt-users] Hosted Engine Migration fails







On 26/02/15 17:31, Soeren Malchow wrote:

> Hi,

>

>

>

> we tried this (Roy's mail below), and yes, shutdown always works.

>

>

>

> But now this problem comes up with regular machines as well.

>

>

>

> The environment is setup like this

>

>

>

> Engine: oVirt 3.5.1.1-1.el6 on CentOS 6

>

>

>

> The storage backend is Gluster 3.6.2-1.el7 on CentOS 7

>

>

>

> Compute hosts: libvirt-1.2.9.1-2.fc20, kvm 2.1.2-7.fc20, vdsm-4.16.10-8.gitc937927.fc20

>

>

>

> All compute Servers are 100% identical.

>

>

>

> The storage cluster was tested manually and works just fine.

>

>

>

> The network interfaces are not fully utilized, more like 15%.

>

>

>

> Log output is below. The thing in the log output I do not understand is this:

>

>

>

> "2015-02-26T14:50:26.650595Z qemu-system-x86_64: load of migration

> failed: Input/output error"

>

>

>

> From the qemu log.

>

>

>

> Also if I shut down machines, put their host into maintenance and

> start them somewhere else, everything works just fine.

>

>

>

> Can someone help with this ? Any idea where to look ?

>

>

>

> Regards

>

> Soeren

>

>

>

>

>

>

>

> *From VDSM Log*

>

>

>

> I just tried to migrate a machine; this is what happens on the source:

>

>

>

> <--snip-->

>

>

>

> vdsm.log:Thread-49548::DEBUG::2015-02-26

> 15:42:26,692::__init__::469::jsonrpc.JsonRpcServer::(_serveRequest)

> Calling 'VM.migrate' in bridge with {u'params': {u'tunneled':

> u'false',

> u'dstqemu': u'172.19.2.31', u'src': u'compute04', u'dst':

> u'compute01:54321', u'vmId': u'b75823d1-00f0-457e-a692-8b95f73907db',

> u'abortOnError': u'true', u'method': u'online'}, u'vmID':

> u'b75823d1-00f0-457e-a692-8b95f73907db'}

>

> vdsm.log:Thread-49548::DEBUG::2015-02-26

> 15:42:26,694::API::510::vds::(migrate) {u'tunneled': u'false',

> u'dstqemu': u'IPADDR', u'src': u'compute04', u'dst':

> u'compute01:54321',

> u'vmId': u'b75823d1-00f0-457e-a692-8b95f73907db', u'abortOnError':

> u'true', u'method': u'online'}

>

> vdsm.log:Thread-49549::DEBUG::2015-02-26

> 15:42:26,699::migration::103::vm.Vm::(_setupVdsConnection)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Destination server is:

> compute01:54321

>

> vdsm.log:Thread-49549::DEBUG::2015-02-26

> 15:42:26,702::migration::105::vm.Vm::(_setupVdsConnection)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Initiating connection

> with destination

>

> vdsm.log:Thread-49549::DEBUG::2015-02-26

> 15:42:26,733::migration::155::vm.Vm::(_prepareGuest)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration started

>

> vdsm.log:Thread-49549::DEBUG::2015-02-26

> 15:42:26,755::migration::238::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration semaphore

> acquired after 0 seconds

>

> vdsm.log:Thread-49549::DEBUG::2015-02-26

> 15:42:27,211::migration::298::vm.Vm::(_startUnderlyingMigration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::starting migration to

> qemu+tls://compute01/system with miguri tcp://IPADDR

>

> vdsm.log:Thread-49550::DEBUG::2015-02-26

> 15:42:27,213::migration::361::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration downtime thread

> started

>

> vdsm.log:Thread-49551::DEBUG::2015-02-26

> 15:42:27,216::migration::410::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::starting migration

> monitor thread

>

> vdsm.log:Thread-49550::DEBUG::2015-02-26

> 15:43:42,218::migration::370::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration

> downtime to 50

>

> vdsm.log:Thread-49550::DEBUG::2015-02-26

> 15:44:57,222::migration::370::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration

> downtime to 100

>

> vdsm.log:Thread-49550::DEBUG::2015-02-26

> 15:46:12,227::migration::370::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration

> downtime to 150

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:47:07,279::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (1791MiB) > lowmark (203MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:47:17,281::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (1398MiB) > lowmark (203MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49550::DEBUG::2015-02-26

> 15:47:27,233::migration::370::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration

> downtime to 200

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:47:27,283::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (1066MiB) > lowmark (203MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:47:37,285::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (701MiB) > lowmark (203MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:47:47,287::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (361MiB) > lowmark (203MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:48:07,291::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (683MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:48:17,292::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (350MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:48:27,294::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (18MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:48:37,296::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (646MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49550::DEBUG::2015-02-26

> 15:48:42,238::migration::370::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration

> downtime to 250

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:48:47,334::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (317MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:48:57,342::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (1018MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:49:07,344::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (679MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:49:17,346::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (357MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:49:27,348::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (31MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:49:37,349::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (854MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:49:47,351::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (525MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49550::DEBUG::2015-02-26

> 15:49:57,242::migration::370::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::setting migration

> downtime to 300

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:49:57,353::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (183MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:50:07,355::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (785MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:50:17,357::migration::458::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration stalling:

> remaining (457MiB) > lowmark (13MiB). Refer to RHBZ#919201.

>

> vdsm.log:Thread-49551::WARNING::2015-02-26

> 15:50:27,359::migration::445::vm.Vm::(monitor_migration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration is stuck:

> Hasn't progressed in 150.069458961 seconds. Aborting.

>

> vdsm.log:Thread-49551::DEBUG::2015-02-26

> 15:50:27,362::migration::470::vm.Vm::(stop)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::stopping migration

> monitor thread

>

> vdsm.log:Thread-49549::DEBUG::2015-02-26

> 15:50:27,852::migration::376::vm.Vm::(cancel)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::canceling migration

> downtime thread

>

> vdsm.log:Thread-49549::DEBUG::2015-02-26

> 15:50:27,852::migration::470::vm.Vm::(stop)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::stopping migration

> monitor thread

>

> vdsm.log:Thread-49550::DEBUG::2015-02-26

> 15:50:27,853::migration::373::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration downtime thread

> exiting

>

> vdsm.log:Thread-49549::ERROR::2015-02-26

> 15:50:27,855::migration::161::vm.Vm::(_recover)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::operation aborted:

> migration job: canceled by client

>

> vdsm.log:Thread-49549::ERROR::2015-02-26

> 15:50:28,049::migration::260::vm.Vm::(run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Failed to migrate

>

> vdsm.log:Thread-49690::DEBUG::2015-02-26

> 15:50:33,284::__init__::469::jsonrpc.JsonRpcServer::(_serveRequest)

> Calling 'VM.getMigrationStatus' in bridge with {u'vmID':

> u'b75823d1-00f0-457e-a692-8b95f73907db'}

>

>

>

> <--snip-->

>

>

>

>

>

> At the same time, this happens on the destination server:

>

>

>

> <--snip-->

>

>

>

> vdsm.log:Thread-350501::DEBUG::2015-02-26

> 15:42:25,923::BindingXMLRPC::1133::vds::(wrapper) client

> [172.19.2.34]::call vmGetStats with

> ('b75823d1-00f0-457e-a692-8b95f73907db',) {}

>

> vdsm.log:Thread-350502::DEBUG::2015-02-26

> 15:42:26,018::BindingXMLRPC::1133::vds::(wrapper) client

> [172.19.2.34]::call vmMigrationCreate with ALL THE MACHINE START

> PARAMETERS here

>

> vdsm.log:Thread-350502::INFO::2015-02-26

> 15:42:26,019::clientIF::394::vds::(createVm) vmContainerLock acquired

> by vm b75823d1-00f0-457e-a692-8b95f73907db

>

> vdsm.log:Thread-350502::DEBUG::2015-02-26

> 15:42:26,033::clientIF::407::vds::(createVm) Total desktops after

> creation of b75823d1-00f0-457e-a692-8b95f73907db is 3

>

> vdsm.log:Thread-350503::DEBUG::2015-02-26

> 15:42:26,033::vm::2264::vm.Vm::(_startUnderlyingVm)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Start

>

> vdsm.log:Thread-350502::DEBUG::2015-02-26

> 15:42:26,036::vm::5658::vm.Vm::(waitForMigrationDestinationPrepare)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration destination:

> waiting for VM creation

>

> vdsm.log:Thread-350503::DEBUG::2015-02-26

> 15:42:26,036::vm::2268::vm.Vm::(_startUnderlyingVm)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::_ongoingCreations

> acquired

>

> vdsm.log:Thread-350502::DEBUG::2015-02-26

> 15:42:26,038::vm::5663::vm.Vm::(waitForMigrationDestinationPrepare)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::migration destination:

> waiting 48s for path preparation

>

> vdsm.log:Thread-350503::INFO::2015-02-26

> 15:42:26,038::vm::3261::vm.Vm::(_run)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::VM wrapper has started

>

> vdsm.log:Thread-350503::WARNING::2015-02-26

> 15:42:26,041::vm::2056::vm.Vm::(buildConfDevices)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Unknown type found, device:

> '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address':

> {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}'

> found

>

> vdsm.log:Thread-350503::WARNING::2015-02-26

> 15:42:26,041::vm::2056::vm.Vm::(buildConfDevices)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Unknown type found, device:

> '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address':

> {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}'

> found

>

> vdsm.log:Thread-350503::DEBUG::2015-02-26

> 15:42:26,349::vm::1058::vm.Vm::(__init__)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Ignoring param (target,

> 10485760) in BalloonDevice

>

> vdsm.log:Thread-350503::DEBUG::2015-02-26

> 15:42:26,350::vm::2294::vm.Vm::(_startUnderlyingVm)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::_ongoingCreations

> released

>

> vdsm.log:Thread-350502::ERROR::2015-02-26

> 15:42:26,351::vm::5638::vm.Vm::(_updateDevicesDomxmlCache)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Alias not found for

> device type graphics during migration at destination host

>

> vdsm.log:Thread-350503::DEBUG::2015-02-26

> 15:42:26,353::vm::4128::vm.Vm::(_waitForUnderlyingMigration)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Waiting 21600 seconds for

> end of migration

>

> vdsm.log:Thread-350502::DEBUG::2015-02-26

> 15:42:26,377::BindingXMLRPC::1140::vds::(wrapper) return

> vmMigrationCreate with {ALL THE MACHINE PARAMETERS HERE

>

> vdsm.log:libvirtEventLoop::DEBUG::2015-02-26

> 15:42:27,195::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::event Started detail 1

> opaque None

>

> vdsm.log:libvirtEventLoop::DEBUG::2015-02-26

> 15:50:27,037::vm::5571::vm.Vm::(_onLibvirtLifecycleEvent)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::event Stopped detail 5

> opaque None

>

> vdsm.log:libvirtEventLoop::INFO::2015-02-26

> 15:50:27,038::vm::2366::vm.Vm::(_onQemuDeath)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::underlying process

> disconnected

>

> vdsm.log:libvirtEventLoop::INFO::2015-02-26

> 15:50:27,038::vm::4952::vm.Vm::(releaseVm)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Release VM resources

>

> vdsm.log:Thread-350503::DEBUG::2015-02-26

> 15:50:27,044::libvirtconnection::143::root::(wrapper) Unknown

> libvirterror: ecode: 42 edom: 10 level: 2 message: Domain not found:

> no domain with matching uuid 'b75823d1-00f0-457e-a692-8b95f73907db'

>

> vdsm.log:Thread-350503::ERROR::2015-02-26

> 15:50:27,047::vm::2325::vm.Vm::(_startUnderlyingVm)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Failed to start a

> migration destination vm

>

> vdsm.log:MigrationError: Domain not found: no domain with matching

> uuid 'b75823d1-00f0-457e-a692-8b95f73907db'

>

> vdsm.log:Thread-350503::DEBUG::2015-02-26

> 15:50:27,058::vm::2786::vm.Vm::(setDownStatus)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Changed state to Down: VM

> failed to migrate (code=8)

>

> vdsm.log:Thread-351517::DEBUG::2015-02-26

> 15:50:27,089::BindingXMLRPC::1133::vds::(wrapper) client

> [172.19.2.34]::call vmDestroy with

> ('b75823d1-00f0-457e-a692-8b95f73907db',) {}

>

> vdsm.log:Thread-351517::INFO::2015-02-26

> 15:50:27,092::API::332::vds::(destroy) vmContainerLock acquired by vm

> b75823d1-00f0-457e-a692-8b95f73907db

>

> vdsm.log:Thread-351517::DEBUG::2015-02-26

> 15:50:27,154::vm::5026::vm.Vm::(destroy)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::destroy Called

>

> vdsm.log:libvirtEventLoop::WARNING::2015-02-26

> 15:50:27,162::utils::129::root::(rmFile) File:

> /var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.co

> m.redhat.rhevm.vdsm

> already removed

>

> vdsm.log:libvirtEventLoop::WARNING::2015-02-26

> 15:50:27,163::utils::129::root::(rmFile) File:

> /var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.or

> g.qemu.guest_agent.0

> already removed

>

> vdsm.log:libvirtEventLoop::INFO::2015-02-26

> 15:50:27,164::logUtils::44::dispatcher::(wrapper) Run and protect:

> inappropriateDevices(thiefId='b75823d1-00f0-457e-a692-8b95f73907db')

>

> vdsm.log:Thread-351517::DEBUG::2015-02-26

> 15:50:27,170::vm::5020::vm.Vm::(deleteVm)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Total desktops after

> destroy of b75823d1-00f0-457e-a692-8b95f73907db is 2

>

> vdsm.log:libvirtEventLoop::WARNING::2015-02-26

> 15:50:27,170::vm::1953::vm.Vm::(_set_lastStatus)

> vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::trying to set state to

> Down when already Down

>

>

>

> <--snip-->

>

>

>

> And I can see the error "Domain not found: no domain with matching

> uuid 'b75823d1-00f0-457e-a692-8b95f73907db'" in the middle, but I am

> not entirely sure what this means.

>

>

>

> In the qemu log I can see this

>

>

>

> <--snip-->

>

>

>

>

>

> *From QEMU LOG*

>

>

>

> 2015-02-26 14:42:26.859+0000: starting up

>

> Domain id=8 is tainted: hook-script

>

> 2015-02-26T14:50:26.650595Z qemu-system-x86_64: load of migration

> failed: Input/output error

>

> 2015-02-26 14:50:26.684+0000: shutting down

>

>

>

> <--snip-->

>

>

>

> the times here seem to be logged in GMT while the snippets above are in CET - might this be the problem?

>

>

>

> Here is the qemu command line (removed from the log output above, cleaned up here):

>

>

>

> <--snip-->
>
> /usr/bin/qemu-kvm
> -name SERVERNAME
> -S
> -machine pc-1.0,accel=kvm,usb=off
> -cpu SandyBridge
> -m 10240
> -realtime mlock=off
> -smp 4,maxcpus=64,sockets=16,cores=4,threads=1
> -uuid b75823d1-00f0-457e-a692-8b95f73907db
> -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=20-3,serial=4C4C4544-0038-3210-8048-B7C04F303232,uuid=b75823d1-00f0-457e-a692-8b95f73907db
> -no-user-config
> -nodefaults
> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/al-exchange-01.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control
> -rtc base=2015-02-26T15:42:26,driftfix=slew
> -global kvm-pit.lost_tick_policy=discard
> -no-hpet
> -no-shutdown
> -boot strict=on
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5
> -drive file=/rhev/data-center/mnt/XXXX:_data_export_iso/85821683-bc7c-429b-bff2-ce25f56fc16f/images/11111111-1111-1111-1111-111111111111/XXXX.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial=
> -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2
> -drive file=/rhev/data-center/0f954891-b1cd-4f09-99ae-75d404d95f9d/276e9ba7-e19a-49c5-8ad7-26711934d5e4/images/c37bfa94-718c-4125-9202-bc299535eca5/299bd6d1-f4a0-4296-9944-176b2252a886,if=none,id=drive-virtio-disk0,format=raw,serial=c37bfa94-718c-4125-9202-bc299535eca5,cache=none,werror=stop,rerror=stop,aio=threads
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -drive file=/rhev/data-center/0f954891-b1cd-4f09-99ae-75d404d95f9d/276e9ba7-e19a-49c5-8ad7-26711934d5e4/images/75134ccc-b74e-4955-90a5-95d4ceff403b/1e5b5b86-c31d-476c-acb2-a5bd6a65490b,if=none,id=drive-virtio-disk1,format=raw,serial=75134ccc-b74e-4955-90a5-95d4ceff403b,cache=none,werror=stop,rerror=stop,aio=threads
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
> -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31
> -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:c7:72:0a,bus=pci.0,addr=0x3
> -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.com.redhat.rhevm.vdsm,server,nowait
> -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
> -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/b75823d1-00f0-457e-a692-8b95f73907db.org.qemu.guest_agent.0,server,nowait
> -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
> -device usb-tablet,id=input0
> -vnc IP-ADDRESS:2,password
> -k de
> -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
> -incoming tcp:[::]:49152
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
> -msg timestamp=on
>
> <--snip-->

>

>

>

> Here is the content of the .meta files in the directories of the two images above:

>

>

>

> <--snip-->
> DOMAIN=276e9ba7-e19a-49c5-8ad7-26711934d5e4
> VOLTYPE=LEAF
> CTIME=1424445378
> FORMAT=RAW
> IMAGE=c37bfa94-718c-4125-9202-bc299535eca5
> DISKTYPE=2
> PUUID=00000000-0000-0000-0000-000000000000
> LEGALITY=LEGAL
> MTIME=0
> POOL_UUID=
> DESCRIPTION={"DiskAlias":"al-exchange-01_Disk1","DiskDescription":"System"}
> TYPE=SPARSE
> SIZE=146800640
> EOF
> <--snip-->
>
> <--snip-->
> DOMAIN=276e9ba7-e19a-49c5-8ad7-26711934d5e4
> CTIME=1424785898
> FORMAT=RAW
> DISKTYPE=2
> LEGALITY=LEGAL
> SIZE=419430400
> VOLTYPE=LEAF
> DESCRIPTION={"DiskAlias":"al-exchange-01_Disk2","DiskDescription":"Data"}
> IMAGE=75134ccc-b74e-4955-90a5-95d4ceff403b
> PUUID=00000000-0000-0000-0000-000000000000
> MTIME=0
> POOL_UUID=
> TYPE=SPARSE
> EOF
> <--snip-->

>

> *From:*Roy Golan [mailto:rgolan at redhat.com]

> *Sent:* Wednesday, February 18, 2015 12:12 PM

> *To:* Soeren Malchow; users at ovirt.org

> *Subject:* Re: [ovirt-users] Hosted Engine Migration fails

>

>

>

> On 02/16/2015 04:55 AM, Soeren Malchow wrote:

>

>     Dear all,

>

>

>

>     we have a setup with several hosts running Fedora 20 with the virt-preview packages installed (for snapshot live merge) and a hosted engine running CentOS 6.6.

>

>

>

>     We are experiencing a problem with live migration of the hosted engine, both when putting the engine's host into maintenance and when migrating manually.

>

>     I tried this on the "ovirtmgmt" network, and when that failed I did some research and tried another network interface (separate from ovirtmgmt on layer 2); this also fails.

>

>

>

>     It looks as if the migration is still going through the ovirtmgmt

>     interface, at least judging from the network traffic, and I think

>     the error that I found (RHBZ#919201) is actually the right one.

>

>      --

>

>     vdsm.log:Thread-6745::WARNING::2015-02-16

>     03:22:25,743::migration::458::vm.Vm::(monitor_migration)

>     vmId=`XXX`::Migration stalling: remaining (35MiB) > lowmark (15MiB).

>     Refer to RHBZ#919201.

>

>     vdsm.log:Thread-6745::WARNING::2015-02-16

>     03:22:35,745::migration::458::vm.Vm::(monitor_migration)

>     vmId=`XXX`::Migration stalling: remaining (129MiB) > lowmark

>     (15MiB). Refer to RHBZ#919201.

>

>     vdsm.log:Thread-6745::WARNING::2015-02-16

>     03:22:45,747::migration::458::vm.Vm::(monitor_migration)

>     vmId=`XXX`::Migration stalling: remaining (42MiB) > lowmark (15MiB).

>     Refer to RHBZ#919201.

>

>     vdsm.log:Thread-6745::WARNING::2015-02-16

>     03:22:55,749::migration::458::vm.Vm::(monitor_migration)

>     vmId=`XXX`::Migration stalling: remaining (88MiB) > lowmark (15MiB).

>     Refer to RHBZ#919201.

>

>     --

>

>     The ovirtmgmt interface is 2 x 1 Gbit (LACP, connected to Dell switches with MLAG) and far from fully utilized.

>

>     Can anyone help me with where to go from here?

>

>

> We still need the whole log. Probably the guest (engine) is doing lots of memory I/O if you have a fair number of running VMs and hosts; that will stall the migration because the guest pages are getting dirty faster than qemu can copy them.

>

>

> you have 2 options:

>

> 1. try several more times.

>

> 2. shut down the engine VM; it should start on another host

>

>

>     Regards

>

>     Soeren

>

>



Hi Soeren,



the key issue seems to be this one:



"

vdsm.log:Thread-49551::WARNING::2015-02-26

15:50:27,359::migration::445::vm.Vm::(monitor_migration)

vmId=`b75823d1-00f0-457e-a692-8b95f73907db`::Migration is stuck: Hasn't progressed in 150.069458961 seconds. Aborting.

"



Basically it means that the migration process is not converging since moving the data from one host to the other is too slow.

The reason for this could be:

1. Extremely slow or busy network.

2. Something inside your VM is changing memory very fast (faster than the copying rate).



In order to rule out (2), you can start an empty VM with nothing inside but a minimal OS. If you're not sure what to use, try Tiny Core Linux [1].

Such a minimal VM should have no problem migrating from one machine to the other. If it does have an issue, it means there is a problem in your network that causes the migration process to take longer than it should.
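
While such a test VM (or any other VM) is migrating, you can also watch the migration job directly on the source host with plain libvirt. This is only a rough sketch - the exact fields shown depend on your libvirt version - but if "Memory remaining" keeps bouncing up and down instead of shrinking, the guest is dirtying pages faster than they can be copied:

<--snip-->
# on the source host, while the migration is running
# (use the VM name as libvirt knows it, see `virsh -r list`):
watch -n 2 virsh -r domjobinfo SERVERNAME

# VDSM aborts the job once it sees no progress for a while
# (migration_progress_timeout in vdsm.conf, 150 seconds by default
# if I recall the option name correctly - matching the "Hasn't
# progressed in 150 seconds" line in your log).
<--snip-->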



Give it a try and let us know how it goes.

Doron



[1] http://distro.ibiblio.org/tinycorelinux/










[Attachments archived with this message: image002.jpg, image004.jpg, image008.jpg, image010.jpg, image015.png - http://lists.ovirt.org/pipermail/users/attachments/20150226/809719ef/]