[ovirt-users] Live migration failing

Dafna Ron dron at redhat.com
Tue Apr 29 15:27:45 UTC 2014


I don't think that there is an inconsistency. There seems to have been a
problem in your setup which caused an issue in migration at that time. If
you are able to migrate now, it means that the issue was resolved.
It might be a bug, or there might have been a problem with the network or
storage at the time you were trying to migrate, but it's ok now.
In any case, this is what I see in the logs:

I can see that the migration was stuck:

ovirt001.vdsm.log.4:Thread-112::WARNING::2014-04-28 
13:04:08,922::vm::807::vm.Vm::(run) 
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Migration is stuck: Hasn't 
progressed in 150.061944962 seconds. Aborting.
ovirt001.vdsm.log.4:Thread-54::ERROR::2014-04-28 
13:04:45,001::sampling::355::vm.Vm::(collect) 
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Stats function failed: 
<AdvancedStatsFunction _sampleDisk at 0x11ebb20>

This is the ERROR, and it seems that the problem is that trying to acquire
a lock on the disk fails (sanlock or the vm's qemu log might show something):

Thread-54::ERROR::2014-04-28 
13:02:45,001::sampling::355::vm.Vm::(collect) 
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Stats function failed: 
<AdvancedStatsFunction _sampleDisk at 0x11ebb20>
Traceback (most recent call last):
   File "/usr/share/vdsm/sampling.py", line 351, in collect
     statsFunction()
   File "/usr/share/vdsm/sampling.py", line 226, in __call__
     retValue = self._function(*args, **kwargs)
   File "/usr/share/vdsm/vm.py", line 542, in _sampleDisk
     diskSamples[vmDrive.name] = self._vm._dom.blockStats(vmDrive.name)
   File "/usr/share/vdsm/vm.py", line 867, in f
     raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock
Thread-112::INFO::2014-04-28 13:02:48,870::vm::833::vm.Vm::(run) 
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Migration Progress: 520 
seconds elapsed, 79% of data processed, 79% of mem processed


The migration reached 79% and then there was a problem connected to the
vm's disk (perhaps there was an intermittent issue with your network
connection to the storage, making it intermittently unavailable from the
host?)
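
If you want to pull the same events out of a vdsm log yourself, something
like the following might help. It is only a minimal sketch: the record
layout is inferred from the excerpts above, it assumes each record sits on
a single line (the excerpts here are wrapped by the mail client), and
migration_events is a made-up helper name, not part of vdsm.

```python
import re
from datetime import datetime

# Assumed record layout, inferred from the excerpts in this mail:
# Thread-N::LEVEL::timestamp::module::line::logger::(func) vmId=`uuid`::message
RECORD = re.compile(
    r"(?P<thread>Thread-\d+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r".*?vmId=`(?P<vm>[0-9a-f-]+)`::(?P<msg>.*)"
)
PROGRESS = re.compile(
    r"Migration Progress: (\d+) seconds elapsed, (\d+)% of data processed"
)
STUCK = re.compile(r"Migration is stuck: Hasn't progressed in ([\d.]+) seconds")


def migration_events(lines, vm_id):
    """Return (kind, timestamp, ...) tuples for one VM, in log order.

    Hypothetical helper: traceback lines and records for other VMs
    do not match and are skipped.
    """
    events = []
    for line in lines:
        record = RECORD.search(line)  # search() tolerates a grep filename prefix
        if record is None or record.group("vm") != vm_id:
            continue
        ts = datetime.strptime(record.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        msg = record.group("msg")
        progress = PROGRESS.search(msg)
        stuck = STUCK.search(msg)
        if progress:
            events.append(
                ("progress", ts, int(progress.group(1)), int(progress.group(2)))
            )
        elif stuck:
            events.append(("stuck", ts, float(stuck.group(1))))
        elif record.group("level") == "ERROR":
            events.append(("error", ts, msg))
    return events


if __name__ == "__main__":
    import sys

    # Usage: python script.py <vdsm log file> <vm uuid>
    with open(sys.argv[1]) as log:
        for event in migration_events(log, sys.argv[2]):
            print(event)
```

Running it against the source host's vdsm log with the vmId from the
excerpts should list the progress reports, the stats ERRORs and the
"stuck" warning in order, which makes it easier to see where the
migration stopped advancing.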

Dafna



On 04/29/2014 03:51 PM, Steve Dainard wrote:
> Hosts packages:
> vdsm-4.14.6-0.el6.x86_64
> libvirt-0.10.2-29.el6_5.7.x86_64
> qemu-kvm-0.12.1.2-2.415.el6_5.8.x86_64 (from the jenkins build with 
> live snapshot support)
>
> Both those examples are of existing VM's, they were generated from a 
> template, and they do have snapshots.
>
> I just attempted to migrate another VM 'DHCP01' and it was successful, 
> this VM was generated from the same template and containing snapshots.
> I then was able to migrate the 'owncloud' vm successfully (which 
> failed multiple times before), then I attempted another vm 'CV_Test' 
> and it completed successfully but the time is way off, listing the 
> migration was done in 31 seconds when the log shows 6 minutes. I ran 
> through the same VM migration again to double check, and there was a 
> kvm process for that VM running on both hosts for longer than what is 
> listed in the GUI.
>
> 2014-Apr-29, 10:15 Migration completed (VM: CV_Test, Source: ovirt001, 
> Destination: ovirt002, Duration: 31 sec).
> 2014-Apr-29, 10:09 Migration started (VM: CV_Test, Source: ovirt001, 
> Destination: ovirt002, User: admin).
>
> 2014-Apr-29, 10:09 Migration completed (VM: central-syslog, Source: 
> ovirt002, Destination: ovirt001, Duration: 26 sec).
> 2014-Apr-29, 10:09 Migration started (VM: central-syslog, Source: 
> ovirt002, Destination: ovirt001, User: admin).
> 2014-Apr-29, 10:08 VM central-syslog started on Host ovirt002
> 2014-Apr-29, 10:08 User admin at internal is connected to VM central-syslog.
> 2014-Apr-29, 10:08 user admin initiated console session for VM 
> central-syslog
> 2014-Apr-29, 10:07 VM central-syslog was started by admin (Host: 
> ovirt002).
> 2014-Apr-29, 10:06 Migration completed (VM: owncloud, Source: 
> ovirt002, Destination: ovirt001, Duration: 20 sec).
> 2014-Apr-29, 10:06 Migration started (VM: owncloud, Source: ovirt002, 
> Destination: ovirt001, User: admin).
> 2014-Apr-29, 09:55 VM DHCP01 is down. Exit message: User shut down
> 2014-Apr-29, 09:55 Migration completed (VM: syslog, Source: ovirt002, 
> Destination: ovirt001, Duration: 31 sec).
> 2014-Apr-29, 09:55 Migration started (VM: syslog, Source: ovirt002, 
> Destination: ovirt001, User: admin).
> 2014-Apr-29, 09:54 VM shutdown initiated by admin on VM DHCP01 (Host: 
> ovirt001).
> 2014-Apr-29, 09:54 Migration completed (VM: DHCP01, Source: ovirt002, 
> Destination: ovirt001, Duration: 18 sec).
> 2014-Apr-29, 09:54 Migration started (VM: DHCP01, Source: ovirt002, 
> Destination: ovirt001, User: admin).
>
>
> Right now my greatest concern is a lack of consistency. I've attached 
> vdsm/libvirt logs from both sides of the first migration error:
>
> 2014-Apr-28, 13:12 Migration failed due to Error: Fatal error during 
> migration (VM: central-syslog, Source: ovirt002, Destination: ovirt001).
> 2014-Apr-28, 13:12 Migration failed due to Error: Fatal error during 
> migration. Trying to migrate to another Host (VM: central-syslog, 
> Source: ovirt002, Destination: ovirt001).
> 2014-Apr-28, 13:12 Migration started (VM: central-syslog, Source: 
> ovirt002, Destination: ovirt001, User: admin).
>
>
> Thanks,
> Steve
>
>
>
> On Tue, Apr 29, 2014 at 9:51 AM, Dafna Ron <dron at redhat.com> wrote:
>
>     Actually, the best way to debug this would be to look at both dest
>     and src vdsm logs.
>     Is this happening on all vm's or just one of them?
>     Was this vm launched from an iso? Is that iso still available?
>     Are there any snapshots?
>     What are the vdsm, libvirt and qemu versions?
>
>     Thanks.
>
>     Dafna
>
>
>
>     On 04/29/2014 02:24 PM, Steve Dainard wrote:
>
>         Thanks, logs attached:
>
>         libvirtd.log.4/central-syslog.log covers the first event
>         (17:12 timestamp)
>         libvirtd.log.3/owncloud.log covers the second event (01:22
>         timestamp)
>
>
>         Steve
>
>
>
>         On Tue, Apr 29, 2014 at 4:48 AM, Francesco Romani
>         <fromani at redhat.com> wrote:
>
>             ----- Original Message -----
>             > From: "Steve Dainard" <sdainard at miovision.com>
>             > To: "users" <users at ovirt.org>
>             > Sent: Tuesday, April 29, 2014 4:32:08 AM
>             > Subject: Re: [ovirt-users] Live migration failing
>             >
>             > Another error on migration.
>
>             Hi, in both cases the core issue is
>
>             libvirtError: Unable to read from monitor: Connection reset by peer
>
>             can you share the libvirtd and qemu logs?
>
>             Hopefully we can find some more information on those logs.
>
>             Bests,
>
>             --
>             Francesco Romani
>             RedHat Engineering Virtualization R & D
>             Phone: 8261328
>             IRC: fromani
>
>
>
>
>         _______________________________________________
>         Users mailing list
>         Users at ovirt.org
>         http://lists.ovirt.org/mailman/listinfo/users
>
>
>
>     -- 
>     Dafna Ron
>
>


-- 
Dafna Ron


