[ovirt-users] Live migration failing
Dafna Ron
dron at redhat.com
Tue Apr 29 15:27:45 UTC 2014
I don't think there is an inconsistency. There seems to have been a
problem in your setup that caused the migration to fail at that time. If
you are able to migrate now, the issue has been resolved.
It might be a bug, or there might have been a problem with the network or
storage at the time you were trying to migrate, but it's OK now.
In any case, this is what I see in the logs:
I can see that the migration was stuck:
ovirt001.vdsm.log.4:Thread-112::WARNING::2014-04-28
13:04:08,922::vm::807::vm.Vm::(run)
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Migration is stuck: Hasn't
progressed in 150.061944962 seconds. Aborting.
ovirt001.vdsm.log.4:Thread-54::ERROR::2014-04-28
13:04:45,001::sampling::355::vm.Vm::(collect)
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Stats function failed:
<AdvancedStatsFunction _sampleDisk at 0x11ebb20>
This is the ERROR. It seems that the attempt to acquire a lock on the
disk fails (sanlock or the VM's qemu log might show something):
Thread-54::ERROR::2014-04-28
13:02:45,001::sampling::355::vm.Vm::(collect)
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Stats function failed:
<AdvancedStatsFunction _sampleDisk at 0x11ebb20>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 542, in _sampleDisk
    diskSamples[vmDrive.name] = self._vm._dom.blockStats(vmDrive.name)
  File "/usr/share/vdsm/vm.py", line 867, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock
Thread-112::INFO::2014-04-28 13:02:48,870::vm::833::vm.Vm::(run)
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Migration Progress: 520
seconds elapsed, 79% of data processed, 79% of mem processed
The migration reached 79% and then there was a problem connected to the
VM's disk (perhaps an intermittent issue with your network connection to
the storage made it temporarily unavailable from the host?)
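
To line these events up when going through a longer vdsm.log, a rough
Python sketch along the following lines might help. It assumes each entry
sits on a single line in the actual file (the excerpts above are wrapped
by the mail client) and that the prefix layout matches what is quoted
here. The function name migration_events and the regexes are illustrative
only; they are not part of vdsm.

```python
import re
from datetime import datetime

# Prefix layout inferred from the excerpts in this thread (an assumption):
# Thread-N::LEVEL::date time,ms::module::lineno::logger::(func) message
LINE_RE = re.compile(
    r"Thread-\d+::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"\w+::\d+::[\w.]+::\(\w+\)\s+(?P<msg>.*)"
)
PROGRESS_RE = re.compile(
    r"Migration Progress: (\d+) seconds elapsed, (\d+)% of data processed"
)
STUCK_RE = re.compile(
    r"Migration is stuck: Hasn't progressed in ([\d.]+) seconds"
)


def migration_events(lines):
    """Return (timestamp, kind, detail) tuples for migration-related entries.

    kind is "progress" (detail: (elapsed_seconds, percent_of_data)),
    "stuck" (detail: seconds without progress) or "error" (detail: message).
    """
    events = []
    for line in lines:
        # search() rather than match() so a "file:" prefix added by grep is tolerated
        match = LINE_RE.search(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        msg = match.group("msg")
        progress = PROGRESS_RE.search(msg)
        stuck = STUCK_RE.search(msg)
        if progress:
            detail = (int(progress.group(1)), int(progress.group(2)))
            events.append((ts, "progress", detail))
        elif stuck:
            events.append((ts, "stuck", float(stuck.group(1))))
        elif match.group("level") == "ERROR":
            events.append((ts, "error", msg))
    return events
```

Passing an open log file (or the output of grep on the vmId) to
migration_events and sorting the result by the first tuple element puts
the stats errors, the progress reports and the "stuck" warning on one
timeline, which makes it easier to compare against sanlock or qemu logs
from the same period.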
Dafna
On 04/29/2014 03:51 PM, Steve Dainard wrote:
> Hosts packages:
> vdsm-4.14.6-0.el6.x86_64
> libvirt-0.10.2-29.el6_5.7.x86_64
> qemu-kvm-0.12.1.2-2.415.el6_5.8.x86_64 (from the jenkins build with
> live snapshot support)
>
> Both those examples are of existing VM's, they were generated from a
> template, and they do have snapshots.
>
> I just attempted to migrate another VM 'DHCP01' and it was successful,
> this VM was generated from the same template and containing snapshots.
> I then was able to migrate the 'owncloud' vm successfully (which
> failed multiple times before), then I attempted another vm 'CV_Test'
> and it completed successfully but the time is way off, listing the
> migration was done in 31 seconds when the log shows 6 minutes. I ran
> through the same VM migration again to double check, and there was a
> kvm process for that VM running on both hosts for longer than what is
> listed in the GUI.
>
> 2014-Apr-29, 10:15 Migration completed (VM: CV_Test, Source: ovirt001,
> Destination: ovirt002, Duration: 31 sec).
> 2014-Apr-29, 10:09 Migration started (VM: CV_Test, Source: ovirt001,
> Destination: ovirt002, User: admin).
>
> 2014-Apr-29, 10:09 Migration completed (VM: central-syslog, Source:
> ovirt002, Destination: ovirt001, Duration: 26 sec).
> 2014-Apr-29, 10:09 Migration started (VM: central-syslog, Source:
> ovirt002, Destination: ovirt001, User: admin).
> 2014-Apr-29, 10:08 VM central-syslog started on Host ovirt002
> 2014-Apr-29, 10:08 User admin at internal is connected to VM central-syslog.
> 2014-Apr-29, 10:08 user admin initiated console session for VM
> central-syslog
> 2014-Apr-29, 10:07 VM central-syslog was started by admin (Host:
> ovirt002).
> 2014-Apr-29, 10:06 Migration completed (VM: owncloud, Source:
> ovirt002, Destination: ovirt001, Duration: 20 sec).
> 2014-Apr-29, 10:06 Migration started (VM: owncloud, Source: ovirt002,
> Destination: ovirt001, User: admin).
> 2014-Apr-29, 09:55 VM DHCP01 is down. Exit message: User shut down
> 2014-Apr-29, 09:55 Migration completed (VM: syslog, Source: ovirt002,
> Destination: ovirt001, Duration: 31 sec).
> 2014-Apr-29, 09:55 Migration started (VM: syslog, Source: ovirt002,
> Destination: ovirt001, User: admin).
> 2014-Apr-29, 09:54 VM shutdown initiated by admin on VM DHCP01 (Host:
> ovirt001).
> 2014-Apr-29, 09:54 Migration completed (VM: DHCP01, Source: ovirt002,
> Destination: ovirt001, Duration: 18 sec).
> 2014-Apr-29, 09:54 Migration started (VM: DHCP01, Source: ovirt002,
> Destination: ovirt001, User: admin).
>
>
> Right now my greatest concern is a lack of consistency. I've attached
> vdsm/libvirt logs from both sides of the first migration error:
>
> 2014-Apr-28, 13:12 Migration failed due to Error: Fatal error during
> migration (VM: central-syslog, Source: ovirt002, Destination: ovirt001).
> 2014-Apr-28, 13:12 Migration failed due to Error: Fatal error during
> migration. Trying to migrate to another Host (VM: central-syslog,
> Source: ovirt002, Destination: ovirt001).
> 2014-Apr-28, 13:12 Migration started (VM: central-syslog, Source:
> ovirt002, Destination: ovirt001, User: admin).
>
>
> Thanks,
> Steve
>
>
>
> On Tue, Apr 29, 2014 at 9:51 AM, Dafna Ron <dron at redhat.com> wrote:
>
> Actually, the best way to debug this would be to look at both the
> destination and source vdsm logs.
> is this happening on all vm's or just one of them?
> was this vm launched from an iso? is that iso still available?
> are there any snapshots?
> what are the vdsm, libvirt and qemu versions?
>
> Thanks.
>
> Dafna
>
>
>
> On 04/29/2014 02:24 PM, Steve Dainard wrote:
>
> Thanks, logs attached:
>
> libvirtd.log.4/central-syslog.log covers the first event
> (17:12 timestamp)
> libvirtd.log.3/owncloud.log covers the second event (01:22
> timestamp)
>
>
> Steve
>
>
>
> On Tue, Apr 29, 2014 at 4:48 AM, Francesco Romani
> <fromani at redhat.com> wrote:
>
> ----- Original Message -----
> > From: "Steve Dainard" <sdainard at miovision.com>
> > To: "users" <users at ovirt.org>
> > Sent: Tuesday, April 29, 2014 4:32:08 AM
> > Subject: Re: [ovirt-users] Live migration failing
> >
> > Another error on migration.
>
> Hi, in both cases the core issue is
>
> libvirtError: Unable to read from monitor: Connection reset by peer
>
> can you share the libvirtd and qemu logs?
>
> Hopefully we can find some more information on those logs.
>
> Bests,
>
> --
> Francesco Romani
> RedHat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani
>
>
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
>
> --
> Dafna Ron
>
>
--
Dafna Ron