I don't think there is an inconsistency. There seems to have been a
problem in your setup that caused the migration to fail at that time. If
you are able to migrate now, it means the issue was resolved.
It might be a bug, or there might have been a problem with the network or
storage at the time you were trying to migrate, but it's OK now.
In any case, this is what I see in the logs:
I can see that the migration was stuck:
ovirt001.vdsm.log.4:Thread-112::WARNING::2014-04-28
13:04:08,922::vm::807::vm.Vm::(run)
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Migration is stuck: Hasn't
progressed in 150.061944962 seconds. Aborting.
ovirt001.vdsm.log.4:Thread-54::ERROR::2014-04-28
13:04:45,001::sampling::355::vm.Vm::(collect)
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Stats function failed:
<AdvancedStatsFunction _sampleDisk at 0x11ebb20>
This is the ERROR. The problem seems to be that an attempt to acquire a
lock on the disk fails (sanlock or the VM's qemu log might show something):
Thread-54::ERROR::2014-04-28
13:02:45,001::sampling::355::vm.Vm::(collect)
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Stats function failed:
<AdvancedStatsFunction _sampleDisk at 0x11ebb20>
Traceback (most recent call last):
File "/usr/share/vdsm/sampling.py", line 351, in collect
statsFunction()
File "/usr/share/vdsm/sampling.py", line 226, in __call__
retValue = self._function(*args, **kwargs)
File "/usr/share/vdsm/vm.py", line 542, in _sampleDisk
diskSamples[vmDrive.name] = self._vm._dom.blockStats(vmDrive.name)
File "/usr/share/vdsm/vm.py", line 867, in f
raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock
Thread-112::INFO::2014-04-28 13:02:48,870::vm::833::vm.Vm::(run)
vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Migration Progress: 520
seconds elapsed, 79% of data processed, 79% of mem processed
The migration reached 79% and then there was a problem connected to the
VM's disk (perhaps there was an intermittent issue with your network
connection to the storage, making it intermittently unavailable from the
host?)
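As a rough sketch (not part of vdsm), the "Migration Progress" and
"Migration is stuck" records quoted above can be pulled out of a vdsm log
with a few lines of Python. The regular expressions are assumptions based
only on the record format visible in these excerpts, the sample records are
the ones above rejoined onto single lines, and the function name
migration_events is hypothetical.

```python
import re

# Records copied from the excerpts above, rejoined onto single lines.
SAMPLE = (
    "Thread-112::INFO::2014-04-28 13:02:48,870::vm::833::vm.Vm::(run) "
    "vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Migration Progress: 520 "
    "seconds elapsed, 79% of data processed, 79% of mem processed\n"
    "Thread-112::WARNING::2014-04-28 13:04:08,922::vm::807::vm.Vm::(run) "
    "vmId=`2e07f13a-cf04-4fd4-9ecc-7d72c2b630b0`::Migration is stuck: Hasn't "
    "progressed in 150.061944962 seconds. Aborting.\n"
)

# Assumed record layout: ...::<date> <time>,<ms>::...vmId=`<uuid>`::<message>
PREFIX = (r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+::.*?"
          r"vmId=`(?P<vm>[0-9a-f-]+)`::")
PROGRESS = re.compile(PREFIX + r"Migration Progress: \d+ seconds elapsed, "
                               r"(?P<data>\d+)% of data processed")
STUCK = re.compile(PREFIX + r"Migration is stuck: Hasn't progressed in "
                            r"(?P<stalled>[\d.]+) seconds")


def migration_events(text):
    """Return (kind, timestamp, vm_id, value) tuples in log order.

    value is the percentage of data processed for "progress" events and
    the number of seconds without progress for "stuck" events.
    """
    events = []
    for line in text.splitlines():
        m = PROGRESS.search(line)
        if m:
            events.append(("progress", m.group("ts"), m.group("vm"),
                           int(m.group("data"))))
            continue
        m = STUCK.search(line)
        if m:
            events.append(("stuck", m.group("ts"), m.group("vm"),
                           float(m.group("stalled"))))
    return events


for event in migration_events(SAMPLE):
    print(event)
```

Running the same function over the full source-host log should show the
last progress percentage reported before the abort, which can then be lined
up against the storage or sanlock logs for the same timestamps.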
Dafna
On 04/29/2014 03:51 PM, Steve Dainard wrote:
Hosts packages:
vdsm-4.14.6-0.el6.x86_64
libvirt-0.10.2-29.el6_5.7.x86_64
qemu-kvm-0.12.1.2-2.415.el6_5.8.x86_64 (from the jenkins build with
live snapshot support)
Both of those examples are existing VMs; they were generated from a
template, and they do have snapshots.
I just attempted to migrate another VM, 'DHCP01', and it was successful.
This VM was generated from the same template and also contains snapshots.
I was then able to migrate the 'owncloud' VM successfully (it had
failed multiple times before). Then I attempted another VM, 'CV_Test',
and it completed successfully, but the time is way off: the GUI lists the
migration as done in 31 seconds when the log shows 6 minutes. I ran
through the same VM migration again to double check, and there was a
kvm process for that VM running on both hosts for longer than what is
listed in the GUI.
2014-Apr-29, 10:15 Migration completed (VM: CV_Test, Source: ovirt001,
Destination: ovirt002, Duration: 31 sec).
2014-Apr-29, 10:09 Migration started (VM: CV_Test, Source: ovirt001,
Destination: ovirt002, User: admin).
2014-Apr-29, 10:09 Migration completed (VM: central-syslog, Source:
ovirt002, Destination: ovirt001, Duration: 26 sec).
2014-Apr-29, 10:09 Migration started (VM: central-syslog, Source:
ovirt002, Destination: ovirt001, User: admin).
2014-Apr-29, 10:08 VM central-syslog started on Host ovirt002
2014-Apr-29, 10:08 User admin@internal is connected to VM central-syslog.
2014-Apr-29, 10:08 user admin initiated console session for VM
central-syslog
2014-Apr-29, 10:07 VM central-syslog was started by admin (Host:
ovirt002).
2014-Apr-29, 10:06 Migration completed (VM: owncloud, Source:
ovirt002, Destination: ovirt001, Duration: 20 sec).
2014-Apr-29, 10:06 Migration started (VM: owncloud, Source: ovirt002,
Destination: ovirt001, User: admin).
2014-Apr-29, 09:55 VM DHCP01 is down. Exit message: User shut down
2014-Apr-29, 09:55 Migration completed (VM: syslog, Source: ovirt002,
Destination: ovirt001, Duration: 31 sec).
2014-Apr-29, 09:55 Migration started (VM: syslog, Source: ovirt002,
Destination: ovirt001, User: admin).
2014-Apr-29, 09:54 VM shutdown initiated by admin on VM DHCP01 (Host:
ovirt001).
2014-Apr-29, 09:54 Migration completed (VM: DHCP01, Source: ovirt002,
Destination: ovirt001, Duration: 18 sec).
2014-Apr-29, 09:54 Migration started (VM: DHCP01, Source: ovirt002,
Destination: ovirt001, User: admin).
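To make the mismatch concrete, here is a minimal sketch that compares the
duration reported in each "Migration completed" event with the wall-clock
gap between the matching "started" and "completed" timestamps. The tuples
are hand-copied from the event list above; the helper names and the
60-second slack (the timestamps only have minute resolution) are my own
assumptions, and the "%b" month parsing assumes an English locale.

```python
from datetime import datetime

# (timestamp, event, VM, reported duration in seconds or None),
# hand-copied from the event list above.
EVENTS = [
    ("2014-Apr-29, 10:09", "started", "CV_Test", None),
    ("2014-Apr-29, 10:15", "completed", "CV_Test", 31),
    ("2014-Apr-29, 10:06", "started", "owncloud", None),
    ("2014-Apr-29, 10:06", "completed", "owncloud", 20),
    ("2014-Apr-29, 09:54", "started", "DHCP01", None),
    ("2014-Apr-29, 09:54", "completed", "DHCP01", 18),
]


def parse(ts):
    # "%b" matches "Apr" only in an English locale (assumption).
    return datetime.strptime(ts, "%Y-%b-%d, %H:%M")


def discrepancies(events, slack=60):
    """Return (vm, reported_seconds, wall_clock_seconds) for migrations
    whose wall-clock time exceeds the reported duration by more than
    `slack` seconds."""
    started = {}
    found = []
    # Sort by time; within the same minute, handle "started" first.
    ordered = sorted(events, key=lambda e: (parse(e[0]), e[1] == "completed"))
    for ts, kind, vm, reported in ordered:
        if kind == "started":
            started[vm] = parse(ts)
        elif kind == "completed" and vm in started:
            wall = (parse(ts) - started.pop(vm)).total_seconds()
            # Minute-resolution timestamps can be off by up to a minute,
            # so only flag gaps larger than the slack.
            if wall - reported > slack:
                found.append((vm, reported, wall))
    return found


print(discrepancies(EVENTS))
```

With the events above, only CV_Test is flagged: 31 seconds reported
against a 6-minute gap between the two events.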
Right now my greatest concern is a lack of consistency. I've attached
vdsm/libvirt logs from both sides of the first migration error:
2014-Apr-28, 13:12 Migration failed due to Error: Fatal error during
migration (VM: central-syslog, Source: ovirt002, Destination: ovirt001).
2014-Apr-28, 13:12 Migration failed due to Error: Fatal error during
migration. Trying to migrate to another Host (VM: central-syslog,
Source: ovirt002, Destination: ovirt001).
2014-Apr-28, 13:12 Migration started (VM: central-syslog, Source:
ovirt002, Destination: ovirt001, User: admin).
Thanks,
Steve
On Tue, Apr 29, 2014 at 9:51 AM, Dafna Ron <dron(a)redhat.com> wrote:
Actually, the best way to debug this would be to look at both the
destination and source vdsm logs.
Is this happening on all VMs or just one of them?
Was this VM launched from an ISO? Is that ISO still available?
Are there any snapshots?
What are the vdsm, libvirt and qemu versions?
Thanks.
Dafna
On 04/29/2014 02:24 PM, Steve Dainard wrote:
Thanks, logs attached:
libvirtd.log.4/central-syslog.log covers the first event
(17:12 timestamp)
libvirtd.log.3/owncloud.log covers the second event (01:22
timestamp)
Steve
On Tue, Apr 29, 2014 at 4:48 AM, Francesco Romani
<fromani(a)redhat.com> wrote:
----- Original Message -----
> From: "Steve Dainard" <sdainard(a)miovision.com>
> To: "users" <users(a)ovirt.org>
> Sent: Tuesday, April 29, 2014 4:32:08 AM
> Subject: Re: [ovirt-users] Live migration failing
>
> Another error on migration.
Hi, in both cases the core issue is
libvirtError: Unable to read from monitor: Connection reset
by peer
Can you share the libvirtd and qemu logs?
Hopefully we can find some more information in those logs.
Bests,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
--
Dafna Ron