[Users] Live migration of VM's occasionally fails
Steve Dainard
sdainard at miovision.com
Mon Feb 17 18:52:39 UTC 2014
The VMs are identical: same template, same CPU/mem/NIC. Server type, thin
provisioned on NFS (the backend is GlusterFS 3.4).
Does monitor = SPICE console? I don't believe either of them had a SPICE
connection open.
I don't see anything in the ovirt001 sanlock.log:
2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:17:06-0500 255307 [5105]: cmd_register ci 4 fd 14 pid 31132
2014-02-14 11:17:06-0500 255307 [5105]: cmd_restrict ci 4 fd 14 pid 31132
flags 1
2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 ci_in 5 fd
15 count 0
2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 result 0
pid_dead 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 ci_in 6 fd
16 count 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 result 0
pid_dead 0
2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:39-0500 255340 [5105]: cmd_register ci 5 fd 15 pid 31319
2014-02-14 11:17:39-0500 255340 [5105]: cmd_restrict ci 5 fd 15 pid 31319
flags 1
2014-02-14 11:17:39-0500 255340 [5105]: client_pid_dead 5,15,31319
cmd_active 0 suspend 0
2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:18:06-0500 255366 [5111]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
The ovirt002 sanlock.log has no entries during that time frame.
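For anyone who wants to repeat the check, a quick way to scan both hosts for sanlock warnings or errors would be something along these lines; it assumes sanlock's default log location of /var/log/sanlock.log and working root ssh to the hosts:

# Scan the sanlock log on each host for anything that looks like a problem.
# The log path is the usual default; adjust if sanlock logs elsewhere.
for host in ovirt001 ovirt002; do
    echo "== $host =="
    ssh root@"$host" "grep -iE 'warn|error|fail' /var/log/sanlock.log | tail -n 50"
done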
Steve Dainard
IT Infrastructure Manager
Miovision <http://miovision.com/> | Rethink Traffic
Blog <http://miovision.com/blog> | LinkedIn <https://www.linkedin.com/company/miovision-technologies> | Twitter <https://twitter.com/miovision> | Facebook <https://www.facebook.com/miovision>
------------------------------
Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON, Canada | N2C 1L3
This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.
On Mon, Feb 17, 2014 at 12:59 PM, Dafna Ron <dron at redhat.com> wrote:
> mmm... that is very interesting...
> Are both VMs identical? Are they server or desktop type? Were they created
> as a thin copy or a clone? What storage type are you using? Did you happen
> to have an open monitor on the VM that failed migration?
> I wonder if it could be a sanlock lock on the source template, but I can
> only see this bug happening if the VMs are linked to the template.
> Can you look at the sanlock log and see if there are any warnings or errors?
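On the template-link question: one way to check whether a disk is still a thin copy linked to the template would be to look at its backing chain with qemu-img. This is only a sketch; the images/<imgUUID>/<volUUID> layout is my assumption of how vdsm lays out file-based domains, and the UUIDs below are taken from the vdsm log excerpt further down the thread:

# If "backing file" points at a template volume, the disk is a thin copy still
# linked to the template; if there is no backing file, the disk is standalone.
DOMAIN=/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5
qemu-img info "$DOMAIN/images/97c9108f-a506-415f-ad2c-370d707cb130/61f82f7f-18e4-4ea8-9db3-71ddd9d4e836" | grep -Ei 'image:|format|backing'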
>
> All logs are in debug, so I don't think we can get anything more from them,
> but I am adding Meital and Omer to this mail to help debug this - perhaps
> they can think of something that could cause this from the trace.
>
> This case is really interesting... sorry, probably not what you want to
> hear... thanks for helping with this :)
>
> Dafna
>
>
>
> On 02/17/2014 05:08 PM, Steve Dainard wrote:
>
>> Failed live migration is more widespread than these two VMs, but they are a
>> good example because they were both built from the same template and have
>> had no modifications since they were created. They were also migrated one
>> after the other, with one migrating successfully and the other not.
>>
>> Are there any increased logging levels that might help determine what the
>> issue is?
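As a rough sketch of what raising verbosity on the libvirt side could look like (the filter set and output path below are assumptions, and since vdsm normally manages libvirtd.conf this should only be a temporary debugging change):

# Enable debug-level libvirtd logging to a dedicated file (temporary).
# log_level 1 = DEBUG; log_filters trims noisy subsystems.
cat >> /etc/libvirt/libvirtd.conf <<'EOF'
log_level = 1
log_filters = "1:qemu 1:libvirt 3:event 3:util"
log_outputs = "1:file:/var/log/libvirt/libvirtd-debug.log"
EOF
# Restart libvirtd; on a vdsm host it may be managed by upstart rather than the init script.
initctl restart libvirtd 2>/dev/null || service libvirtd restart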
>>
>> Thanks,
>>
>>
>>
>> On Mon, Feb 17, 2014 at 11:47 AM, Dafna Ron <dron at redhat.com> wrote:
>>
>> Did you install these VMs from a CD? Did you run them as run-once with a
>> special monitor?
>> Try to think whether there is anything different in the configuration
>> of these VMs compared to the other VMs that migrate successfully.
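One way to look for configuration differences between the VM that migrates and the one that does not would be to diff their live domain XML on the source host, roughly as below; this assumes read-only virsh access works without vdsm's SASL credentials:

# Dump the live libvirt XML of both VMs on ovirt002 and compare them.
virsh -r dumpxml puppet-agent1 > /tmp/puppet-agent1.xml
virsh -r dumpxml puppet-agent2 > /tmp/puppet-agent2.xml
diff -u /tmp/puppet-agent1.xml /tmp/puppet-agent2.xml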
>>
>>
>> On 02/17/2014 04:36 PM, Steve Dainard wrote:
>>
>> Hi Dafna,
>>
>> No snapshots of either of those VMs have been taken, and
>> there are no updates available for any of those packages on EL 6.5.
>>
>>
>>
>> On Sun, Feb 16, 2014 at 7:05 AM, Dafna Ron <dron at redhat.com> wrote:
>>
>> Does the VM that fails migration have a live snapshot?
>> If so, how many snapshots does the VM have?
>> I think there are newer packages of vdsm, libvirt and qemu -
>> can you try to update them?
>>
>>
>> On 02/16/2014 12:33 AM, Steve Dainard wrote:
>>
>> Versions are the same:
>>
>> [root at ovirt001 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
>> gpxe-roms-qemu-0.9.7-6.10.el6.noarch
>> libvirt-0.10.2-29.el6_5.3.x86_64
>> libvirt-client-0.10.2-29.el6_5.3.x86_64
>> libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
>> libvirt-python-0.10.2-29.el6_5.3.x86_64
>> qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
>> qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
>> qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
>> vdsm-4.13.3-3.el6.x86_64
>> vdsm-cli-4.13.3-3.el6.noarch
>> vdsm-gluster-4.13.3-3.el6.noarch
>> vdsm-python-4.13.3-3.el6.x86_64
>> vdsm-xmlrpc-4.13.3-3.el6.noarch
>>
>> [root at ovirt002 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
>> gpxe-roms-qemu-0.9.7-6.10.el6.noarch
>> libvirt-0.10.2-29.el6_5.3.x86_64
>> libvirt-client-0.10.2-29.el6_5.3.x86_64
>> libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
>> libvirt-python-0.10.2-29.el6_5.3.x86_64
>> qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
>> qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
>> qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
>> vdsm-4.13.3-3.el6.x86_64
>> vdsm-cli-4.13.3-3.el6.noarch
>> vdsm-gluster-4.13.3-3.el6.noarch
>> vdsm-python-4.13.3-3.el6.x86_64
>> vdsm-xmlrpc-4.13.3-3.el6.noarch
>>
>> Logs attached, thanks.
>>
>>
>>
>> On Sat, Feb 15, 2014 at 6:24 AM, Dafna Ron <dron at redhat.com> wrote:
>>
>> the migration fails in libvirt:
>>
>>
>> Thread-153709::ERROR::2014-02-14 11:17:40,420::vm::337::vm.Vm::(run) vmId=`08434c90-ffa3-4b63-aa8e-5613f7b0e0cd`::Failed to migrate
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/vm.py", line 323, in run
>>     self._startUnderlyingMigration()
>>   File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
>>     None, maxBandwidth)
>>   File "/usr/share/vdsm/vm.py", line 841, in f
>>     ret = attr(*args, **kwargs)
>>   File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
>>     ret = f(*args, **kwargs)
>>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
>>     if ret == -1: raise libvirtError('virDomainMigrateToURI2() failed', dom=self)
>> libvirtError: Unable to read from monitor: Connection reset by peer
>> Thread-54041::DEBUG::2014-02-14 11:17:41,752::task::579::TaskManager.Task::(_updateState) Task=`094c412a-43dc-4c29-a601-d759486469a8`::moving from state init -> state preparing
>> Thread-54041::INFO::2014-02-14 11:17:41,753::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='a52938f7-2cf4-4771-acb2-0c78d14999e5', spUUID='fcb89071-6cdb-4972-94d1-c9324cebf814', imgUUID='97c9108f-a506-415f-ad2c-370d707cb130', volUUID='61f82f7f-18e4-4ea8-9db3-71ddd9d4e836', options=None)
>>
>> Do you have the same libvirt/vdsm/qemu on both
>> your hosts?
>> Please attach the libvirt and vm logs from both hosts.
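For completeness, the logs being asked for here should, as far as I know, live in the default locations below on an EL6/oVirt host; a rough way to pull them from both hosts might be:

# Collect the vdsm log and the per-VM qemu log from both hosts.
# Default paths are assumed; libvirtd's own log location depends on its
# log_outputs setting, so it is not fetched here.
for host in ovirt001 ovirt002; do
    mkdir -p "logs/$host"
    scp "root@$host:/var/log/vdsm/vdsm.log" "logs/$host/"
    scp "root@$host:/var/log/libvirt/qemu/puppet-agent2.log" "logs/$host/"
done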
>>
>> Thanks,
>> Dafna
>>
>>
>>
>> On 02/14/2014 04:50 PM, Steve Dainard wrote:
>>
>> Quick overview:
>> Ovirt 3.3.2 running on CentOS 6.5
>> Two hosts: ovirt001, ovirt002
>> Migrating two VMs, puppet-agent1 and puppet-agent2, from ovirt002 to ovirt001.
>>
>> The first VM, puppet-agent1, migrates successfully. The second VM,
>> puppet-agent2, fails with "Migration failed due to Error: Fatal error
>> during migration (VM: puppet-agent2, Source: ovirt002, Destination: ovirt001)."
>>
>> I've attached the logs if anyone can help me track
>> down the issue.
>>
>> Thanks,
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>>
>>
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>> -- Dafna Ron
>>
>>
>>
>>
>> -- Dafna Ron
>>
>>
>>
>>
>> -- Dafna Ron
>>
>>
>>
>
> --
> Dafna Ron
>