[Users] Live migration of VMs occasionally fails
Dafna Ron
dron at redhat.com
Mon Feb 17 22:32:01 UTC 2014
Really interesting case :) Maybe gluster related?
Elad, can you please try to reproduce this?
Gluster storage -> at least two server-type VMs created from a template
as thin provision (it's a clone copy).
After creation, run them and migrate all the VMs from one host to the
second host.
I think it could be a locking issue.
Steve, can you please also check the sanlock log on the second host and
look for any errors in the gluster logs (on both hosts)?
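
For example, something like this on each host should turn up anything
relevant (a rough sketch - the paths assume the default log locations
and VOLNAME is a placeholder for the gluster volume name, so adjust to
your setup):

  # sanlock warnings/errors around the time of the failed migration
  grep -iE 'error|warn' /var/log/sanlock.log

  # gluster client/brick logs flag errors with an " E " severity marker
  grep -rE '\] E \[' /var/log/glusterfs/

  # and check the volume's self-heal state
  gluster volume heal VOLNAME info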
Thanks,
Dafna
On 02/17/2014 06:52 PM, Steve Dainard wrote:
> VMs are identical: same template, same CPU/mem/NIC. Server type, thin
> provisioned on NFS (backend is glusterfs 3.4).
>
> Does monitor = spice console? I don't believe either of them had a
> spice connection.
>
> I don't see anything in the ovirt001 sanlock.log:
>
> 2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:17:06-0500 255307 [5105]: cmd_register ci 4 fd 14 pid 31132
> 2014-02-14 11:17:06-0500 255307 [5105]: cmd_restrict ci 4 fd 14 pid
> 31132 flags 1
> 2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 ci_in 5
> fd 15 count 0
> 2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 result
> 0 pid_dead 0
> 2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 ci_in 6
> fd 16 count 0
> 2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 result
> 0 pid_dead 0
> 2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:17:39-0500 255340 [5105]: cmd_register ci 5 fd 15 pid 31319
> 2014-02-14 11:17:39-0500 255340 [5105]: cmd_restrict ci 5 fd 15 pid
> 31319 flags 1
> 2014-02-14 11:17:39-0500 255340 [5105]: client_pid_dead 5,15,31319
> cmd_active 0 suspend 0
> 2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
> 2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:18:06-0500 255366 [5111]: cmd_inq_lockspace 5,15
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
> flags 0
>
> ovirt002's sanlock.log has no entries during that time frame.
>
> Steve Dainard
> IT Infrastructure Manager
> Miovision <http://miovision.com/> | Rethink Traffic
>
>
> On Mon, Feb 17, 2014 at 12:59 PM, Dafna Ron <dron at redhat.com> wrote:
>
>     Hmm... that is very interesting...
>     Are both VMs identical? Are they server or desktop type? Created
>     as a thin copy or as a clone? What storage type are you using? Did
>     you happen to have an open monitor (console) on the VM that failed
>     the migration?
>     I wonder if it could be a sanlock lock on the source template, but
>     I can only see that bug happening if the VMs are linked to the
>     template. Can you look at the sanlock log and see if there are any
>     warnings or errors?
>
>     All the logs are already at debug level, so I don't think we can
>     get anything more out of them, but I am adding Meital and Omer to
>     this mail to help debug this - perhaps they can think of something
>     that could cause this from the trace.
>
>     This case is really interesting... sorry, probably not what you
>     want to hear... thanks for helping with this :)
>
> Dafna
>
>
>
> On 02/17/2014 05:08 PM, Steve Dainard wrote:
>
>         Failed live migration is more widespread than these two VMs,
>         but they are a good example because they were both built from
>         the same template and have no modifications after they were
>         created. They were also migrated one after the other, with one
>         migrating successfully and the other not.
>
>         Are there any increased logging levels that might help
>         determine what the issue is?
>
> Thanks,
>
>
>
>         On Mon, Feb 17, 2014 at 11:47 AM, Dafna Ron <dron at redhat.com> wrote:
>
>             Did you install these VMs from a CD? Did you run them as
>             run-once with a special monitor?
>             Try to think whether there is anything different in the
>             configuration of these VMs compared to the other VMs that
>             migrate successfully.
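>
>             One quick way to rule out configuration drift is to diff
>             the libvirt domain XML of the two VMs while they are
>             running - a rough sketch (virsh in read-only mode should
>             not need the vdsm SASL credentials, and I'm assuming the
>             domain names match the VM names in the engine):
>
>                 virsh -r dumpxml puppet-agent1 > /tmp/pa1.xml
>                 virsh -r dumpxml puppet-agent2 > /tmp/pa2.xml
>                 diff /tmp/pa1.xml /tmp/pa2.xml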
>
>
> On 02/17/2014 04:36 PM, Steve Dainard wrote:
>
> Hi Dafna,
>
>             No snapshots of either of those VMs have been taken, and
>             there are no updates for any of those packages on EL 6.5.
>
>
>
>             On Sun, Feb 16, 2014 at 7:05 AM, Dafna Ron <dron at redhat.com> wrote:
>
>                 Does the VM that fails migration have a live snapshot?
>                 If so, how many snapshots does it have?
>                 I think there are newer packages of vdsm, libvirt and
>                 qemu - can you try to update?
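>
>                 A quick way to see whether newer builds are available
>                 for the packages in question (a sketch, assuming the
>                 standard repos are enabled on the hosts):
>
>                     yum clean expire-cache
>                     yum check-update vdsm libvirt 'qemu-kvm-rhev*'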
>
>
>
> On 02/16/2014 12:33 AM, Steve Dainard wrote:
>
> Versions are the same:
>
>                     [root at ovirt001 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
> gpxe-roms-qemu-0.9.7-6.10.el6.noarch
> libvirt-0.10.2-29.el6_5.3.x86_64
> libvirt-client-0.10.2-29.el6_5.3.x86_64
> libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
> libvirt-python-0.10.2-29.el6_5.3.x86_64
> qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
> qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
> qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
> vdsm-4.13.3-3.el6.x86_64
> vdsm-cli-4.13.3-3.el6.noarch
> vdsm-gluster-4.13.3-3.el6.noarch
> vdsm-python-4.13.3-3.el6.x86_64
> vdsm-xmlrpc-4.13.3-3.el6.noarch
>
>                     [root at ovirt002 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
> gpxe-roms-qemu-0.9.7-6.10.el6.noarch
> libvirt-0.10.2-29.el6_5.3.x86_64
> libvirt-client-0.10.2-29.el6_5.3.x86_64
> libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
> libvirt-python-0.10.2-29.el6_5.3.x86_64
> qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
> qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
> qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
> vdsm-4.13.3-3.el6.x86_64
> vdsm-cli-4.13.3-3.el6.noarch
> vdsm-gluster-4.13.3-3.el6.noarch
> vdsm-python-4.13.3-3.el6.x86_64
> vdsm-xmlrpc-4.13.3-3.el6.noarch
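>
>                     (As a sanity check, the two lists can also be
>                     compared in a single step - a rough sketch,
>                     assuming root ssh access to both hosts from
>                     wherever it is run:)
>
>                         diff <(ssh root@ovirt001 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort") \
>                              <(ssh root@ovirt002 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort")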
>
> Logs attached, thanks.
>
>
>
>                     On Sat, Feb 15, 2014 at 6:24 AM, Dafna Ron <dron at redhat.com> wrote:
>
>                         The migration fails in libvirt:
>
>
>                         Thread-153709::ERROR::2014-02-14 11:17:40,420::vm::337::vm.Vm::(run) vmId=`08434c90-ffa3-4b63-aa8e-5613f7b0e0cd`::Failed to migrate
>                         Traceback (most recent call last):
>                           File "/usr/share/vdsm/vm.py", line 323, in run
>                             self._startUnderlyingMigration()
>                           File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
>                             None, maxBandwidth)
>                           File "/usr/share/vdsm/vm.py", line 841, in f
>                             ret = attr(*args, **kwargs)
>                           File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
>                             ret = f(*args, **kwargs)
>                           File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
>                             if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
>                         libvirtError: Unable to read from monitor: Connection reset by peer
>                         Thread-54041::DEBUG::2014-02-14 11:17:41,752::task::579::TaskManager.Task::(_updateState) Task=`094c412a-43dc-4c29-a601-d759486469a8`::moving from state init -> state preparing
>                         Thread-54041::INFO::2014-02-14 11:17:41,753::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='a52938f7-2cf4-4771-acb2-0c78d14999e5', spUUID='fcb89071-6cdb-4972-94d1-c9324cebf814', imgUUID='97c9108f-a506-415f-ad2c-370d707cb130', volUUID='61f82f7f-18e4-4ea8-9db3-71ddd9d4e836', options=None)
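>
>                         "Unable to read from monitor: Connection reset
>                         by peer" generally means a qemu process dropped
>                         its monitor connection mid-migration (for
>                         example because it exited), so the qemu domain
>                         log on each host should say why. A rough
>                         sketch, assuming libvirt's default per-domain
>                         log location and that the domain name matches
>                         the VM name:
>
>                             grep -iE 'error|fail' /var/log/libvirt/qemu/puppet-agent2.log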
>
> Do you have the same libvirt/vdsm/qemu on both
> your hosts?
> Please attach the libvirt and vm logs from
> both hosts.
>
> Thanks,
> Dafna
>
>
>
> On 02/14/2014 04:50 PM, Steve Dainard wrote:
>
>                             Quick overview:
>                             oVirt 3.3.2 running on CentOS 6.5
>                             Two hosts: ovirt001, ovirt002
>                             Migrating two VMs, puppet-agent1 and
>                             puppet-agent2, from ovirt002 to ovirt001.
>
>                             The first VM, puppet-agent1, migrates
>                             successfully. The second VM, puppet-agent2,
>                             fails with "Migration failed due to Error:
>                             Fatal error during migration (VM:
>                             puppet-agent2, Source: ovirt002,
>                             Destination: ovirt001)."
>
>                             I've attached the logs if anyone can help
>                             me track down the issue.
>
> Thanks,
>
>
>
>
>                     _______________________________________________
>                     Users mailing list
>                     Users at ovirt.org
>                     http://lists.ovirt.org/mailman/listinfo/users
--
Dafna Ron