[Users] Live migration of VMs occasionally fails

Dafna Ron dron at redhat.com
Mon Feb 17 22:32:01 UTC 2014


really interesting case :) maybe gluster related?
Elad, can you please try to reproduce this?
Gluster storage -> at least two server-type VMs created from a template
as thin provision (it's a clone copy).
After creating them, run and migrate all the VMs from one host to the second host.
I suspect it could be a locking issue.
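
For the repro, the migration step can also be driven through the REST API,
which makes it easy to script - a rough sketch (engine address, credentials,
and the VM and destination host ids are placeholders):

    curl -k -u 'admin@internal:PASSWORD' \
         -H 'Content-Type: application/xml' \
         -d '<action><host id="DEST_HOST_ID"/></action>' \
         'https://ENGINE_FQDN/api/vms/VM_ID/migrate'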

Steve, can you please also check the sanlock log on the second host, and
look for any errors in the gluster logs (on both hosts)?
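
Something like this on each host should surface anything obvious (assuming
the default log locations - sanlock logs to /var/log/sanlock.log and
gluster to /var/log/glusterfs/):

    grep -iE 'warn|error' /var/log/sanlock.log
    grep ' E ' /var/log/glusterfs/*.log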

Thanks,

Dafna


On 02/17/2014 06:52 PM, Steve Dainard wrote:
> VMs are identical: same template, same CPU/mem/NIC. Server type, thin
> provisioned on NFS (backend is glusterfs 3.4).
>
> Does monitor = spice console? I don't believe either of them had a 
> spice connection.
>
> I don't see anything in the ovirt001 sanlock.log:
>
> 2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 done 0
> 2014-02-14 11:17:06-0500 255307 [5105]: cmd_register ci 4 fd 14 pid 31132
> 2014-02-14 11:17:06-0500 255307 [5105]: cmd_restrict ci 4 fd 14 pid 
> 31132 flags 1
> 2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 ci_in 5 
> fd 15 count 0
> 2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 result 
> 0 pid_dead 0
> 2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 ci_in 6 
> fd 16 count 0
> 2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 result 
> 0 pid_dead 0
> 2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:17:39-0500 255340 [5105]: cmd_register ci 5 fd 15 pid 31319
> 2014-02-14 11:17:39-0500 255340 [5105]: cmd_restrict ci 5 fd 15 pid 
> 31319 flags 1
> 2014-02-14 11:17:39-0500 255340 [5105]: client_pid_dead 5,15,31319 
> cmd_active 0 suspend 0
> 2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
> 2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 done 0
> 2014-02-14 11:18:06-0500 255366 [5111]: cmd_inq_lockspace 5,15 
> a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 
> flags 0
>
> ovirt002 sanlock.log has no entries during that time frame.
>
> Steve Dainard
> IT Infrastructure Manager, Miovision
>
>
> On Mon, Feb 17, 2014 at 12:59 PM, Dafna Ron <dron at redhat.com> wrote:
>
>     mmm... that is very interesting...
>     Both VMs are identical? Are they server or desktop type? Created as
>     thin copy or clone? What storage type are you using? Did you happen
>     to have an open monitor on the VM that failed migration?
>     I wonder if it could be a sanlock lock on the source template, but I
>     can only see this bug happening if the VMs are linked to the template.
>     Can you look at the sanlock log and see if there are any warnings
>     or errors?
>
>     All the logs are already at debug level, so I don't think we can get
>     anything more from them, but I am adding Meital and Omer to this mail
>     to help debug this - perhaps they can think of something that could
>     cause this from the trace.
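>
>     If even more detail is ever needed, libvirt's own verbosity can still
>     be raised on the hosts - a sketch (these options go in
>     /etc/libvirt/libvirtd.conf, followed by a libvirtd restart):
>
>         log_level = 1
>         log_outputs = "1:file:/var/log/libvirt/libvirtd.log"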
>
>     This case is really interesting... sorry, probably not what you
>     want to hear... thanks for helping with this :)
>
>     Dafna
>
>
>
>     On 02/17/2014 05:08 PM, Steve Dainard wrote:
>
>         Failed live migration is more widespread than these two VMs, but
>         they are a good example because they were both built from the
>         same template and have no modifications after they were created.
>         They were also migrated one after the other, with one migrating
>         successfully and the other not.
>
>         Are there any increased logging levels that might help
>         determine what the issue is?
>
>         Thanks,
>
>
>
>         On Mon, Feb 17, 2014 at 11:47 AM, Dafna Ron
>         <dron at redhat.com> wrote:
>
>             Did you install these VMs from a CD? Run them as run-once
>             with a special monitor?
>             Try to think whether there is anything different in the
>             configuration of these VMs compared to the other VMs that
>             migrate successfully.
>
>
>             On 02/17/2014 04:36 PM, Steve Dainard wrote:
>
>                 Hi Dafna,
>
>                 No snapshots of either of those VMs have been taken, and
>                 there are no updates for any of those packages on EL 6.5.
>
>
>
>                 On Sun, Feb 16, 2014 at 7:05 AM, Dafna Ron
>                 <dron at redhat.com> wrote:
>
>                     Does the VM that fails migration have a live snapshot?
>                     If so, how many snapshots does the VM have?
>                     I think there are newer packages of vdsm, libvirt and
>                     qemu - can you try to update?
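>
>                     For example (a sketch - the volume path and UUIDs are
>                     placeholders, and exact package names may differ for
>                     your build):
>
>                         # each snapshot adds another backing file to the
>                         # qcow2 chain, so inspecting the active volume
>                         # shows how deep the chain is:
>                         qemu-img info /rhev/data-center/mnt/SD_PATH/images/IMG_UUID/VOL_UUID
>
>                         # pull newer packages on both hosts:
>                         yum update vdsm libvirt qemu-kvm-rhev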
>
>
>
>                     On 02/16/2014 12:33 AM, Steve Dainard wrote:
>
>                         Versions are the same:
>
>                         [root at ovirt001 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
>                         gpxe-roms-qemu-0.9.7-6.10.el6.noarch
>                         libvirt-0.10.2-29.el6_5.3.x86_64
>                         libvirt-client-0.10.2-29.el6_5.3.x86_64
>                         libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
>                         libvirt-python-0.10.2-29.el6_5.3.x86_64
>                         qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
>                         qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
>                         qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
>                         vdsm-4.13.3-3.el6.x86_64
>                         vdsm-cli-4.13.3-3.el6.noarch
>                         vdsm-gluster-4.13.3-3.el6.noarch
>                         vdsm-python-4.13.3-3.el6.x86_64
>                         vdsm-xmlrpc-4.13.3-3.el6.noarch
>
>                         [root at ovirt002 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
>                         gpxe-roms-qemu-0.9.7-6.10.el6.noarch
>                         libvirt-0.10.2-29.el6_5.3.x86_64
>                         libvirt-client-0.10.2-29.el6_5.3.x86_64
>                         libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
>                         libvirt-python-0.10.2-29.el6_5.3.x86_64
>                         qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
>                         qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
>                         qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
>                         vdsm-4.13.3-3.el6.x86_64
>                         vdsm-cli-4.13.3-3.el6.noarch
>                         vdsm-gluster-4.13.3-3.el6.noarch
>                         vdsm-python-4.13.3-3.el6.x86_64
>                         vdsm-xmlrpc-4.13.3-3.el6.noarch
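>
>                         Equivalently, comparing the two directly (assuming
>                         ssh access from a third machine) shows no difference:
>
>                             diff <(ssh ovirt001 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort") \
>                                  <(ssh ovirt002 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort")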
>
>                         Logs attached, thanks.
>
>
>
>                         On Sat, Feb 15, 2014 at 6:24 AM, Dafna Ron
>                         <dron at redhat.com> wrote:
>
>                             the migration fails in libvirt:
>
>                             Thread-153709::ERROR::2014-02-14 11:17:40,420::vm::337::vm.Vm::(run) vmId=`08434c90-ffa3-4b63-aa8e-5613f7b0e0cd`::Failed to migrate
>                             Traceback (most recent call last):
>                               File "/usr/share/vdsm/vm.py", line 323, in run
>                                 self._startUnderlyingMigration()
>                               File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
>                                 None, maxBandwidth)
>                               File "/usr/share/vdsm/vm.py", line 841, in f
>                                 ret = attr(*args, **kwargs)
>                               File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
>                                 ret = f(*args, **kwargs)
>                               File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
>                                 if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
>                             libvirtError: Unable to read from monitor: Connection reset by peer
>                             Thread-54041::DEBUG::2014-02-14 11:17:41,752::task::579::TaskManager.Task::(_updateState) Task=`094c412a-43dc-4c29-a601-d759486469a8`::moving from state init -> state preparing
>                             Thread-54041::INFO::2014-02-14 11:17:41,753::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='a52938f7-2cf4-4771-acb2-0c78d14999e5', spUUID='fcb89071-6cdb-4972-94d1-c9324cebf814', imgUUID='97c9108f-a506-415f-ad2c-370d707cb130', volUUID='61f82f7f-18e4-4ea8-9db3-71ddd9d4e836', options=None)
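>
>                             "Unable to read from monitor: Connection reset
>                             by peer" generally means libvirt lost its qemu
>                             monitor connection mid-migration, i.e. the qemu
>                             process on one side went away - so the per-VM
>                             qemu logs on both hosts may show why. A quick
>                             check (VM name assumed, default libvirt log
>                             location):
>
>                                 tail -n 50 /var/log/libvirt/qemu/puppet-agent2.log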
>
>                             Do you have the same libvirt/vdsm/qemu on both
>                             your hosts?
>                             Please attach the libvirt and vm logs from
>                             both hosts.
>
>                             Thanks,
>                             Dafna
>
>
>
>                             On 02/14/2014 04:50 PM, Steve Dainard wrote:
>
>                                 Quick overview:
>                                 oVirt 3.3.2 running on CentOS 6.5
>                                 Two hosts: ovirt001, ovirt002
>                                 Migrating two VMs, puppet-agent1 and
>                                 puppet-agent2, from ovirt002 to ovirt001.
>
>                                 The first VM, puppet-agent1, migrates
>                                 successfully. The second VM, puppet-agent2,
>                                 fails with "Migration failed due to Error:
>                                 Fatal error during migration (VM:
>                                 puppet-agent2, Source: ovirt002,
>                                 Destination: ovirt001)."
>
>                                 I've attached the logs if anyone can help me
>                                 track down the issue.
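>
>                                 The failure itself is visible in vdsm.log on
>                                 the source host; assuming the default log
>                                 location, something like this pulls the
>                                 relevant section:
>
>                                     grep -B 2 -A 15 'Failed to migrate' /var/log/vdsm/vdsm.log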
>
>                                 Thanks,
>
>
>
>                                 _______________________________________________
>                                 Users mailing list
>                                 Users at ovirt.org
>                                 http://lists.ovirt.org/mailman/listinfo/users
>
>
>     -- 
>     Dafna Ron
>
>


-- 
Dafna Ron


