[Users] Live migration of VMs occasionally fails

Steve Dainard sdainard at miovision.com
Mon Feb 17 18:52:39 UTC 2014


The VMs are identical: same template, same CPU/mem/NIC. Both are the Server type, thin
provisioned on NFS (the backend is glusterfs 3.4).

Does "monitor" mean the SPICE console? I don't believe either of them had a SPICE
connection open.

I don't see any warnings or errors in the ovirt001 sanlock.log:

2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:17:06-0500 255307 [5105]: cmd_register ci 4 fd 14 pid 31132
2014-02-14 11:17:06-0500 255307 [5105]: cmd_restrict ci 4 fd 14 pid 31132
flags 1
2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 ci_in 5 fd
15 count 0
2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 result 0
pid_dead 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 ci_in 6 fd
16 count 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 result 0
pid_dead 0
2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:39-0500 255340 [5105]: cmd_register ci 5 fd 15 pid 31319
2014-02-14 11:17:39-0500 255340 [5105]: cmd_restrict ci 5 fd 15 pid 31319
flags 1
2014-02-14 11:17:39-0500 255340 [5105]: client_pid_dead 5,15,31319
cmd_active 0 suspend 0
2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0
2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:18:06-0500 255366 [5111]: cmd_inq_lockspace 5,15
a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0
flags 0

The ovirt002 sanlock.log has no entries during that time frame.
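
For completeness, here is a rough sketch of how the sanlock logs on both hosts could be scanned for problems around the migration window (the log path is the EL6 default; the date pattern just narrows to the Feb 14 window):

    # run on each host (ovirt001 and ovirt002)
    grep -iE 'error|warn' /var/log/sanlock.log
    # narrow to the migration window and drop the routine lockspace polling
    grep '2014-02-14 11:1[67]' /var/log/sanlock.log | grep -ivE 'cmd_inq_lockspace|done 0'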

Steve Dainard
IT Infrastructure Manager
Miovision <http://miovision.com/> | Rethink Traffic

Blog <http://miovision.com/blog> | LinkedIn <https://www.linkedin.com/company/miovision-technologies> | Twitter <https://twitter.com/miovision> | Facebook <https://www.facebook.com/miovision>
------------------------------
 Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON,
Canada | N2C 1L3
This e-mail may contain information that is privileged or confidential. If
you are not the intended recipient, please delete the e-mail and any
attachments and notify us immediately.


On Mon, Feb 17, 2014 at 12:59 PM, Dafna Ron <dron at redhat.com> wrote:

> mmm... that is very interesting...
> both VMs are identical? are they Server or Desktop type? created as a thin
> copy or a clone? what storage type are you using? did you happen to have an
> open monitor on the VM that failed migration?
> I wonder if it could be a sanlock lock on the source template, but I can only
> see this bug happening if the VMs are linked to the template.
> can you look at the sanlock log and see if there are any warnings or errors?
>
> All logs are in debug, so I don't think we can get anything more from them,
> but I am adding Meital and Omer to this mail to help debug this - perhaps
> they can think of something that could cause this from the trace.
>
> This case is really interesting... sorry, probably not what you want to
> hear...  thanks for helping with this :)
>
> Dafna
>
>
>
> On 02/17/2014 05:08 PM, Steve Dainard wrote:
>
>> Failed live migration is more widespread than these two VMs, but they are a
>> good example because they were both built from the same template and have had
>> no modifications since they were created. They were also migrated one after
>> the other, with one migrating successfully and the other not.
>>
>> Are there any increased logging levels that might help determine what the
>> issue is?
>>
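
One way to get more detail out of libvirt on EL6 would be something like the following sketch; it uses the stock config file location, and since vdsm manages parts of this file it should be treated as a starting point, with libvirtd restarted afterwards:

    # /etc/libvirt/libvirtd.conf - raise logging for the qemu driver
    log_filters="1:qemu 1:libvirt"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"

    # restart libvirtd so the new settings take effect
    service libvirtd restart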
>> Thanks,
>>
>>
>>
>> On Mon, Feb 17, 2014 at 11:47 AM, Dafna Ron <dron at redhat.com> wrote:
>>
>>     Did you install these VMs from a CD? Did you run them as run-once with a
>>     special monitor?
>>     Try to think whether there is anything different in the configuration
>>     of these VMs compared to the other VMs that migrate successfully.
>>
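
A quick way to compare the two VMs' effective configuration on the source host (a sketch; it assumes read-only virsh access works there and the libvirt domain names match the VM names) is to diff their XML:

    # on the source host (ovirt002)
    virsh -r dumpxml puppet-agent1 > /tmp/puppet-agent1.xml
    virsh -r dumpxml puppet-agent2 > /tmp/puppet-agent2.xml
    diff -u /tmp/puppet-agent1.xml /tmp/puppet-agent2.xml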
>>
>>     On 02/17/2014 04:36 PM, Steve Dainard wrote:
>>
>>         Hi Dafna,
>>
>>         No snapshots of either of those VMs have been taken, and
>>         there are no updates for any of those packages on EL 6.5.
>>
>>
>>
>>         On Sun, Feb 16, 2014 at 7:05 AM, Dafna Ron <dron at redhat.com> wrote:
>>
>>             Does the VM that fails migration have a live snapshot?
>>             If so, how many snapshots does the VM have?
>>             I think there are newer packages of vdsm, libvirt and qemu -
>>             can you try to update them?
>>
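
A minimal check for newer builds on EL6 (assuming the oVirt 3.3 repositories are enabled on both hosts) might look like:

    # list available updates for the relevant packages
    yum check-update vdsm libvirt qemu-kvm-rhev
    # and, if anything newer shows up, pull it in
    yum update vdsm libvirt qemu-kvm-rhev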
>>
>>
>>             On 02/16/2014 12:33 AM, Steve Dainard wrote:
>>
>>                 Versions are the same:
>>
>>                 [root at ovirt001 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
>>                 gpxe-roms-qemu-0.9.7-6.10.el6.noarch
>>                 libvirt-0.10.2-29.el6_5.3.x86_64
>>                 libvirt-client-0.10.2-29.el6_5.3.x86_64
>>                 libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
>>                 libvirt-python-0.10.2-29.el6_5.3.x86_64
>>                 qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
>>                 qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
>>                 qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
>>                 vdsm-4.13.3-3.el6.x86_64
>>                 vdsm-cli-4.13.3-3.el6.noarch
>>                 vdsm-gluster-4.13.3-3.el6.noarch
>>                 vdsm-python-4.13.3-3.el6.x86_64
>>                 vdsm-xmlrpc-4.13.3-3.el6.noarch
>>
>>                 [root at ovirt002 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
>>                 gpxe-roms-qemu-0.9.7-6.10.el6.noarch
>>                 libvirt-0.10.2-29.el6_5.3.x86_64
>>                 libvirt-client-0.10.2-29.el6_5.3.x86_64
>>                 libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
>>                 libvirt-python-0.10.2-29.el6_5.3.x86_64
>>                 qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
>>                 qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
>>                 qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
>>                 vdsm-4.13.3-3.el6.x86_64
>>                 vdsm-cli-4.13.3-3.el6.noarch
>>                 vdsm-gluster-4.13.3-3.el6.noarch
>>                 vdsm-python-4.13.3-3.el6.x86_64
>>                 vdsm-xmlrpc-4.13.3-3.el6.noarch
>>
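
To double-check that the two hosts really are in sync, the sorted package lists can also be diffed directly (a sketch; it assumes passwordless ssh as root to both hosts):

    # empty output means the relevant packages match exactly
    diff <(ssh root@ovirt001 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort") \
         <(ssh root@ovirt002 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort")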
>>                 Logs attached, thanks.
>>
>>
>>
>>                 On Sat, Feb 15, 2014 at 6:24 AM, Dafna Ron <dron at redhat.com> wrote:
>>
>>                     the migration fails in libvirt:
>>
>>
>>                     Thread-153709::ERROR::2014-02-14 11:17:40,420::vm::337::vm.Vm::(run) vmId=`08434c90-ffa3-4b63-aa8e-5613f7b0e0cd`::Failed to migrate
>>                     Traceback (most recent call last):
>>                       File "/usr/share/vdsm/vm.py", line 323, in run
>>                         self._startUnderlyingMigration()
>>                       File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
>>                         None, maxBandwidth)
>>                       File "/usr/share/vdsm/vm.py", line 841, in f
>>                         ret = attr(*args, **kwargs)
>>                       File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
>>                         ret = f(*args, **kwargs)
>>                       File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
>>                         if ret == -1: raise libvirtError('virDomainMigrateToURI2() failed', dom=self)
>>                     libvirtError: Unable to read from monitor: Connection reset by peer
>>                     Thread-54041::DEBUG::2014-02-14 11:17:41,752::task::579::TaskManager.Task::(_updateState) Task=`094c412a-43dc-4c29-a601-d759486469a8`::moving from state init -> state preparing
>>                     Thread-54041::INFO::2014-02-14 11:17:41,753::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='a52938f7-2cf4-4771-acb2-0c78d14999e5', spUUID='fcb89071-6cdb-4972-94d1-c9324cebf814', imgUUID='97c9108f-a506-415f-ad2c-370d707cb130', volUUID='61f82f7f-18e4-4ea8-9db3-71ddd9d4e836', options=None)
>>
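
"Unable to read from monitor: Connection reset by peer" usually means the qemu process handling the migration went away or closed its monitor socket, so the destination host's qemu and libvirtd logs are worth checking. A sketch using the standard EL6 log locations (the libvirtd log path depends on log_outputs in libvirtd.conf):

    # on the destination host (ovirt001)
    tail -n 100 /var/log/libvirt/qemu/puppet-agent2.log
    grep -i error /var/log/libvirtd.log | tail -n 50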
>>                     Do you have the same libvirt/vdsm/qemu on both your hosts?
>>                     Please attach the libvirt and vm logs from both hosts.
>>
>>                     Thanks,
>>                     Dafna
>>
>>
>>
>>                     On 02/14/2014 04:50 PM, Steve Dainard wrote:
>>
>>                         Quick overview:
>>                         oVirt 3.3.2 running on CentOS 6.5
>>                         Two hosts: ovirt001, ovirt002
>>                         Migrating two VMs: puppet-agent1 and puppet-agent2 from ovirt002 to ovirt001.
>>
>>                         The first VM, puppet-agent1, migrates successfully. The second VM, puppet-agent2, fails with "Migration failed due to Error: Fatal error during migration (VM: puppet-agent2, Source: ovirt002, Destination: ovirt001)."
>>
>>                         I've attached the logs if anyone can help me track down the issue.
>>
>>                         Thanks,
>>
>>
>>
>>                         _______________________________________________
>>                         Users mailing list
>>                         Users at ovirt.org
>>                         http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>>                     --     Dafna Ron
>>
>>
>>
>>
>>             --     Dafna Ron
>>
>>
>>
>>
>>     --     Dafna Ron
>>
>>
>>
>
> --
> Dafna Ron
>

