[Users] Live migration of VMs occasionally fails

Quick overview: oVirt 3.3.2 running on CentOS 6.5. Two hosts: ovirt001, ovirt002. Migrating two VMs, puppet-agent1 and puppet-agent2, from ovirt002 to ovirt001.

The first VM, puppet-agent1, migrates successfully. The second VM, puppet-agent2, fails with "Migration failed due to Error: Fatal error during migration (VM: puppet-agent2, Source: ovirt002, Destination: ovirt001)."

I've attached the logs if anyone can help me track down the issue.

Thanks,
Steve Dainard
IT Infrastructure Manager, Miovision

On Sat, Feb 15, 2014 at 6:24 AM, Dafna Ron <dron@redhat.com> wrote:

The migration fails in libvirt:

Thread-153709::ERROR::2014-02-14 11:17:40,420::vm::337::vm.Vm::(run) vmId=`08434c90-ffa3-4b63-aa8e-5613f7b0e0cd`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 323, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/vm.py", line 841, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer
Thread-54041::DEBUG::2014-02-14 11:17:41,752::task::579::TaskManager.Task::(_updateState) Task=`094c412a-43dc-4c29-a601-d759486469a8`::moving from state init -> state preparing
Thread-54041::INFO::2014-02-14 11:17:41,753::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='a52938f7-2cf4-4771-acb2-0c78d14999e5', spUUID='fcb89071-6cdb-4972-94d1-c9324cebf814', imgUUID='97c9108f-a506-415f-ad2c-370d707cb130', volUUID='61f82f7f-18e4-4ea8-9db3-71ddd9d4e836', options=None)

Do you have the same libvirt/vdsm/qemu on both your hosts? Please attach the libvirt and vm logs from both hosts.

Thanks,
Dafna
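
For anyone gathering these, the logs in question usually live in the following locations on an EL6 oVirt host. This is a sketch assuming default vdsm/libvirt packaging; adjust paths if your install differs:

    /var/log/vdsm/vdsm.log                 # vdsm log (contains the migration traceback above)
    /var/log/libvirt/libvirtd.log          # libvirt daemon log, if file logging is enabled
    /var/log/libvirt/qemu/<vm-name>.log    # per-VM qemu log, e.g. puppet-agent2.log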

On 02/16/2014 12:33 AM, Steve Dainard wrote:

Versions are the same:

[root@ovirt001 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
libvirt-0.10.2-29.el6_5.3.x86_64
libvirt-client-0.10.2-29.el6_5.3.x86_64
libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
libvirt-python-0.10.2-29.el6_5.3.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
vdsm-4.13.3-3.el6.x86_64
vdsm-cli-4.13.3-3.el6.noarch
vdsm-gluster-4.13.3-3.el6.noarch
vdsm-python-4.13.3-3.el6.x86_64
vdsm-xmlrpc-4.13.3-3.el6.noarch

[root@ovirt002 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
libvirt-0.10.2-29.el6_5.3.x86_64
libvirt-client-0.10.2-29.el6_5.3.x86_64
libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
libvirt-python-0.10.2-29.el6_5.3.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
vdsm-4.13.3-3.el6.x86_64
vdsm-cli-4.13.3-3.el6.noarch
vdsm-gluster-4.13.3-3.el6.noarch
vdsm-python-4.13.3-3.el6.x86_64
vdsm-xmlrpc-4.13.3-3.el6.noarch

Logs attached, thanks.

Steve Dainard
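
As an aside, a bash process-substitution diff is a quick way to confirm that two hosts' package sets match; a sketch, assuming root ssh access to both hosts from a third machine (hostnames as used above):

    diff <(ssh ovirt001 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort") \
         <(ssh ovirt002 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort")

No output means the two lists are identical.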

On Sun, Feb 16, 2014 at 7:05 AM, Dafna Ron <dron@redhat.com> wrote:

Does the VM that fails migration have a live snapshot? If so, how many snapshots does the VM have? I think that there are newer packages of vdsm, libvirt and qemu - can you try to update?

-- Dafna Ron

On 02/17/2014 04:36 PM, Steve Dainard wrote:

Hi Dafna,

No snapshots of either of those VMs have been taken, and there are no updates for any of those packages on EL 6.5.

Steve Dainard

On Mon, Feb 17, 2014 at 11:47 AM, Dafna Ron <dron@redhat.com> wrote:

Did you install these VMs from a CD? Run them as run-once with a special monitor? Try to think whether there is anything different in the configuration of these VMs compared to the other VMs that succeed in migrating.

-- Dafna Ron

On 02/17/2014 05:08 PM, Steve Dainard wrote:

Failed live migration is more widespread than these two VMs, but they are a good example because they were both built from the same template and have no modifications after they were created. They were also migrated one after the other, with one migrating successfully and the other not.

Are there any increased logging levels that might help determine what the issue is?

Thanks,
Steve Dainard
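
For reference, libvirt's own daemon verbosity can be raised on each host; a minimal sketch for /etc/libvirt/libvirtd.conf on EL6 (note that vdsm manages parts of this file, so changes may be overwritten, and libvirtd must be restarted to pick them up):

    log_level = 1                                       # 1=DEBUG, 2=INFO, 3=WARNING, 4=ERROR
    log_outputs = "1:file:/var/log/libvirt/libvirtd.log"

    # then: service libvirtd restart   (briefly interrupts vdsm's libvirt connection)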

On Mon, Feb 17, 2014 at 12:59 PM, Dafna Ron <dron@redhat.com> wrote:

Mmm... that is very interesting... Both VMs are identical? Are they server or desktop type? Created as thin copy or clone? What storage type are you using? Did you happen to have an open monitor on the VM that failed migration?

I wonder if it can be a sanlock lock on the source template, but I can only see this bug happening if the VMs are linked to the template. Can you look at the sanlock log and see if there are any warnings or errors?

All logs are in debug, so I don't think we can get anything more from them, but I am adding Meital and Omer to this mail to help debug this - perhaps they can think of something that can cause that from the trace.

This case is really interesting... sorry, probably not what you want to hear... thanks for helping with this :)

Dafna
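
A quick way to scan for those warnings and errors (a sketch; /var/log/sanlock.log is the default sanlock log path on EL6):

    grep -iE 'warn|err' /var/log/sanlock.log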

On 02/17/2014 06:52 PM, Steve Dainard wrote:

VMs are identical: same template, same cpu/mem/nic. Server type, thin provisioned on NFS (backend is glusterfs 3.4).

Does monitor = spice console? I don't believe either of them had a spice connection.

I don't see anything in the ovirt001 sanlock.log:

2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:17:06-0500 255307 [5105]: cmd_register ci 4 fd 14 pid 31132
2014-02-14 11:17:06-0500 255307 [5105]: cmd_restrict ci 4 fd 14 pid 31132 flags 1
2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 ci_in 5 fd 15 count 0
2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 result 0 pid_dead 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 ci_in 6 fd 16 count 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 result 0 pid_dead 0
2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:39-0500 255340 [5105]: cmd_register ci 5 fd 15 pid 31319
2014-02-14 11:17:39-0500 255340 [5105]: cmd_restrict ci 5 fd 15 pid 31319 flags 1
2014-02-14 11:17:39-0500 255340 [5105]: client_pid_dead 5,15,31319 cmd_active 0 suspend 0
2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:18:06-0500 255366 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0

The ovirt002 sanlock.log has no entries during that time frame.

Steve Dainard

Dafna Ron wrote:

Really interesting case :) Maybe gluster related?

Elad, can you please try to reproduce this? Gluster storage -> at least two VMs, server type, created from a template as thin provision (as opposed to clone copy). After creating them, run and migrate all the VMs from one host to the second host. I think it would be a locking issue.

Steve, can you please also check the sanlock log on the second host, and look for any errors in the gluster logs (on both hosts)?

Thanks,
Dafna
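
A quick scan on the gluster side might look like this (a sketch; glusterfs 3.4 writes its logs under /var/log/glusterfs/ by default, with a one-letter severity such as E or W in each line):

    # run on both hosts
    grep -E '\] (E|W) \[' /var/log/glusterfs/*.log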
VM's are identical, same template, same cpu/mem/nic. Server type, thin provisioned on NFS (backend is glusterfs 3.4).
Does monitor = spice console? I don't believe either of them had a spice connection.
I don't see anything in the ovirt001 sanlock.log:
2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 done 0
2014-02-14 11:17:06-0500 255307 [5105]: cmd_register ci 4 fd 14 pid 31132
2014-02-14 11:17:06-0500 255307 [5105]: cmd_restrict ci 4 fd 14 pid 31132 flags 1
2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 ci_in 5 fd 15 count 0
2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 result 0 pid_dead 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 ci_in 6 fd 16 count 0
2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 result 0 pid_dead 0
2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:39-0500 255340 [5105]: cmd_register ci 5 fd 15 pid 31319
2014-02-14 11:17:39-0500 255340 [5105]: cmd_restrict ci 5 fd 15 pid 31319 flags 1
2014-02-14 11:17:39-0500 255340 [5105]: client_pid_dead 5,15,31319 cmd_active 0 suspend 0
2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 done 0
2014-02-14 11:18:06-0500 255366 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
ovirt002's sanlock.log has no entries during that time frame.
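(For anyone digging through the same problem: a rough way to pull just the window around the failure and the daemon's current state — the hostname and the 11:17 timestamp are taken from the log above, and sanlock output varies a bit by version:)

[root@ovirt001 ~]# grep '2014-02-14 11:17' /var/log/sanlock.log   # entries around the failed migration
[root@ovirt001 ~]# sanlock client status                          # lockspaces and resources currently held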
On Mon, Feb 17, 2014 at 12:59 PM, Dafna Ron <dron@redhat.com> wrote:
mmm... that is very interesting... Both vm's are identical? Are they server or desktop type? Created as thin copy or clone? What storage type are you using? Did you happen to have an open monitor on the vm that failed migration? I wonder if it can be a sanlock lock on the source template, but I can only see this bug happening if the vm's are linked to the template. Can you look at the sanlock log and see if there are any warnings or errors?
All logs are already in debug, so I don't think we can get anything more from them, but I am adding Meital and Omer to this mail to help debug this - perhaps they can think of something that could cause this from the trace.
This case is really interesting... sorry, probably not what you want to hear... thanks for helping with this :)
Dafna
On 02/17/2014 05:08 PM, Steve Dainard wrote:
Failed live migration is more widespread than these two VM's, but they are a good example because they were both built from the same template and have had no modifications since they were created. They were also migrated one after the other, with one migrating successfully and the other not.
Are there any increased logging levels that might help determine what the issue is?
Thanks,
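(A sketch of how debug logging could be raised on the hosts - the paths are the stock EL6 locations, so verify against your install, and note that restarting these daemons on a host with running VMs is disruptive:)

# /etc/libvirt/libvirtd.conf - set debug logging, then 'service libvirtd restart'
log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"

# /etc/vdsm/logger.conf - set level=DEBUG under [logger_root], then 'service vdsmd restart'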
On Mon, Feb 17, 2014 at 11:47 AM, Dafna Ron <dron@redhat.com> wrote:
Did you install these vm's from a CD? Run them as run-once with a special monitor? Try to think if there is anything different in the configuration of these vm's compared to the other vm's that migrate successfully.
On 02/17/2014 04:36 PM, Steve Dainard wrote:
Hi Dafna,
No snapshots of either of those VM's have been taken, and there are no updates for any of those packages on EL 6.5.
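(Checked along these lines - the package globs are an assumption, but yum accepts them:)

[root@ovirt001 ~]# yum list updates 'libvirt*' 'vdsm*' 'qemu*'   # reports "No matching Packages to list" when everything is current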
On Sun, Feb 16, 2014 at 7:05 AM, Dafna Ron <dron@redhat.com> wrote:
Does the vm that fails migration have a live snapshot? If so, how many snapshots does the vm have? I think there are newer packages of vdsm, libvirt and qemu - can you try to update?
On 02/16/2014 12:33 AM, Steve Dainard wrote:
Versions are the same:
[root@ovirt001 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
libvirt-0.10.2-29.el6_5.3.x86_64
libvirt-client-0.10.2-29.el6_5.3.x86_64
libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
libvirt-python-0.10.2-29.el6_5.3.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
vdsm-4.13.3-3.el6.x86_64
vdsm-cli-4.13.3-3.el6.noarch
vdsm-gluster-4.13.3-3.el6.noarch
vdsm-python-4.13.3-3.el6.x86_64
vdsm-xmlrpc-4.13.3-3.el6.noarch
[root@ovirt002 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
libvirt-0.10.2-29.el6_5.3.x86_64
libvirt-client-0.10.2-29.el6_5.3.x86_64
libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
libvirt-python-0.10.2-29.el6_5.3.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
vdsm-4.13.3-3.el6.x86_64
vdsm-cli-4.13.3-3.el6.noarch
vdsm-gluster-4.13.3-3.el6.noarch
vdsm-python-4.13.3-3.el6.x86_64
vdsm-xmlrpc-4.13.3-3.el6.noarch
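(A bash one-liner that would catch any drift between the two hosts - assumes root ssh from ovirt001 to ovirt002:)

[root@ovirt001 ~]# diff <(rpm -qa | egrep 'libvirt|vdsm|qemu' | sort) \
                        <(ssh ovirt002 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort")   # no output = identical versions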
Logs attached, thanks.

sanlock.log on the second host (ovirt002) doesn't have any entries anywhere near the time of failure.

I see some heal-failed errors in gluster, but since the storage is exposed via NFS I'd be surprised if that were the issue. I'm working on fixing those files now; I'll update if I make any progress.
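(On the gluster side, something like the following shows pending and failed heals - the volume name "rep1" is inferred from the mount path in the sanlock log above, and the heal-failed subcommand is specific to gluster 3.4-era releases:)

[root@ovirt001 ~]# gluster volume heal rep1 info               # entries still needing heal
[root@ovirt001 ~]# gluster volume heal rep1 info heal-failed   # entries self-heal could not fix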
On Mon, Feb 17, 2014 at 5:32 PM, Dafna Ron <dron@redhat.com> wrote:

Really interesting case :) Maybe gluster related? Elad, can you please try to reproduce this? Gluster storage -> at least two vm's, server type, created from a template as thin provision (it's a clone copy). After creating them, run and migrate all the vm's from one host to the second host. I think it would be a locking issue.
Steve, can you please also check the sanlock log in the second host + look if there are any errors in the gluster logs (on both hosts)?
Thanks,
Dafna

I added another vlan on both hosts and designated it a migration network. Still the same issue: one of the two VM's failed to migrate.

I then deleted a failed posix domain on another gluster volume, one with some heal tasks pending and no hosts attached to it, and the VM's migrated successfully. Perhaps gluster isn't passing storage errors up properly for non-dependent volumes. Anyway, this is solved for now; just wanted this here for posterity.
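(To confirm the outcome on the destination after a migration, vdsm's CLI can list the VMs the host now owns - a sketch; vdsClient ships in the vdsm-cli package already installed on these hosts:)

[root@ovirt001 ~]# vdsClient -s 0 list table   # migrated VMs should appear here with status Up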
Thanks Steve! I am still perplexed as to how one of two identical vm's would fail migration... logic states that they should both fail.
I added another vlan on both hosts, and designated it a migration network. Still the same issue, failed to migrate one of the two VM's.
I then deleted a failed posix domain on another gluster volume with some heal tasks pending, with no hosts attached to it, and the VM's migrated successfully. Perhaps gluster isn't passing storage errors up properly for non-dependent volumes. Anyways this is solved for now, just wanted this here for posterity.
*Steve Dainard * IT Infrastructure Manager Miovision <http://miovision.com/> | /Rethink Traffic/
*Blog <http://miovision.com/blog> | **LinkedIn <https://www.linkedin.com/company/miovision-technologies> | Twitter <https://twitter.com/miovision> | Facebook <https://www.facebook.com/miovision>* ------------------------------------------------------------------------ Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON, Canada | N2C 1L3 This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.
On Tue, Feb 18, 2014 at 5:00 PM, Steve Dainard <sdainard@miovision.com <mailto:sdainard@miovision.com>> wrote:
sanlock.log on the second host (ovirt002) doesn't have any entries anywhere near that time of failure.
I see some heal-failed errors in gluster, but seeing as the storage is exposed via NFS I'm surprised to think this might be the issue. I'm working on fixing those files now, I'll update if I make any progress.
*Steve Dainard * IT Infrastructure Manager Miovision <http://miovision.com/> | /Rethink Traffic/
*Blog <http://miovision.com/blog> | **LinkedIn <https://www.linkedin.com/company/miovision-technologies> | Twitter <https://twitter.com/miovision> | Facebook <https://www.facebook.com/miovision>* ------------------------------------------------------------------------ Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON, Canada | N2C 1L3 This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.
On Mon, Feb 17, 2014 at 5:32 PM, Dafna Ron <dron@redhat.com <mailto:dron@redhat.com>> wrote:
really interesting case :) maybe gluster related? Elad, can you please try to reproduce this? gluster storage -> at least two vm's server type created from template as thin provision (it's clone copy). after create run and migrate all vm's from one host to the second host. I think it would be a locking issue.
Steve, can you please also check the sanlock log in the second host + look if there are any errors in the gluster logs (on both hosts)?
Thanks,
Dafna
On 02/17/2014 06:52 PM, Steve Dainard wrote:
VM's are identical, same template, same cpu/mem/nic. Server type, thin provisioned on NFS (backend is glusterfs 3.4).
Does monitor = spice console? I don't believe either of them had a spice connection.
I don't see anything in the ovirt001 sanlock.log:
2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 done 0 2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 done 0 2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 done 0 2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 done 0 2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 done 0 2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 done 0 2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 done 0 2014-02-14 11:17:06-0500 255307 [5105]: cmd_register ci 4 fd 14 pid 31132 2014-02-14 11:17:06-0500 255307 [5105]: cmd_restrict ci 4 fd 14 pid 31132 flags 1 2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 done 0 2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 done 0 2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 ci_in 5 fd 15 count 0 2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 result 0 pid_dead 0 2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 ci_in 6 fd 16 count 0 2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 result 0 pid_dead 0 2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 done 0 2014-02-14 11:17:39-0500 255340 [5105]: cmd_register ci 5 fd 15 pid 31319 2014-02-14 11:17:39-0500 255340 [5105]: cmd_restrict ci 5 fd 15 pid 31319 flags 1 2014-02-14 11:17:39-0500 255340 [5105]: client_pid_dead 5,15,31319 cmd_active 0 suspend 0 2014-02-14 11:17:46-0500 255346 [5111]: 
cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 done 0 2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0 2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 done 0 2014-02-14 11:18:06-0500 255366 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
ovirt002 sanlock.log has on entries during that time frame.
*Steve Dainard * IT Infrastructure Manager Miovision <http://miovision.com/> | /Rethink Traffic/
*Blog <http://miovision.com/blog> | **LinkedIn <https://www.linkedin.com/company/miovision-technologies> | Twitter <https://twitter.com/miovision> | Facebook <https://www.facebook.com/miovision>* ------------------------------------------------------------------------ Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON, Canada | N2C 1L3 This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.
On Mon, Feb 17, 2014 at 12:59 PM, Dafna Ron <dron@redhat.com <mailto:dron@redhat.com> <mailto:dron@redhat.com <mailto:dron@redhat.com>>> wrote:
mmm... that is very interesting... both vm's are identical? are they server or desktops type? created as thin copy or clone? what storage type are you using? did you happen to have an open monitor on the vm that failed migration? I wonder if it can be sanlock lock on the source template but I can only see this bug happening if the vm's are linked to the template can you look at the sanlock log and see if there are any warning or errors?
All logs are in debug so I don't think we can get anything more from it but I am adding Meital and Omer to this mail to help debug this - perhaps they can think of something that can cause that from the trace.
This case is really interesting... sorry, probably not what you want to hear... thanks for helping with this :)
Dafna
On 02/17/2014 05:08 PM, Steve Dainard wrote:
Failed live migration is wider spread than these two VM's, but they are a good example because they were both built from the same template and have no modifications after they were created. They were also migrated one after the other, with one successfully migrating and the other not.
Are there any increased logging levels that might help determine what the issue is?
Thanks,
*Steve Dainard * IT Infrastructure Manager Miovision <http://miovision.com/> | /Rethink Traffic/
*Blog <http://miovision.com/blog> | **LinkedIn
<https://www.linkedin.com/company/miovision-technologies> | Twitter <https://twitter.com/miovision> | Facebook <https://www.facebook.com/miovision>*
------------------------------------------------------------------------ Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON, Canada | N2C 1L3 This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.
On Mon, Feb 17, 2014 at 11:47 AM, Dafna Ron <dron@redhat.com <mailto:dron@redhat.com> <mailto:dron@redhat.com <mailto:dron@redhat.com>> <mailto:dron@redhat.com <mailto:dron@redhat.com> <mailto:dron@redhat.com <mailto:dron@redhat.com>>>> wrote:
did you install these vm's from a cd? run it as run-once with a special monitor? try to think if there is anything different in the configuration of these vm's from the other vm's that succeed to migrate?
On 02/17/2014 04:36 PM, Steve Dainard wrote:
Hi Dafna,
No snapshots of either of those VM's have been taken, and there are no updates for any of those packages on EL 6.5.
*Steve Dainard * IT Infrastructure Manager Miovision <http://miovision.com/> | /Rethink Traffic/
*Blog <http://miovision.com/blog> | **LinkedIn
<https://www.linkedin.com/company/miovision-technologies> | Twitter <https://twitter.com/miovision> | Facebook <https://www.facebook.com/miovision>* ------------------------------------------------------------------------ Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON, Canada | N2C 1L3 This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.
On Sun, Feb 16, 2014 at 7:05 AM, Dafna Ron <dron@redhat.com <mailto:dron@redhat.com> <mailto:dron@redhat.com <mailto:dron@redhat.com>> <mailto:dron@redhat.com <mailto:dron@redhat.com> <mailto:dron@redhat.com <mailto:dron@redhat.com>>> <mailto:dron@redhat.com <mailto:dron@redhat.com> <mailto:dron@redhat.com <mailto:dron@redhat.com>> <mailto:dron@redhat.com <mailto:dron@redhat.com> <mailto:dron@redhat.com <mailto:dron@redhat.com>>>>> wrote:
does the vm that fails migration have a live snapshot? if so how many snapshots does the vm have. I think that there are newer packages of vdsm, libvirt and qemu - can you try to update
On 02/16/2014 12:33 AM, Steve Dainard wrote:
Versions are the same:
[root@ovirt001 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort gpxe-roms-qemu-0.9.7-6.10.el6.noarch libvirt-0.10.2-29.el6_5.3.x86_64 libvirt-client-0.10.2-29.el6_5.3.x86_64 libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64 libvirt-python-0.10.2-29.el6_5.3.x86_64 qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64 qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64 qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64 vdsm-4.13.3-3.el6.x86_64 vdsm-cli-4.13.3-3.el6.noarch vdsm-gluster-4.13.3-3.el6.noarch vdsm-python-4.13.3-3.el6.x86_64 vdsm-xmlrpc-4.13.3-3.el6.noarch
[root@ovirt002 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort gpxe-roms-qemu-0.9.7-6.10.el6.noarch libvirt-0.10.2-29.el6_5.3.x86_64 libvirt-client-0.10.2-29.el6_5.3.x86_64 libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64 libvirt-python-0.10.2-29.el6_5.3.x86_64 qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64 qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64 qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64 vdsm-4.13.3-3.el6.x86_64 vdsm-cli-4.13.3-3.el6.noarch vdsm-gluster-4.13.3-3.el6.noarch vdsm-python-4.13.3-3.el6.x86_64 vdsm-xmlrpc-4.13.3-3.el6.noarch
Logs attached, thanks.
*Steve Dainard * IT Infrastructure Manager Miovision <http://miovision.com/> | /Rethink Traffic/
*Blog <http://miovision.com/blog> | **LinkedIn
<https://www.linkedin.com/company/miovision-technologies> | Twitter <https://twitter.com/miovision> | Facebook <https://www.facebook.com/miovision>* ------------------------------------------------------------------------ Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON, Canada | N2C 1L3 This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.
On Sat, Feb 15, 2014 at 6:24 AM, Dafna Ron <dron@redhat.com> wrote:
the migration fails in libvirt:
Thread-153709::ERROR::2014-02-14 11:17:40,420::vm::337::vm.Vm::(run)
vmId=`08434c90-ffa3-4b63-aa8e-5613f7b0e0cd`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 323, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/vm.py", line 841, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer
Thread-54041::DEBUG::2014-02-14 11:17:41,752::task::579::TaskManager.Task::(_updateState)
Task=`094c412a-43dc-4c29-a601-d759486469a8`::moving from state init -> state preparing
Thread-54041::INFO::2014-02-14 11:17:41,753::logUtils::44::dispatcher::(wrapper)
Run and protect: getVolumeSize(sdUUID='a52938f7-2cf4-4771-acb2-0c78d14999e5', spUUID='fcb89071-6cdb-4972-94d1-c9324cebf814', imgUUID='97c9108f-a506-415f-ad2c-370d707cb130', volUUID='61f82f7f-18e4-4ea8-9db3-71ddd9d4e836', options=None)
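"Unable to read from monitor: Connection reset by peer" generally means the qemu process on one end died or dropped its monitor socket mid-migration, so the underlying cause is usually in the per-VM qemu log rather than in this vdsm traceback. A minimal sketch of where to look, assuming the standard EL6 log locations:

    # on both source and destination, around the 11:17:40 timestamp
    less /var/log/libvirt/qemu/puppet-agent2.log   # qemu's stderr; often holds the fatal error
    egrep -i 'error|fail' /var/log/vdsm/vdsm.log   # vdsm's view of the migration
    less /var/log/libvirt/libvirtd.log             # path depends on log_outputs in libvirtd.conf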
Do you have the same libvirt/vdsm/qemu on both your hosts? Please attach the libvirt and vm logs from both hosts.
Thanks, Dafna
On 02/14/2014 04:50 PM, Steve Dainard wrote:
Quick overview: Ovirt 3.3.2 running on CentOS 6.5 Two hosts: ovirt001, ovirt002 Migrating two VM's: puppet-agent1, puppet-agent2 from ovirt002 to ovirt001.
The first VM puppet-agent1 migrates successfully. The second VM puppet-agent2 fails with "Migration failed due to Error: Fatal error during migration (VM: puppet-agent2, Source: ovirt002, Destination: ovirt001)."
I've attached the logs if anyone can help me track down the issue.
Thanks,
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
--
Dafna Ron