[Users] migrations failing with latest master + vdsm

VM migrations are failing with latest master engine and vdsm.
Logs are attached from engine and both hosts.
Hosts are EL 6.4 with latest master VDSM.
Additionally, the hosts seem to lose connection with their storage domains (new behavior): they go offline and then recover, even though nothing is physically wrong.
- DHC

On Fri, Oct 11, 2013 at 03:30:35PM -0500, Dead Horse wrote:
VM migrations are failing with latest master engine and vdsm
logs attached from engine and both hosts
Hosts are EL 6.4 with latest master VDSM
Thanks for your report!
Thread-195::ERROR::2013-10-11 15:22:39,508::vm::304::vm.Vm::(run) vmId=`4bad94ad-c338-4ec5-8e5b-9910d58c1854`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 291, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/vm.py", line 369, in _startUnderlyingMigration
    self._abortOnError else 0),
AttributeError: 'module' object has no attribute 'VIR_MIGRATE_ABORT_ON_ERROR'
Peter, Michal, I think that VIR_MIGRATE_ABORT_ON_ERROR is expected only in el6.5, which is still not public:
Bug 972675 - Fail migration when VM get paused due to EIO
This must be reverted in vdsm or hacked in Engine (do not set abortOnError=True if libvirt < libvirt-0.10.2-20.el6)
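The AttributeError in the traceback comes from referencing a constant that the installed libvirt Python binding does not define. A minimal sketch of a defensive alternative follows, assuming the flag is simply skipped when missing. This is not the actual vdsm fix; `migration_flags` is a hypothetical helper, and the `SimpleNamespace` objects and their numeric values are stand-ins for the real `libvirt` module so the example runs without libvirt installed.

```python
import types


def migration_flags(libvirt_mod, abort_on_error):
    """Build migration flags, skipping ABORT_ON_ERROR if the binding lacks it.

    libvirt_mod is whatever object exposes the VIR_MIGRATE_* constants
    (the real libvirt module in production, a stand-in here).
    """
    flags = libvirt_mod.VIR_MIGRATE_LIVE | libvirt_mod.VIR_MIGRATE_PEER2PEER
    # getattr with a default avoids the AttributeError seen in the traceback.
    abort_flag = getattr(libvirt_mod, "VIR_MIGRATE_ABORT_ON_ERROR", None)
    if abort_on_error and abort_flag is not None:
        flags |= abort_flag
    return flags


# Stand-in for an older binding that does not define the constant.
old_libvirt = types.SimpleNamespace(VIR_MIGRATE_LIVE=1, VIR_MIGRATE_PEER2PEER=2)
# Stand-in for a newer binding that does define it (value is illustrative).
new_libvirt = types.SimpleNamespace(
    VIR_MIGRATE_LIVE=1, VIR_MIGRATE_PEER2PEER=2, VIR_MIGRATE_ABORT_ON_ERROR=4096
)

print(migration_flags(old_libvirt, abort_on_error=True))
print(migration_flags(new_libvirt, abort_on_error=True))
```

Silently dropping the flag changes behavior (a paused VM would no longer abort the migration), so a real fix might prefer to log a warning or to have Engine avoid requesting the option in the first place.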

On Oct 12, 2013, at 00:30 , Dan Kenigsberg <danken@redhat.com> wrote:
Peter, Michal, I think that VIR_MIGRATE_ABORT_ON_ERROR is expected only in el6.5, which is still not public Bug 972675 - Fail migration when VM get paused due to EIO
This must be reverted in vdsm or hacked in Engine (do not set abortOnError=True if libvirt < libvirt-0.10.2-20.el6)
As of http://gerrit.ovirt.org/#/c/19312/ the flag is sent for 3.3 clusters, which is correct.
I thought you're not supposed to have an EL 6.4 host in a 3.3 cluster. 6.5 should work.
Thanks,
michal

I have been running EL 6.4 hosts in 3.3 mode for quite some time; I only noticed this breakage in the latest master VDSM 4.13.x.
Tagged vdsm versions ovirt-3.3.0 <http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/heads/ovirt-3.3.0> and ovirt-3.3 <http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/heads/ovirt-3.3> do work (if not using master, which one of these should be used with 3.3, btw?).
The running version of libvirt on the hosts is: libvirt-0.10.2-18.0.1.el6_4.14.x86_64
- DHC
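The threshold mentioned earlier in the thread (libvirt-0.10.2-20.el6) can be checked against the installed package string reported here. The following is a rough sketch only: it uses a simplified segment-by-segment comparison, not rpm's real version-comparison algorithm, and it ignores epochs. The function names are hypothetical.

```python
import re


def _segments(text):
    # Split into runs of digits or letters; separators such as '.' and '_' are dropped.
    return re.findall(r"\d+|[a-zA-Z]+", text)


def _cmp_part(a, b):
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)
            if x != y:
                return -1 if x < y else 1
        elif x.isdigit() != y.isdigit():
            # Simplifying assumption: a numeric segment sorts after an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    # All shared segments equal: the string with more segments sorts later.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def evr_cmp(a, b):
    """Compare two 'version-release' strings; returns -1, 0 or 1."""
    va, _, ra = a.partition("-")
    vb, _, rb = b.partition("-")
    return _cmp_part(va, vb) or _cmp_part(ra, rb)


def supports_abort_on_error(installed, minimum="0.10.2-20.el6"):
    # The minimum is the version-release quoted in this thread.
    return evr_cmp(installed, minimum) >= 0


print(supports_abort_on_error("0.10.2-18.0.1.el6_4.14"))
```

For production use, the rpm Python bindings or the package manager itself would be a safer source of truth than a hand-rolled comparison like this one.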

On Oct 15, 2013, at 16:36 , Dead Horse <deadhorseconsulting@gmail.com> wrote:
I have been running EL 6.4 hosts in 3.3 mode for quite some time, I only noticed this breakage in the latest master VDSM 4.13.x. Tagged vdsm versions: ovirt-3.3.0 and ovirt-3.3 do work (if not using master which one of these should be used with 3.3 btw?)
it is correct
I suppose as a workaround you can use engine-config and set AbortMigrationOnError to false for now…
the default should change to false till EL 6.5 comes out, I guess…
Thanks,
michal
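The engine-config workaround suggested above could look roughly like the command built below. This is a sketch under assumptions: it only constructs and prints the command line rather than running it, `engine_config_set` is a hypothetical helper, and whether the AbortMigrationOnError key is actually defined for a given cluster level on a particular engine build is not something this example can confirm.

```python
import shlex


def engine_config_set(key, value, cluster_level=None):
    """Return the argv for an `engine-config -s` call without executing it."""
    argv = ["engine-config", "-s", "%s=%s" % (key, value)]
    if cluster_level is not None:
        # --cver selects the cluster compatibility version the value applies to.
        argv.append("--cver=%s" % cluster_level)
    return argv


cmd = engine_config_set("AbortMigrationOnError", "false", cluster_level="3.3")
# Print a shell-safe rendering of the command for review before running it by hand.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the argument list separately makes it easy to inspect the exact command before running it on the engine host.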

On Tue, Oct 15, 2013 at 04:42:11PM +0200, Michal Skrivanek wrote:
it is correct I suppose as a workaround you can use engine-config and set AbortMigrationOnError to false for now… the default should change to false till EL 6.5 comes out, I guess…
Global configurables are evil, but yes. Please set it to False by default, until we can require a libvirt version that has it.
Dan.

I just took a look through the vdc_options keys and only found entries for "AbortMigrationOnError" for cluster levels 3.0, 3.1, and 3.2, but not 3.3.
- DHC

On 15 Oct 2013, at 22:22, Dead Horse <deadhorseconsulting@gmail.com> wrote:
I just took a look through the vdc_options keys and I only found key entries of "AbortMigrationOnError" for cluster levels 3.0, 3.1, and 3.2 but not 3.3.
Yes, set a new one for 3.3 to false
participants (3)
- Dan Kenigsberg
- Dead Horse
- Michal Skrivanek