
Hi,

please consider this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1111520

It is about the default fencing behaviour of oVirt in the special case of a single-node cluster, when a logical network marked as "required" is not available on the single host in the cluster. The default behaviour is to shut down all VMs. You can find reasoning in the BZ why this is, well, "bad", to say the least.

Thanks in advance.

-- Mit freundlichen Grüßen / Regards

Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

On Fri, Jun 20, 2014 at 08:41:21AM +0000, Sven Kieske wrote:
Hi,
Please consider this bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1111520
It is about the default fencing behaviour of oVirt in the special case of a single-node cluster, when a logical network marked as "required" is not available on the single host in the cluster.
The default behaviour is to shut down all VMs.
You can find reasoning in the BZ why this is, well, "bad", to say the least.
I do not quite understand the problem you describe. Does the problem go away if you set your network to "non-required"? If your VM's app does not strictly require uninterrupted networking, just set the network to non-required.

Flipping the default of requiredness can be considered only for 4.0 ("required" has been our default from the start; other users' scripts may depend on it).

Could you attach Vdsm and Engine logs to the bug? How was the host fenced?

Dan.
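Dan's mitigation hinges on the semantics of the "required" flag: only required logical networks affect whether the Engine considers a host operational. A minimal toy model of that check (illustrative names only; this is a sketch of the idea, not actual Engine code):

```python
def host_operational(networks):
    """Return True if the host counts as operational.

    `networks` maps network name -> (required, up).  Only networks
    flagged as required can take the host out of the operational
    state; non-required networks are ignored by the check.
    """
    return all(up for required, up in networks.values() if required)

# Default behaviour: every network is "required", so one down
# network makes the host non-operational (and triggers fencing):
nets = {"ovirtmgmt": (True, True), "vmdata": (True, False)}
print(host_operational(nets))   # False

# Marking the same network non-required mitigates the issue:
nets["vmdata"] = (False, False)
print(host_operational(nets))   # True
```

Flipping the flag per network, as suggested, changes behaviour only for that network; the cluster-wide default for newly attached networks remains "required".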

On 20.06.2014 11:34, Dan Kenigsberg wrote:
I do not quite understand the problem you describe.
See below, I hope this clears some things up.
Does the problem go away if you set your network to "non-required"? If your VM's app does not strictly require uninterrupted networking, just set the network to non-required. Flipping the default of requiredness can be considered only for 4.0 ("required" has been our default from the start; other users' scripts may depend on it).
I know you can't simply switch the default behaviour, and yes, at least according to the documentation, setting the network to "non-required" should mitigate the issue. However, the default is to mark every network as "required", without indicating this to the user in the first place, or the consequences of this unadvertised behaviour.
Could you attach Vdsm and Engine logs to the bug? How was the host fenced?
The host was not fenced; the VMs were.

Here is a link to the documentation which should explain what I mean:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtua...

This is about a single host in a cluster; oVirt can't even fence a single host in a cluster yet, see my other bug report for this: https://bugzilla.redhat.com/show_bug.cgi?id=1054778

I could provide logs if they are really necessary, but I doubt they are. This is documented behaviour, but it is poorly designed, as described in the BZ.

The fencing mechanism is really buggy / not helpful; see also this (not really related) bug: https://bugzilla.redhat.com/show_bug.cgi?id=1090799

On Fri, Jun 20, 2014 at 09:53:24AM +0000, Sven Kieske wrote:
On 20.06.2014 11:34, Dan Kenigsberg wrote:
I do not quite understand the problem you describe.
See below, I hope this clears some things up.
Does the problem go away if you set your network to "non-required"? If your VM's app does not strictly require uninterrupted networking, just set the network to non-required. Flipping the default of requiredness can be considered only for 4.0 ("required" has been our default from the start; other users' scripts may depend on it).
I know you can't simply switch the default behaviour, and yes, at least according to the documentation, setting the network to "non-required" should mitigate the issue.
However, the default is to mark every network as "required", without indicating this to the user in the first place, or the consequences of this unadvertised behaviour.
Could you attach Vdsm and Engine logs to the bug? How was the host fenced?
The host was not fenced; the VMs were.
here is a link to the documentation which should explain what I mean:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtua...
Are you referring to the paragraph: "When a required network becomes non-operational, the virtual machines running on the network are fenced and migrated to another host. This is beneficial if you have machines running mission critical workloads."?
This is about a single host in a cluster; oVirt can't even fence a single host in a cluster yet, see my other bug report for this: https://bugzilla.redhat.com/show_bug.cgi?id=1054778
I could provide logs if they are really necessary, but I doubt they are. This is documented behaviour, but it is poorly designed, as described in the BZ.
Apparently, I am not familiar enough with Engine's fencing logic; logs may help me understand the issue, so for me they are necessary in this case. In particular, I'd like to see with my own eyes whether the VMs were explicitly destroyed by Engine. Migrating VMs to an operational destination makes a lot of sense. Destroying a running VM in an attempt to recover from a host networking issue is extraordinary (and as such, requires extraordinary evidence).
The fencing mechanism is really buggy / not helpful, see
I vote for "buggy/imperfect". I am aware of mission-critical VMs that are kept highly-available due to it.
also this (not really related) bug: https://bugzilla.redhat.com/show_bug.cgi?id=1090799
Regards, Dan.

On 20.06.2014 14:19, Dan Kenigsberg wrote:
The host was not fenced; the VMs were.
here is a link to the documentation which should explain what I mean:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtua...
Are you referring to the paragraph: "When a required network becomes non-operational, the virtual machines running on the network are fenced and migrated to another host. This is beneficial if you have machines running mission critical workloads."?
yes
This is about a single host in a cluster; oVirt can't even fence a single host in a cluster yet, see my other bug report for this: https://bugzilla.redhat.com/show_bug.cgi?id=1054778
I could provide logs if they are really necessary, but I doubt they are. This is documented behaviour, but it is poorly designed, as described in the BZ.
Apparently, I am not familiar enough with Engine's fencing logic; logs may help me understand the issue, so for me they are necessary in this case. In particular, I'd like to see with my own eyes whether the VMs were explicitly destroyed by Engine. Migrating VMs to an operational destination makes a lot of sense. Destroying a running VM in an attempt to recover from a host networking issue is extraordinary (and as such, requires extraordinary evidence).
I might be able to attach some logs later.
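The documentation paragraph quoted above describes a fence-and-migrate policy; the crux of the bug is that a single-host cluster has no migration target. A simplified sketch of that policy under those assumptions (illustrative only, not the Engine's actual logic):

```python
def on_required_network_down(vms, other_operational_hosts):
    """React to a required network going non-operational on a host.

    VMs are migrated when another operational host exists in the
    cluster; in a single-host cluster there is no target, so the
    VMs end up shut down -- the behaviour bug 1111520 objects to.
    """
    if other_operational_hosts:
        target = other_operational_hosts[0]
        return {vm: ("migrate", target) for vm in vms}
    return {vm: ("shut down", None) for vm in vms}

# Multi-host cluster: VMs survive by migrating elsewhere.
print(on_required_network_down(["web", "db"], ["host2"]))

# Single-host cluster: nowhere to go, so every VM is shut down.
print(on_required_network_down(["web", "db"], []))
```

The toy model makes the design question concrete: when the host list is empty, "shut down" is arguably worse for the VMs than simply leaving them running on the degraded host.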

On 06/20/2014 02:52 PM, Sven Kieske wrote:
On 20.06.2014 14:19, Dan Kenigsberg wrote:
The host was not fenced; the VMs were.
here is a link to the documentation which should explain what I mean:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtua...
Are you referring to the paragraph: "When a required network becomes non-operational, the virtual machines running on the network are fenced and migrated to another host. This is beneficial if you have machines running mission critical workloads."?
Yes.
Isn't that section referring to HA VMs?
this is about a single host in a cluster - ovirt can't even fence single hosts in a single cluster yet, see my other bug report for this: https://bugzilla.redhat.com/show_bug.cgi?id=1054778
I could provide logs if they are really necessary, but I doubt they are. This is documented behaviour, but it is poorly designed, as described in the BZ.

Apparently, I am not familiar enough with Engine's fencing logic; logs may help me understand the issue, so for me they are necessary in this case. In particular, I'd like to see with my own eyes whether the VMs were explicitly destroyed by Engine. Migrating VMs to an operational destination makes a lot of sense. Destroying a running VM in an attempt to recover from a host networking issue is extraordinary (and as such, requires extraordinary evidence).

I might be able to attach some logs later.
-- Regards,

Vinzenz Feenstra | Senior Software Engineer
RedHat Engineering Virtualization R & D
Phone: +420 532 294 625
IRC: vfeenstr or evilissimo

Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

On 20.06.2014 15:32, Vinzenz Feenstra wrote:
Isn't that section referring to HA VMs?
No.
participants (3):
- Dan Kenigsberg
- Sven Kieske
- Vinzenz Feenstra