[ovirt-users] Strange fencing behaviour 3.5.3
Martin Perina
mperina at redhat.com
Tue Sep 15 07:51:56 UTC 2015
Hi,
sorry for the late response, I somehow missed your email :-(
I cannot completely understand your exact issue from the description,
but the situation when the engine loses connection to all hypervisors
is always bad. Fortunately we made a few improvements in 3.5 which
should help in those scenarios. Please take a look at the "Fencing policy"
tab in the "Edit cluster" dialog:
1. Skip fencing if host has live lease on storage
- when a host is connected to storage, it has to renew its
storage lease at least every 60 secs
- so if this option is enabled and the engine tries to fence the
host using a fence proxy (another host in the cluster/DC which
has a good connection), the fence proxy checks whether the
non-responsive host renewed its storage lease in the last 90 secs;
if the lease was renewed, fencing is aborted (see the sketch
after this list)
2. Skip fencing on cluster connectivity issues
- if this option is enabled, the engine tests prior to fencing
how many of the hosts in the cluster have connectivity
issues; if the number of hosts with connectivity issues
is higher than the specified percentage, fencing is aborted
- of course this option is useless in clusters with fewer than
3 hosts
3. Enable fencing
- by disabling this option you can completely disable fencing
for hosts in the cluster
- this is useful in situations when you expect connectivity
issues between the engine and hosts (for example during switch
replacement): you can disable fencing, replace the switch
and, once the connection is restored, enable fencing again
- however, if you disable fencing completely, your HA VMs won't
be restarted on different hosts, so please use this option
with caution
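
To make it easier to see how these options combine, below is a minimal
sketch (Python, not actual engine code) of the decision a fence proxy
could make before power-cycling a non-responsive host. The function and
field names are made up for illustration; only the 60/90 secs lease
timings and the percentage threshold come from the description above.

import time

LEASE_GRACE_SECS = 90        # lease renewed within this window => host is alive
CONNECTIVITY_THRESHOLD = 50  # % of unreachable hosts above which fencing is skipped

def should_fence(host, cluster_hosts, policy, now=None):
    """Decide whether a non-responsive host should be fenced (sketch only)."""
    now = now if now is not None else time.time()

    # option 3: fencing disabled for the whole cluster -> never fence
    if not policy["fencing_enabled"]:
        return False

    # option 1: skip fencing if the host still holds a live storage lease;
    # hosts renew the lease at least every 60 secs, so a renewal within
    # the last 90 secs means the host is alive, just unreachable
    if policy["skip_if_storage_lease_live"]:
        if now - host["last_lease_renewal"] <= LEASE_GRACE_SECS:
            return False

    # option 2: skip fencing on cluster-wide connectivity issues; if too
    # many hosts are unreachable, the problem is probably the network,
    # not this particular host (pointless with fewer than 3 hosts)
    if policy["skip_if_connectivity_broken"]:
        unreachable = sum(1 for h in cluster_hosts if not h["reachable"])
        if 100.0 * unreachable / len(cluster_hosts) > CONNECTIVITY_THRESHOLD:
            return False

    return True

Again, this is only meant to show how the three checks combine; the
real checks are performed by the engine through a fence proxy, as
described above.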
Please let me know if you have any other issues/questions with fencing.
Thanks
Martin Perina
----- Original Message -----
> From: "Martin Breault" <martyb at creenet.com>
> To: users at ovirt.org
> Sent: Friday, September 11, 2015 9:14:23 PM
> Subject: [ovirt-users] Strange fencing behaviour 3.5.3
>
> Hello,
>
> I manage 2 oVirt clusters that are not associated in any way; they each
> have their own management engine running ovirt-engine-3.5.3.1-1. The
> servers are Dell 6xx series, the power management is configured using
> idrac5 settings, and each cluster is a pair of hypervisors.
>
> The engines are both in a datacenter that had an electrical issue; each
> cluster is at a different, unrelated location. The problem I had was
> caused by a downed switch: the individual engines continued to
> function but no longer had connectivity to their respective
> clusters. Once the switch was replaced (about 30 minutes of downtime)
> and connectivity was restored, both engines chose to fence one of the
> two "unresponsive hypervisors" by sending an iDrac command to power down.
>
> The downed hypervisor on Cluster1, for some reason, got an iDrac command
> to power up again 8 minutes later. When I logged into the engine, the
> guests that were running on the powered-down host were in the "off" state.
> I simply powered them back on.
>
> The downed hypervisor on Cluster2 stayed off and was unresponsive
> according to the engine; however, the VMs that were running on it were in
> an unknown state. I had to power on the host and confirm the "host has
> been rebooted" dialog for the cluster to free these guests so they could
> be booted again.
>
> My question is: is it normal for the engine to fence one or more hosts
> when it loses connectivity to all the hypervisors in the cluster? Is
> a minimum of 3 hosts per cluster required to avoid falling into this
> mode? I'd like to know what I can troubleshoot, or how I can avoid an
> issue like this, should the engine be disconnected from the hypervisors
> temporarily and then regain connectivity only to kill guests that were
> running fine.
>
> Thanks in advance,
>
> Marty
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>