Hello,
I have an oVirt 4.1.1 environment with:
- the engine, a CentOS 7.3 VM running on vSphere, with its NIC on, say, vlan1
- 2 hosts (CentOS 7.3) with their ovirtmgmt network on an active-backup bond on, say, vlan2
The network architecture puts hypervisors and management servers in different VLANs.
Today the engine showed the 4 events below, apparently caused by a network routing maintenance activity (it should have been transparent, the network guys told us..., but that is another story ;-)
There were no alert messages inside the VMs.
4) May 23, 2017 1:43:58 PM Host ov300 power management was verified successfully.
3) May 23, 2017 1:43:58 PM Status of host ov300 was set to Up.
2) May 23, 2017 1:43:55 PM Executing power management status on Host ov300 using Proxy Host ov301 and Fence Agent ipmilan:10.10.193.103.
1) May 23, 2017 1:43:37 PM Host ov300 is not responding. It will stay in Connecting state for a grace period of 61 seconds and after that an attempt to fence the host will be issued.
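For reference, the timing of these events relative to the 61-second grace period can be worked out from the timestamps. This is only a minimal Python sketch: the event labels and variable names are my own shorthand, not anything from oVirt, and the timestamp parsing assumes an English locale.

```python
from datetime import datetime

# Timestamps copied from the engine events above, oldest first.
# The labels are shorthand for illustration, not oVirt identifiers.
events = [
    ("not responding", "May 23, 2017 1:43:37 PM"),
    ("power management status check", "May 23, 2017 1:43:55 PM"),
    ("host set to Up", "May 23, 2017 1:43:58 PM"),
]

# Grace period quoted in event 1).
GRACE_PERIOD_SECONDS = 61


def parse(ts):
    """Parse a timestamp in the format shown in the engine event list."""
    return datetime.strptime(ts, "%b %d, %Y %I:%M:%S %p")


start = parse(events[0][1])

# Seconds elapsed since the "not responding" event, per event.
offsets = {name: int((parse(ts) - start).total_seconds()) for name, ts in events}

for name, off in offsets.items():
    print(f"+{off:2d}s  {name}")

# Did the host come back before the grace period expired?
recovered_within_grace = offsets["host set to Up"] < GRACE_PERIOD_SECONDS
print("recovered within grace period:", recovered_within_grace)
```

So the power management status check ran 18 seconds after the "not responding" event and the host was back Up after 21 seconds, well inside the announced 61 seconds.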
Can anyone explain exactly what the different lines mean?
Was 1) detected because the engine, purely from a network point of view, was not able to ping/reach the hostname of host ov300, or is "not responding" the result of some more specific check?
Is the "61 seconds" delay tunable?
Is 2) an additional check to verify the status of ov300?
If the test in 2) had failed, would the fencing have been immediate, or would the delay described in 1) still have applied?
Are messages 3) and 4) independent of the engine being able to reach ov300, or would the 61-second delay have applied anyway?
I hope I have explained my doubts about events that could lead to the fencing of an active node with its running VMs... when the "only" problem is a temporary loss of connectivity between the engine and one of the nodes...
Thanks in advance,
Gianluca