oVirt 4.0.3 (Hosted Engine) - High Availability VM does not restart after auto-fencing of host.

Hello oVirt gurus!

I have oVirt Hosted Engine 4.0.3-1.el7.centos on two CentOS 7.2 hosts (HP ProLiant DL360 G5) connected to shared FC SAN storage.

1. I configured Power Management for the hosts (successfully added a fencing agent for iLO2 on my hosts).

2. I created a new VM (KOM-AD01-PBX02) and installed the guest OS (Ubuntu Server 16.04 LTS) and the oVirt Guest Agent, as described here: https://blog.it-kb.ru/2016/09/14/install-ovirt-4-0-part-2-about-data-center-iso-domain-logical-network-vlan-vm-settings-console-guest-agent-live-migration/. In the VM's "High Availability" settings I turned on the "Highly Available" option and changed "Priority" to "High".

3. Now I am testing hard fencing: I power off my first host (KOM-AD01-VM31) from its iLO (KOM-AD01-ILO31). Fencing works and the server is automatically turned back on, but my HA VM is not started on the second host (KOM-AD01-VM32).

These are the events I see in the oVirt web console:

Sep 15, 2016 12:08:13 PM  Host KOM-AD01-VM31 power management was verified successfully.
Sep 15, 2016 12:08:13 PM  Status of host KOM-AD01-VM31 was set to Up.
Sep 15, 2016 12:08:05 PM  Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:48 PM  Host KOM-AD01-VM31 is rebooting.
Sep 15, 2016 12:05:48 PM  Host KOM-AD01-VM31 was started by SYSTEM.
Sep 15, 2016 12:05:48 PM  Power management start of Host KOM-AD01-VM31 succeeded.
Sep 15, 2016 12:05:41 PM  Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:19 PM  Executing power management start on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:19 PM  Power management start of Host KOM-AD01-VM31 initiated.
Sep 15, 2016 12:05:19 PM  Auto fence for host KOM-AD01-VM31 was started.
Sep 15, 2016 12:05:11 PM  Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:04 PM  Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:04 PM  Host KOM-AD01-VM31 is non responsive.
Sep 15, 2016 12:02:32 PM  Host KOM-AD01-VM31 is not responding. It will stay in Connecting state for a grace period of 60 seconds and after that an attempt to fence the host will be issued.
Sep 15, 2016 12:02:32 PM  VDSM KOM-AD01-VM31 command failed: Heartbeat exeeded
Sep 15, 2016 12:02:04 PM  VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest

What am I doing wrong? Why does the HA VM not start on the second host?
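As a sanity check independent of the web UI, the HA flag on a VM can be read back over the engine's REST API. A minimal sketch; the engine FQDN and the credentials below are placeholders, not values from this thread:

    # Query the engine REST API for the VM and inspect its HA settings
    # (engine.holding.com and the credentials are hypothetical).
    curl -k -u 'admin@internal:password' \
         'https://engine.holding.com/ovirt-engine/api/vms?search=name%3DKOM-AD01-PBX02'

    # The <high_availability> element in the XML response should show
    # <enabled>true</enabled> and the configured priority.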

Hi,

could you please share the whole engine.log?

Thanks

Martin Perina

On Thu, Sep 15, 2016 at 12:01 PM, <aleksey.maksimov@it-kb.ru> wrote:
> What am I doing wrong? Why does the HA VM not start on the second host?

On Thu, Sep 15, 2016 at 12:47 PM, <aleksey.maksimov@it-kb.ru> wrote:
> Hi Martin. I have a stupid question: is a watchdog device mandatory for a virtual machine to be started automatically as part of the host fencing process?
AFAIK it's not, but I'm not an expert; adding Arik. You need a correct power management setup for the hosts, and the VM has to be marked as highly available, for sure.

engine.log for this period.

15.09.2016, 14:01, "Martin Perina" <mperina@redhat.com>:
> You need a correct power management setup for the hosts, and the VM has to be marked as highly available, for sure.
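For anyone following along, the time window in question can be pulled out of the engine log on the engine VM. A sketch using the standard log location:

    # Extract the events for the affected VM and host from the engine log
    # (the grep pattern just matches the names used in this thread).
    grep -E 'KOM-AD01-PBX02|KOM-AD01-VM31' /var/log/ovirt-engine/engine.log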

Hi,

I found this in the log:

2016-09-15 12:02:04,661 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-6) [] VM '660bafca-e9c3-4191-99b4-295ff8553488'(KOM-AD01-PBX02) moved from 'Up' --> 'Down'
2016-09-15 12:02:04,788 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-6) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest

If I'm not mistaken, this means that the VM was properly shut down from within the guest, and in that case it is not restarted automatically. So I'm curious what actions you took to make host KOM-AD01-VM31 non-responsive.

If you want to test fencing properly, then I suggest you either block the connection between the host and the engine on the host side, or forcibly take down the ovirtmgmt network interface on the host, and watch fencing being applied.
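For example, something like the following on the host itself (a sketch: the engine address 10.0.0.10 is a placeholder, and this assumes nothing else manages the host firewall):

    # Option 1: drop all traffic from the engine so the host appears
    # non-responsive (replace 10.0.0.10 with your engine's IP).
    iptables -A INPUT -s 10.0.0.10 -j DROP

    # Option 2: take the management network interface down entirely.
    ip link set ovirtmgmt down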
Martin

On Thu, Sep 15, 2016 at 1:16 PM, <aleksey.maksimov@it-kb.ru> wrote:
> engine.log for this period.

Martin, I physically turned off the server through iLO2. See the screenshots. I did not touch the virtual machine (KOM-AD01-PBX02) at the time; the virtual machine was powered on at the moment the host shut down.

15.09.2016, 14:27, "Martin Perina" <mperina@redhat.com>:
> If I'm not mistaken, this means that the VM was properly shut down from within the guest, and in that case it is not restarted automatically. So I'm curious what actions you took to make host KOM-AD01-VM31 non-responsive.

Are there any more ideas?

15.09.2016, 14:40, "aleksey.maksimov@it-kb.ru" <aleksey.maksimov@it-kb.ru>:
> Martin, I physically turned off the server through iLO2. I did not touch the virtual machine (KOM-AD01-PBX02) at the time; the virtual machine was powered on at the moment the host shut down.

On 16 Sep 2016, at 08:29, aleksey.maksimov@it-kb.ru wrote:
> Martin, I physically turned off the server through iLO2. I did not touch the virtual machine (KOM-AD01-PBX02) at the time; the virtual machine was powered on at the moment the host shut down.
>
>> Message: VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest
Since it shut down cleanly, can you please check the guest's logs to see what triggered the shutdown? In such cases it is considered a user-requested shutdown, and such VMs are not restarted automatically. We are aware of a similar issue on specific hardware: https://bugzilla.redhat.com/show_bug.cgi?id=1341106
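For the guest-side check, once the VM is running again, something like this on the Ubuntu 16.04 guest should do (a sketch; paths are the stock Ubuntu ones):

    # Show which shutdowns/reboots were recorded and when.
    last -x shutdown reboot

    # Look for what initiated the shutdown, e.g. an ACPI power-button
    # event delivered to the guest when the host went down.
    grep -iE 'shutdown|power' /var/log/syslog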

On Fri, Sep 16, 2016 at 9:26 AM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
> Since it shut down cleanly, can you please check the guest's logs to see what triggered the shutdown? In such cases it is considered a user-requested shutdown, and such VMs are not restarted automatically.
That's exactly what I meant by my response. From the log it's obvious that the VM was shut down properly, so the engine will not restart it on a different host. Also, on most modern hosts, if you execute a power management "off" action, a signal is sent to the OS to perform a regular shutdown, so the VMs are also shut down properly.
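To see what the fence agent actually does against iLO, it can also be run by hand from the proxy host. A sketch with placeholder credentials, assuming the standard fence_ilo agent from the fence-agents package:

    # Ask iLO for the host's power state; replace the login/password
    # with the values configured under Power Management (hypothetical here).
    fence_ilo -a KOM-AD01-ILO31.holding.com -l admin -p secret -o status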
> We are aware of a similar issue on specific hardware: https://bugzilla.redhat.com/show_bug.cgi?id=1341106

> If you want to test fencing properly, then I suggest you either block the connection between the host and the engine on the host side, or forcibly take down the ovirtmgmt network interface on the host, and watch fencing being applied.
Try the above if you want to test fencing. Of course, you can always configure a firewall rule to drop all packets between the engine and the host, or unplug the host's network cable.

On Fri, Sep 16, 2016 at 12:50 PM, Martin Perina <mperina@redhat.com> wrote:
> That's exactly what I meant by my response. From the log it's obvious that the VM was shut down properly, so the engine will not restart it on a different host. Also, on most modern hosts, if you execute a power management "off" action, a signal is sent to the OS to perform a regular shutdown, so the VMs are also shut down properly.
I understand the reason, but is it really what the user expects? I mean, if I set HA mode on a VM, I'd expect the engine to take care of keeping it up, restarting it if needed, regardless of the shutdown reason. For instance, on hosted-engine the HA agent, if not in global maintenance mode, will restart the engine VM regardless of who shut it down or why it went off.

On Fri, Sep 16, 2016 at 1:54 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
> I understand the reason, but is it really what the user expects? I mean, if I set HA mode on a VM, I'd expect the engine to take care of keeping it up, restarting it if needed, regardless of the shutdown reason.
AFAIK that's correct: we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where the HA VM is running is non-responsive.

> For instance, on hosted-engine the HA agent, if not in global maintenance mode, will restart the engine VM regardless of who shut it down or why it went off.

Well, the HE VM is definitely not a standard HA VM :-)

--Apple-Mail=_888E7FA5-6914-4728-8646-862772EBB5CF Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8
On 16 Sep 2016, at 14:23, Martin Perina <mperina@redhat.com> wrote: =20 =20 =20 On Fri, Sep 16, 2016 at 1:54 PM, Simone Tiraboschi = <stirabos@redhat.com <mailto:stirabos@redhat.com>> wrote: =20 =20 On Fri, Sep 16, 2016 at 12:50 PM, Martin Perina <mperina@redhat.com = <mailto:mperina@redhat.com>> wrote: =20 =20 On Fri, Sep 16, 2016 at 9:26 AM, Michal Skrivanek = <michal.skrivanek@redhat.com <mailto:michal.skrivanek@redhat.com>> = wrote: =20
On 16 Sep 2016, at 08:29, aleksey.maksimov@it-kb.ru = <mailto:aleksey.maksimov@it-kb.ru> wrote:
There are more ideas?
15.09.2016, 14:40, "aleksey.maksimov@it-kb.ru = <mailto:aleksey.maksimov@it-kb.ru>" <aleksey.maksimov@it-kb.ru = <mailto:aleksey.maksimov@it-kb.ru>>:
Martin, I physically turned off the server through the iLO2. See = screenshots. I did not touch Virtual Machine (KOM-AD01-PBX02) at the same time. The virtual machine has been turned on at the time when the host = shut down.
15.09.2016, 14:27, "Martin Perina" <mperina@redhat.com = <mailto:mperina@redhat.com>>:
Hi,
I found out this in the log:
2016-09-15 12:02:04,661 INFO = [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] = (ForkJoinPool-1-worker-6) [] VM = '660bafca-e9c3-4191-99b4-295ff8553488'(KOM-AD01-PBX02) moved from 'Up' = --> 'Down' 2016-09-15 12:02:04,788 INFO = [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] = (ForkJoinPool-1-worker-6) [] Correlation ID: null, Call Stack: null, = Custom Event ID: -1, Message: VM KOM-AD01-PBX02 is down. Exit message: = User shut down from within the guest =20 since it shut down cleanly, can you please check the guest's logs to = see what triggered the shutdown? In such cases it is considered a user = requested shutdown and such VMs are not restarted automatically =20 =E2=80=8BThat's exactly what I meant by my response. =46rom the log = it's obvious that VM was shutdown properly, so engine will not restart = it on a different. host. Also on most modern hosts if you execute power = management off action, a signal is sent to OS to execute =E2=80=8B = =E2=80=8Bregular shutdown so VMs are also shutted down properly. =20 I understand the reason, but is it really what the user expects? =20 I mean, if I set HA mode on a VM I'd expect the that the engine cares = to keep it up of restart if needed regardless of shutdown reasons.
no, that=E2=80=99s not how HA works today. When you log into a guest and = issue =E2=80=9Cshutdown=E2=80=9D we do not restart the VM under your = hands. We can argue how it should or may work, but this is the defined = behavior since the dawn of oVirt.
=20 =E2=80=8BAFAIK that's correct, we need to be able =E2=80=8B=E2=80=8Bshut= down HA VM=E2=80=8B=E2=80=8B=E2=80=8B without being it immediately = restarted on different host. We want to restart HA VM only if host, = where HA VM is running, is non-responsive.
=20 For instance, on hosted-engine the HA agent, if not in global =
=20 =E2=80=8BWell, HE VM is definitely not a standard HA VM :-) =E2=80=8B=20 =20 =20 =E2=80=8B We are aware of a similar issue on specific hw - = https://bugzilla.redhat.com/show_bug.cgi?id=3D1341106 = <https://bugzilla.redhat.com/show_bug.cgi?id=3D1341106> =20
If I'm not mistaken, this means that VM was properly shutted down =
from within itself and in that case it's not restarted automatically. So = I'm curious what actions have you made to make host KOM-AD01-VM31 = non-responsive?
If you want to test fencing properly, then I suggest you to =
either block connection between host and engine on host side and = forcibly stop ovirtmgmt network interface on host and watch fencing is = applied. =20 =E2=80=8BTry above if you want to test fencing. Of course you can = always configure firewall rule to drop all packets between engine and = host or unplug host network cable=E2=80=8B. =20
Martin
On Thu, Sep 15, 2016 at 1:16 PM, <aleksey.maksimov@it-kb.ru =
<mailto:aleksey.maksimov@it-kb.ru>> wrote:
engine.log for this period.
15.09.2016, 14:01, "Martin Perina" <mperina@redhat.com = <mailto:mperina@redhat.com>>:
On Thu, Sep 15, 2016 at 12:47 PM, <aleksey.maksimov@it-kb.ru = <mailto:aleksey.maksimov@it-kb.ru>> wrote: > Hi Martin. > I have a stupid question. Use Watchdog device mandatory to = automatically start a virtual machine in host Fencing process?
=E2=80=8BAFAIK it's not, but I'm not na expert, adding Arik.
You need correct power management setup for the hosts and VM = has to be marked as highly available=E2=80=8B for sure.

On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt.
AFAIK that's correct; we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where it is running is non-responsive.
we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host
Hi, just another question, in case HA is not configured at all. If I run the "shutdown -h now" command on a host where some VMs are running, what is the expected behavior? A clean VM shutdown (with or without a timeout in case it doesn't complete?), or a crash of their related QEMU processes? Thanks, Gianluca


On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:05, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt.
AFAIK that's correct; we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where it is running is non-responsive.
we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host
Hi, just another question in case HA is not configured at all.
by “HA configured” I expect you’re referring to the “Highly Available” checkbox in Edit VM dialog.
Yes
If I run the "shutdown -h now" command on a host where some VMs are running, what is the expected behavior? A clean VM shutdown (with or without a timeout in case it doesn't complete?), or a crash of their related QEMU processes?
expectation is that you won’t do that. That’s why there is the Maintenance host state. But if you do that regardless, with VMs running, all the processes will be terminated in a regular system way, i.e. all QEMU processes get SIGTERM. From the perspective of each guest this is not a clean shutdown and it would just get killed
Yes, I was thinking about the scenario of one guy issuing the command (or pressing the button) by mistake. Thanks, Gianluca
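
To make Michal's point about the Maintenance state concrete, here is a rough sketch of taking a host down the supported way via the REST API; the engine URL, credentials, and HOST_UUID are placeholders, so verify the action path against your API version:

# Move the host to Maintenance first; the engine migrates or cleanly
# stops its VMs before the host is allowed to go down.
curl -k -u 'admin@internal:password' \
  -X POST 'https://engine.example.com/ovirt-engine/api/hosts/HOST_UUID/deactivate' \
  -H 'Content-Type: application/xml' \
  -d '<action/>'
# Only once the host reports Maintenance is it safe to run: shutdown -h now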

On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:05, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt.
AFAIK that's correct; we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where it is running is non-responsive.
we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host
Hi, just another question in case HA is not configured at all.
by “HA configured” I expect you’re referring to the “Highly Available” checkbox in Edit VM dialog.
If I run the "shutdown -h now" command on a host where some VMs are running, what is the expected behavior? A clean VM shutdown (with or without a timeout in case it doesn't complete?), or a crash of their related QEMU processes?
expectation is that you won’t do that. That’s why there is the Maintenance host state. But if you do that regardless, with VMs running, all the processes will be terminated in a regular system way, i.e. all QEMU processes get SIGTERM. From the perspective of each guest this is not a clean shutdown and it would just get killed
Aleksey is reporting that he started a shutdown of his host via power management, and the VM processes didn't get killed abruptly but were shut down smoothly, so they didn't restart regardless of their HA flag; hence this thread.
Thanks, michal
Thanks, Gianluca


On Fri, Sep 16, 2016 at 3:34 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:31, aleksey.maksimov@it-kb.ru wrote:
Hi Simone. Exactly. Now I'll check journald on the guest and try to understand how the guest went off.
great. thanks
16.09.2016, 16:25, "Simone Tiraboschi" <stirabos@redhat.com>:
On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:05, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt.
AFAIK that's correct; we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where it is running is non-responsive.
we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host
Hi, just another question in case HA is not configured at all.
by “HA configured” I expect you’re referring to the “Highly Available” checkbox in Edit VM dialog.
If I run the "shutdown -h now" command on a host where some VMs are running, what is the expected behavior? A clean VM shutdown (with or without a timeout in case it doesn't complete?), or a crash of their related QEMU processes?
expectation is that you won’t do that. That’s why there is the Maintenance host state. But if you do that regardless, with VMs running, all the processes will be terminated in a regular system way, i.e. all QEMU processes get SIGTERM. From the perspective of each guest this is not a clean shutdown and it would just get killed
Aleksey is reporting that he started a shutdown of his host via power management, and the VM processes didn't get killed abruptly but were shut down smoothly, so they didn't restart regardless of their HA flag; hence this thread.
Gianluca talks about “shutdown -h now”; you talk about a power management action. Those are two different things. The current idea is that systemd or some other component just propagates the action to the guest, and if the guest is configured to handle it as a shutdown, it initiates one itself as well, so it looks like a user-initiated shutdown. Even though this mostly makes sense, it is not OK for the current HA logic.
Aleksey, can you please also test this scenario?
Thanks, michal
Thanks, Gianluca
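
One concrete example of such a component (strictly an assumption on my side, not something confirmed for Aleksey's hosts) is libvirt's libvirt-guests service, which can forward the host's shutdown to every running guest so that it looks user-initiated to the engine:

# Check on the host whether libvirt-guests is active:
systemctl status libvirt-guests
# Its behavior is set in /etc/sysconfig/libvirt-guests (CentOS/RHEL):
#   ON_SHUTDOWN=shutdown   propagates a clean ACPI shutdown to each guest
#   ON_SHUTDOWN=suspend    saves the guests instead of shutting them down
#   SHUTDOWN_TIMEOUT=300   seconds to wait for guests before giving up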

So, colleagues. I tested fencing again, and now I think that my host server's power button (pressed physically or through iLO) sends a kill command to the host OS (and, as a result, to the VMs).

This is the journald log in my guest OS when I press the power button on the host:

...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping ACPI event daemon...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping User Manager for UID 1000...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Starting Unattended Upgrades Shutdown...
Sep 16 16:19:27 KOM-AD01-PBX02 snapd[2583]: 2016/09/16 16:19:27.289063 main.go:67: Exiting on terminated signal.
Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2940]: pam_unix(sshd:session): session closed for user user
Sep 16 16:19:27 KOM-AD01-PBX02 su[3015]: pam_unix(su:session): session closed for user root
Sep 16 16:19:27 KOM-AD01-PBX02 spice-vdagentd[2638]: vdagentd quiting, returning status 0
Sep 16 16:19:27 KOM-AD01-PBX02 sudo[3014]: pam_unix(sudo:session): session closed for user root
Sep 16 16:19:27 KOM-AD01-PBX02 /usr/lib/snapd/snapd[2583]: main.go:67: Exiting on terminated signal.
Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2812]: Received signal 15; terminating.
...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Unmount All Filesystems.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped target Local File Systems (Pre).
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopping Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Remount Root and Kernel File Systems.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Create Static Device Nodes in /dev.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Shutdown.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Final Step.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Starting Reboot...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Shutting down.
Sep 16 16:19:28 KOM-AD01-PBX02 kernel: [drm:qxl_enc_commit [qxl]] *ERROR* head number too large or missing monitors config: ffffc9000084a000, 0
systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd-journald[3342]: Journal stopped
-- Reboot --

Perhaps this is a peculiarity of the HP ProLiant DL 360 G5; I don't know. If I test the unavailability of a host in other ways, everything goes well.

I described my experience testing fencing, with practical examples, on my blog (in Russian): https://blog.it-kb.ru/2016/09/16/install-ovirt-4-0-part-4-about-ssh-soft-fencing-and-hard-fencing-over-hp-proliant-ilo2-power-managment-agent-and-test-of-high-availability/

Thank you all very much for your participation and support.

Michal, what kind of scenario are you talking about?

PS: Excuse me for my bad English :)
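If the goal is a hard-fencing test, one way to keep the host OS from turning a power-button press into the graceful shutdown described above is to tell systemd-logind to ignore the button. This is an untested assumption for this particular hardware:

# On the host: make a physical/iLO power-button press do nothing, so it
# can no longer trigger a clean shutdown that propagates to the guests.
sed -i 's/^#\?HandlePowerKey=.*/HandlePowerKey=ignore/' /etc/systemd/logind.conf
systemctl restart systemd-logind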

On Fri, Sep 16, 2016 at 4:02 PM, <aleksey.maksimov@it-kb.ru> wrote:
Michal, what kind of scenario are you talking about?
Basically what you just did; the question is what happens when you run 'shutdown -h now' (or press the physical button, if it is configured to trigger a soft shutdown): does it somehow propagate the shutdown action to the VMs, or does it brutally kill them? In the first case the VMs will not be restarted, regardless of their HA flags.
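
One quick way to answer that from inside the guest after it comes back up, assuming persistent journald storage is enabled there, is to look at the tail of the previous boot's journal:

# Inside the guest, after the restart (needs Storage=persistent in
# /etc/systemd/journald.conf; otherwise no previous-boot journal exists):
journalctl -b -1 -n 30
# A propagated clean shutdown ends with systemd's shutdown targets and
# SIGTERM messages, as in the log above; a brutal kill stops mid-stream.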
PS: Excuse me for my bad English :)
On Fri, Sep 16, 2016 at 3:34 PM, Michal Skrivanek < michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:31, aleksey.maksimov@it-kb.ru wrote:
Hi Simone. Exactly. Now I'll put the journald on the guest and try to understand how the guest off.
great. thanks
16.09.2016, 16:25, "Simone Tiraboschi" <stirabos@redhat.com>:
On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek < michal.skrivanek@redhat.com> wrote:
> On 16 Sep 2016, at 15:05, Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote: > > On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek < michal.skrivanek@redhat.com> wrote: >> no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt. >> >>> AFAIK that's correct, we need to be able >>> shutdown HA VM >>> >>> without being it immediately restarted on different host. We want to restart HA VM only if host, where HA VM is running, is non-responsive. >> >> we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host > Hi, just another question in case HA is not configured at all.
by “HA configured” I expect you’re referring to the “Highly Available” checkbox in Edit VM dialog.
> If I run the "shutdown -h now" command on an host where some VMs are running, what is the expected behavior? > Clean VM shutdown (with or without timeout in case it doesn't complete?) or crash of their related QEMU processes?
expectation is that you won’t do that. That’s why there is the Maintenance host state. But if you do that regardless, with VMs running, all the processes will be terminated in a regular system way, i.e. all QEMU processes get SIGTERM. From the perspective of each guest this is not a clean shutdown and it would just get killed
Aleksey is reporting that he started a shutdown on his host by power management and the VM processes didn't get roughly killed but smoothly shut down and so they didn't restarted regardless of their HA flag and so this
16.09.2016, 16:37, "Simone Tiraboschi" <stirabos@redhat.com>: thread.
Gianluca talks about “shutdown -h now”, you talk about power management
action, those are two different things. The current idea is that systemd or some other component just propagates the action to the guest and if that guest is configured to handle it as a shutdown it starts it itself as well so it looks like a user-initiated one. Even though this mostly makes sense it is not ok for current HA logic
Aleksey, can you please also test this scenario?
Thanks,
michal
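A side note on the host-shutdown path discussed above: on a plain libvirt host, orderly guest handling during a host shutdown is the job of the libvirt-guests service; on an oVirt host vdsm owns the VMs and libvirt-guests is typically not enabled, which is why the QEMU processes simply receive SIGTERM as Michal describes. A sketch of the plain-libvirt knobs, assuming the stock configuration file (oVirt does not rely on this):

# /etc/sysconfig/libvirt-guests -- read by libvirt-guests.service
# ON_SHUTDOWN=shutdown : try a graceful shutdown of each running guest
# ON_SHUTDOWN=suspend  : managed-save the guests instead
ON_SHUTDOWN=shutdown
# how many seconds to wait for a guest before giving up
SHUTDOWN_TIMEOUT=300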

Tested. If I run 'shutdown -h now' on a host with a running HA VM (not the HostedEngine VM), this event appears in the oVirt web console:

Sep 16, 2016 5:13:18 PM VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest

The HA VM is turned off and does not start on another host. This is the journald log from the HA VM guest OS:

...
Sep 16 17:06:48 KOM-AD01-PBX02 python[2637]: [100B blob data]
Sep 16 17:06:53 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for reply from 91.189.91.157:123 (ntp.ubuntu.com).
Sep 16 17:07:03 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for reply from 91.189.89.199:123 (ntp.ubuntu.com).
Sep 16 17:07:13 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for reply from 91.189.89.198:123 (ntp.ubuntu.com).
Sep 16 17:07:23 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for reply from 91.189.94.4:123 (ntp.ubuntu.com).
Sep 16 17:08:48 KOM-AD01-PBX02 python[2637]: [90B blob data]
Sep 16 17:08:49 KOM-AD01-PBX02 python[2637]: [155B blob data]
Sep 16 17:08:49 KOM-AD01-PBX02 python[2637]: [100B blob data]
Sep 16 17:10:49 KOM-AD01-PBX02 python[2637]: [90B blob data]
Sep 16 17:10:50 KOM-AD01-PBX02 python[2637]: [155B blob data]
Sep 16 17:10:50 KOM-AD01-PBX02 python[2637]: [100B blob data]
-- Reboot --
...

Before the shutdown there are no termination procedures in the log. It looks like a rough poweroff of the VM.

16.09.2016, 17:08, "Simone Tiraboschi" <stirabos@redhat.com>:
On Fri, Sep 16, 2016 at 4:02 PM, <aleksey.maksimov@it-kb.ru> wrote:
So, colleagues. I tested the fencing again, and now I think that my host server's power button (pressed physically or through iLO) sends a KILL command to the host OS (and, as a result, to the VMs). This is the journald log in my guest OS when I press the power button on the host:

...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping ACPI event daemon...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping User Manager for UID 1000...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Starting Unattended Upgrades Shutdown...
Sep 16 16:19:27 KOM-AD01-PBX02 snapd[2583]: 2016/09/16 16:19:27.289063 main.go:67: Exiting on terminated signal.
Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2940]: pam_unix(sshd:session): session closed for user user
Sep 16 16:19:27 KOM-AD01-PBX02 su[3015]: pam_unix(su:session): session closed for user root
Sep 16 16:19:27 KOM-AD01-PBX02 spice-vdagentd[2638]: vdagentd quiting, returning status 0
Sep 16 16:19:27 KOM-AD01-PBX02 sudo[3014]: pam_unix(sudo:session): session closed for user root
Sep 16 16:19:27 KOM-AD01-PBX02 /usr/lib/snapd/snapd[2583]: main.go:67: Exiting on terminated signal.
Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2812]: Received signal 15; terminating.
...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Unmount All Filesystems.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped target Local File Systems (Pre).
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopping Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Remount Root and Kernel File Systems.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Create Static Device Nodes in /dev.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Shutdown.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Final Step.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Starting Reboot...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Shutting down.
Sep 16 16:19:28 KOM-AD01-PBX02 kernel: [drm:qxl_enc_commit [qxl]] *ERROR* head number too large or missing monitors config: ffffc9000084a000, 0systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd-journald[3342]: Journal stopped
-- Reboot --

Perhaps this is a feature of the HP ProLiant DL 360 G5. I don't know.
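One way to see what the iLO actually does with such a request is to drive the fence agent by hand from the peer host. A sketch with hypothetical credentials (fence_ilo2 and these generic fence-agent options exist; whether "off" maps to a hard power cut or a momentary ACPI press depends on the iLO configuration):

# from KOM-AD01-VM32: query and control KOM-AD01-VM31 power via iLO2
fence_ilo2 -a KOM-AD01-ILO31.holding.com -l admin -p secret -o status
fence_ilo2 -a KOM-AD01-ILO31.holding.com -l admin -p secret -o off

If the guest journal still shows a clean shutdown after the "off" action, the management processor is effectively pressing the button rather than cutting power.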

On 16 Sep 2016, at 16:34, aleksey.maksimov@it-kb.ru wrote:
Tested.
If I run 'shutdown -h now' on a host with a running HA VM (not the HostedEngine VM)...
...this event appears in the oVirt web console:
Sep 16, 2016 5:13:18 PM VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest
That would be another bug. It should be properly recognized as a “kill”. Can you please share the host logs from this attempt as well?
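For reference, the logs usually wanted for this kind of analysis live on both sides, and the guest journal only survives the reboot if journald storage is persistent (Ubuntu 16.04 keeps it in memory by default). A sketch using default oVirt paths and standard journald behaviour:

# on the host that ran the VM
less /var/log/vdsm/vdsm.log
# on the engine VM
less /var/log/ovirt-engine/engine.log
# or collect everything from the engine machine in one archive
ovirt-log-collector

# inside the guest, beforehand: make the journal persistent,
# then after the incident read the previous boot
mkdir -p /var/log/journal && systemctl restart systemd-journald
journalctl --list-boots
journalctl -b -1 -n 200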
The HA VM is turned off and does not start on another host.
This journald log from the HA VM guest OS:
[... same journald log as quoted above, ending with '-- Reboot --' ...]
Before the shutdown there are no termination procedures in the log. It looks like a rough poweroff of the VM.
Yep, that is expected. But it should be properly detected as such, and the HA VM should restart. Somehow vdsm misidentifies the reason for the shutdown.
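To check how vdsm classified the exit, the exit fields it records for the VM are the place to look. A sketch only; the exact field names can differ between vdsm versions, and vdsm usually identifies VMs by UUID rather than by name:

# on the host: how did vdsm record the VM's exit?
grep -E 'exitCode|exitMessage|exitReason' /var/log/vdsm/vdsm.log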

On 16 Sep 2016, at 16:02, aleksey.maksimov@it-kb.ru wrote:
So, colleagues. I tested the fencing again, and now I think that my host server's power button (pressed physically or through iLO) sends a KILL command to the host OS (and, as a result, to the VMs).
Thanks for the confirmation; then it is indeed https://bugzilla.redhat.com/show_bug.cgi?id=1341106
I'm not sure if there is any good workaround. You can always reconfigure (disable) ACPI handling in the guest; then the HA logic would work OK, but it also means there is no graceful shutdown and your VM would be killed uncleanly.
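The guest-side part of that workaround amounts to making the guest ignore the ACPI power-button event. On an Ubuntu 16.04 guest that event is handled by systemd-logind, so a sketch could look like this (an assumption about this particular guest, and note it means every non-graceful stop becomes a hard kill):

# inside the guest, in /etc/systemd/logind.conf:
HandlePowerKey=ignore

# then apply the change
systemctl restart systemd-logind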

"your VM would be killed uncleanly." This is not a good idea, I think 16.09.2016, 17:14, "Michal Skrivanek" <michal.skrivanek@redhat.com>:
thanks for confirmation, then it is indeed https://bugzilla.redhat.com/show_bug.cgi?id=1341106
I'm not sure if there is any good workaround. You can always reconfigure (disable) ACPI handling in the guest; then the HA logic would work OK, but it also means there is no graceful shutdown and your VM would be killed uncleanly.
participants (5)
- aleksey.maksimov@it-kb.ru
- Gianluca Cecchi
- Martin Perina
- Michal Skrivanek
- Simone Tiraboschi