oVirt 4.0.3 (Hosted Engine) - High Availability VM does not restart after auto-fencing of host.

Hello oVirt gurus!

I have oVirt Hosted Engine 4.0.3-1.el7.centos on two CentOS 7.2 hosts (HP ProLiant DL360 G5) connected to shared FC SAN storage.

1. I configured Power Management for the hosts (successfully added a fencing agent for iLO2 on my hosts).

2. I created a new VM (KOM-AD01-PBX02) and installed the guest OS (Ubuntu Server 16.04 LTS) and the oVirt Guest Agent, as described here: https://blog.it-kb.ru/2016/09/14/install-ovirt-4-0-part-2-about-data-center-iso-domain-logical-network-vlan-vm-settings-console-guest-agent-live-migration/. In the VM's "High Availability" settings I turned on the "Highly Available" option and changed "Priority" to "High".

3. Now I am testing hard fencing: I power off my first host (KOM-AD01-VM31) from its iLO (KOM-AD01-ILO31). Fencing works and the server is automatically turned back on, but my HA VM is not started on the second host (KOM-AD01-VM32).

These are the events I see in the oVirt web console:

Sep 15, 2016 12:08:13 PM  Host KOM-AD01-VM31 power management was verified successfully.
Sep 15, 2016 12:08:13 PM  Status of host KOM-AD01-VM31 was set to Up.
Sep 15, 2016 12:08:05 PM  Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:48 PM  Host KOM-AD01-VM31 is rebooting.
Sep 15, 2016 12:05:48 PM  Host KOM-AD01-VM31 was started by SYSTEM.
Sep 15, 2016 12:05:48 PM  Power management start of Host KOM-AD01-VM31 succeeded.
Sep 15, 2016 12:05:41 PM  Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:19 PM  Executing power management start on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:19 PM  Power management start of Host KOM-AD01-VM31 initiated.
Sep 15, 2016 12:05:19 PM  Auto fence for host KOM-AD01-VM31 was started.
Sep 15, 2016 12:05:11 PM  Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:04 PM  Executing power management status on Host KOM-AD01-VM31 using Proxy Host KOM-AD01-VM32 and Fence Agent ilo:KOM-AD01-ILO31.holding.com.
Sep 15, 2016 12:05:04 PM  Host KOM-AD01-VM31 is non responsive.
Sep 15, 2016 12:02:32 PM  Host KOM-AD01-VM31 is not responding. It will stay in Connecting state for a grace period of 60 seconds and after that an attempt to fence the host will be issued.
Sep 15, 2016 12:02:32 PM  VDSM KOM-AD01-VM31 command failed: Heartbeat exeeded
Sep 15, 2016 12:02:04 PM  VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest

What am I doing wrong? Why does the HA VM not start on the second host?
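As a sanity check independent of the web UI, the HA flag on a VM can be read back over the engine's REST API. A minimal sketch; the engine FQDN and the credentials below are placeholders, not values from this thread:

    # Query the engine REST API for the VM and inspect its HA settings
    # (engine.holding.com and the credentials are hypothetical).
    curl -k -u 'admin@internal:password' \
         'https://engine.holding.com/ovirt-engine/api/vms?search=name%3DKOM-AD01-PBX02'

    # The <high_availability> element in the XML response should show
    # <enabled>true</enabled> and the configured priority.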

Hi,

could you please share the whole engine.log?

Thanks

Martin Perina

On Thu, Sep 15, 2016 at 12:01 PM, <aleksey.maksimov@it-kb.ru> wrote:
> What am I doing wrong? Why does the HA VM not start on the second host?

On Thu, Sep 15, 2016 at 12:47 PM, <aleksey.maksimov@it-kb.ru> wrote:
> Hi Martin. I have a stupid question: is a watchdog device mandatory for a virtual machine to be started automatically as part of the host fencing process?
AFAIK it's not, but I'm not an expert; adding Arik. You need a correct power management setup for the hosts, and the VM has to be marked as highly available, for sure.

engine.log for this period.

15.09.2016, 14:01, "Martin Perina" <mperina@redhat.com>:
> You need a correct power management setup for the hosts, and the VM has to be marked as highly available, for sure.
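For anyone following along, the time window in question can be pulled out of the engine log on the engine VM. A sketch using the standard log location:

    # Extract the events for the affected VM and host from the engine log
    # (the grep pattern just matches the names used in this thread).
    grep -E 'KOM-AD01-PBX02|KOM-AD01-VM31' /var/log/ovirt-engine/engine.log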

Hi,

I found this in the log:

2016-09-15 12:02:04,661 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-6) [] VM '660bafca-e9c3-4191-99b4-295ff8553488'(KOM-AD01-PBX02) moved from 'Up' --> 'Down'
2016-09-15 12:02:04,788 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-6) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest

If I'm not mistaken, this means that the VM was properly shut down from within the guest, and in that case it is not restarted automatically. So I'm curious what actions you took to make host KOM-AD01-VM31 non-responsive.

If you want to test fencing properly, then I suggest you either block the connection between the host and the engine on the host side, or forcibly take down the ovirtmgmt network interface on the host, and watch fencing being applied.
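For example, something like the following on the host itself (a sketch: the engine address 10.0.0.10 is a placeholder, and this assumes nothing else manages the host firewall):

    # Option 1: drop all traffic from the engine so the host appears
    # non-responsive (replace 10.0.0.10 with your engine's IP).
    iptables -A INPUT -s 10.0.0.10 -j DROP

    # Option 2: take the management network interface down entirely.
    ip link set ovirtmgmt down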
Martin

On Thu, Sep 15, 2016 at 1:16 PM, <aleksey.maksimov@it-kb.ru> wrote:
> engine.log for this period.

Martin, I physically turned off the server through iLO2. See the screenshots. I did not touch the virtual machine (KOM-AD01-PBX02) at the time; the virtual machine was powered on at the moment the host shut down.

15.09.2016, 14:27, "Martin Perina" <mperina@redhat.com>:
> If I'm not mistaken, this means that the VM was properly shut down from within the guest, and in that case it is not restarted automatically. So I'm curious what actions you took to make host KOM-AD01-VM31 non-responsive.

Are there any more ideas?

15.09.2016, 14:40, "aleksey.maksimov@it-kb.ru" <aleksey.maksimov@it-kb.ru>:
> Martin, I physically turned off the server through iLO2. I did not touch the virtual machine (KOM-AD01-PBX02) at the time; the virtual machine was powered on at the moment the host shut down.

On 16 Sep 2016, at 08:29, aleksey.maksimov@it-kb.ru wrote:
> Martin, I physically turned off the server through iLO2. I did not touch the virtual machine (KOM-AD01-PBX02) at the time; the virtual machine was powered on at the moment the host shut down.
>
>> Message: VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest
Since it shut down cleanly, can you please check the guest's logs to see what triggered the shutdown? In such cases it is considered a user-requested shutdown, and such VMs are not restarted automatically. We are aware of a similar issue on specific hardware: https://bugzilla.redhat.com/show_bug.cgi?id=1341106
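For the guest-side check, once the VM is running again, something like this on the Ubuntu 16.04 guest should do (a sketch; paths are the stock Ubuntu ones):

    # Show which shutdowns/reboots were recorded and when.
    last -x shutdown reboot

    # Look for what initiated the shutdown, e.g. an ACPI power-button
    # event delivered to the guest when the host went down.
    grep -iE 'shutdown|power' /var/log/syslog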

On Fri, Sep 16, 2016 at 9:26 AM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
> Since it shut down cleanly, can you please check the guest's logs to see what triggered the shutdown? In such cases it is considered a user-requested shutdown, and such VMs are not restarted automatically.
That's exactly what I meant by my response. From the log it's obvious that the VM was shut down properly, so the engine will not restart it on a different host. Also, on most modern hosts, if you execute a power management "off" action, a signal is sent to the OS to perform a regular shutdown, so the VMs are also shut down properly.
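To see what the fence agent actually does against iLO, it can also be run by hand from the proxy host. A sketch with placeholder credentials, assuming the standard fence_ilo agent from the fence-agents package:

    # Ask iLO for the host's power state; replace the login/password
    # with the values configured under Power Management (hypothetical here).
    fence_ilo -a KOM-AD01-ILO31.holding.com -l admin -p secret -o status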
> We are aware of a similar issue on specific hardware: https://bugzilla.redhat.com/show_bug.cgi?id=1341106

> If you want to test fencing properly, then I suggest you either block the connection between the host and the engine on the host side, or forcibly take down the ovirtmgmt network interface on the host, and watch fencing being applied.
Try the above if you want to test fencing. Of course, you can always configure a firewall rule to drop all packets between the engine and the host, or unplug the host's network cable.

On Fri, Sep 16, 2016 at 12:50 PM, Martin Perina <mperina@redhat.com> wrote:
> That's exactly what I meant by my response. From the log it's obvious that the VM was shut down properly, so the engine will not restart it on a different host. Also, on most modern hosts, if you execute a power management "off" action, a signal is sent to the OS to perform a regular shutdown, so the VMs are also shut down properly.
I understand the reason, but is it really what the user expects? I mean, if I set HA mode on a VM, I'd expect the engine to take care of keeping it up, restarting it if needed, regardless of the shutdown reason. For instance, on hosted-engine the HA agent, if not in global maintenance mode, will restart the engine VM regardless of who shut it down or why it went off.

On Fri, Sep 16, 2016 at 1:54 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
> I understand the reason, but is it really what the user expects? I mean, if I set HA mode on a VM, I'd expect the engine to take care of keeping it up, restarting it if needed, regardless of the shutdown reason.
AFAIK that's correct: we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where the HA VM is running is non-responsive.

> For instance, on hosted-engine the HA agent, if not in global maintenance mode, will restart the engine VM regardless of who shut it down or why it went off.

Well, the HE VM is definitely not a standard HA VM :-)

--Apple-Mail=_888E7FA5-6914-4728-8646-862772EBB5CF Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8
On 16 Sep 2016, at 14:23, Martin Perina <mperina@redhat.com> wrote: =20 =20 =20 On Fri, Sep 16, 2016 at 1:54 PM, Simone Tiraboschi = <stirabos@redhat.com <mailto:stirabos@redhat.com>> wrote: =20 =20 On Fri, Sep 16, 2016 at 12:50 PM, Martin Perina <mperina@redhat.com = <mailto:mperina@redhat.com>> wrote: =20 =20 On Fri, Sep 16, 2016 at 9:26 AM, Michal Skrivanek = <michal.skrivanek@redhat.com <mailto:michal.skrivanek@redhat.com>> = wrote: =20
On 16 Sep 2016, at 08:29, aleksey.maksimov@it-kb.ru = <mailto:aleksey.maksimov@it-kb.ru> wrote:
There are more ideas?
15.09.2016, 14:40, "aleksey.maksimov@it-kb.ru = <mailto:aleksey.maksimov@it-kb.ru>" <aleksey.maksimov@it-kb.ru = <mailto:aleksey.maksimov@it-kb.ru>>:
Martin, I physically turned off the server through the iLO2. See = screenshots. I did not touch Virtual Machine (KOM-AD01-PBX02) at the same time. The virtual machine has been turned on at the time when the host = shut down.
15.09.2016, 14:27, "Martin Perina" <mperina@redhat.com = <mailto:mperina@redhat.com>>:
Hi,
I found out this in the log:
2016-09-15 12:02:04,661 INFO = [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] = (ForkJoinPool-1-worker-6) [] VM = '660bafca-e9c3-4191-99b4-295ff8553488'(KOM-AD01-PBX02) moved from 'Up' = --> 'Down' 2016-09-15 12:02:04,788 INFO = [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] = (ForkJoinPool-1-worker-6) [] Correlation ID: null, Call Stack: null, = Custom Event ID: -1, Message: VM KOM-AD01-PBX02 is down. Exit message: = User shut down from within the guest =20 since it shut down cleanly, can you please check the guest's logs to = see what triggered the shutdown? In such cases it is considered a user = requested shutdown and such VMs are not restarted automatically =20 =E2=80=8BThat's exactly what I meant by my response. =46rom the log = it's obvious that VM was shutdown properly, so engine will not restart = it on a different. host. Also on most modern hosts if you execute power = management off action, a signal is sent to OS to execute =E2=80=8B = =E2=80=8Bregular shutdown so VMs are also shutted down properly. =20 I understand the reason, but is it really what the user expects? =20 I mean, if I set HA mode on a VM I'd expect the that the engine cares = to keep it up of restart if needed regardless of shutdown reasons.
no, that=E2=80=99s not how HA works today. When you log into a guest and = issue =E2=80=9Cshutdown=E2=80=9D we do not restart the VM under your = hands. We can argue how it should or may work, but this is the defined = behavior since the dawn of oVirt.
=20 =E2=80=8BAFAIK that's correct, we need to be able =E2=80=8B=E2=80=8Bshut= down HA VM=E2=80=8B=E2=80=8B=E2=80=8B without being it immediately = restarted on different host. We want to restart HA VM only if host, = where HA VM is running, is non-responsive.
=20 For instance, on hosted-engine the HA agent, if not in global =
=20 =E2=80=8BWell, HE VM is definitely not a standard HA VM :-) =E2=80=8B=20 =20 =20 =E2=80=8B We are aware of a similar issue on specific hw - = https://bugzilla.redhat.com/show_bug.cgi?id=3D1341106 = <https://bugzilla.redhat.com/show_bug.cgi?id=3D1341106> =20
If I'm not mistaken, this means that VM was properly shutted down =
from within itself and in that case it's not restarted automatically. So = I'm curious what actions have you made to make host KOM-AD01-VM31 = non-responsive?
If you want to test fencing properly, then I suggest you to =
either block connection between host and engine on host side and = forcibly stop ovirtmgmt network interface on host and watch fencing is = applied. =20 =E2=80=8BTry above if you want to test fencing. Of course you can = always configure firewall rule to drop all packets between engine and = host or unplug host network cable=E2=80=8B. =20
Martin
On Thu, Sep 15, 2016 at 1:16 PM, <aleksey.maksimov@it-kb.ru =
<mailto:aleksey.maksimov@it-kb.ru>> wrote:
engine.log for this period.
15.09.2016, 14:01, "Martin Perina" <mperina@redhat.com = <mailto:mperina@redhat.com>>:
On Thu, Sep 15, 2016 at 12:47 PM, <aleksey.maksimov@it-kb.ru = <mailto:aleksey.maksimov@it-kb.ru>> wrote: > Hi Martin. > I have a stupid question. Use Watchdog device mandatory to = automatically start a virtual machine in host Fencing process?
=E2=80=8BAFAIK it's not, but I'm not na expert, adding Arik.
You need correct power management setup for the hosts and VM = has to be marked as highly available=E2=80=8B for sure.

On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt.
AFAIK that's correct; we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where it is running is non-responsive.
we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host
Hi, just another question, in case HA is not configured at all. If I run the "shutdown -h now" command on a host where some VMs are running, what is the expected behavior? A clean VM shutdown (with or without a timeout in case it doesn't complete?), or a crash of their related QEMU processes? Thanks, Gianluca


On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:05, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt.
AFAIK that's correct; we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where it is running is non-responsive.
we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host
Hi, just another question in case HA is not configured at all.
by “HA configured” I expect you’re referring to the “Highly Available” checkbox in Edit VM dialog.
Yes
If I run the "shutdown -h now" command on a host where some VMs are running, what is the expected behavior? A clean VM shutdown (with or without a timeout in case it doesn't complete?), or a crash of their related QEMU processes?
expectation is that you won’t do that. That’s why there is the Maintenance host state. But if you do that regardless, with VMs running, all the processes will be terminated in a regular system way, i.e. all QEMU processes get SIGTERM. From the perspective of each guest this is not a clean shutdown and it would just get killed
Yes, I was thinking about the scenario of one guy issuing the command (or pressing the button) by mistake. Thanks, Gianluca
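
To make Michal's point about the Maintenance state concrete, here is a rough sketch of taking a host down the supported way via the REST API; the engine URL, credentials, and HOST_UUID are placeholders, so verify the action path against your API version:

# Move the host to Maintenance first; the engine migrates or cleanly
# stops its VMs before the host is allowed to go down.
curl -k -u 'admin@internal:password' \
  -X POST 'https://engine.example.com/ovirt-engine/api/hosts/HOST_UUID/deactivate' \
  -H 'Content-Type: application/xml' \
  -d '<action/>'
# Only once the host reports Maintenance is it safe to run: shutdown -h now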

On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:05, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt.
AFAIK that's correct; we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where it is running is non-responsive.
we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host
Hi, just another question in case HA is not configured at all.
by “HA configured” I expect you’re referring to the “Highly Available” checkbox in Edit VM dialog.
If I run the "shutdown -h now" command on a host where some VMs are running, what is the expected behavior? A clean VM shutdown (with or without a timeout in case it doesn't complete?), or a crash of their related QEMU processes?
expectation is that you won’t do that. That’s why there is the Maintenance host state. But if you do that regardless, with VMs running, all the processes will be terminated in a regular system way, i.e. all QEMU processes get SIGTERM. From the perspective of each guest this is not a clean shutdown and it would just get killed
Aleksey is reporting that he started a shutdown of his host via power management, and the VM processes didn't get killed abruptly but were shut down smoothly, so they didn't restart regardless of their HA flag; hence this thread.
Thanks, michal
Thanks, Gianluca


On Fri, Sep 16, 2016 at 3:34 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:31, aleksey.maksimov@it-kb.ru wrote:
Hi Simone. Exactly. Now I'll check journald on the guest and try to understand how the guest went off.
great. thanks
16.09.2016, 16:25, "Simone Tiraboschi" <stirabos@redhat.com>:
On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:05, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt.
AFAIK that's correct; we need to be able to shut down an HA VM without it being immediately restarted on a different host. We want to restart an HA VM only if the host where it is running is non-responsive.
we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host
Hi, just another question in case HA is not configured at all.
by “HA configured” I expect you’re referring to the “Highly Available” checkbox in Edit VM dialog.
If I run the "shutdown -h now" command on a host where some VMs are running, what is the expected behavior? A clean VM shutdown (with or without a timeout in case it doesn't complete?), or a crash of their related QEMU processes?
expectation is that you won’t do that. That’s why there is the Maintenance host state. But if you do that regardless, with VMs running, all the processes will be terminated in a regular system way, i.e. all QEMU processes get SIGTERM. From the perspective of each guest this is not a clean shutdown and it would just get killed
Aleksey is reporting that he started a shutdown of his host via power management, and the VM processes didn't get killed abruptly but were shut down smoothly, so they didn't restart regardless of their HA flag; hence this thread.
Gianluca talks about “shutdown -h now”; you talk about a power management action. Those are two different things. The current idea is that systemd or some other component just propagates the action to the guest, and if the guest is configured to handle it as a shutdown, it initiates one itself as well, so it looks like a user-initiated shutdown. Even though this mostly makes sense, it is not OK for the current HA logic.
Aleksey, can you please also test this scenario?
Thanks, michal
Thanks, Gianluca
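
One concrete example of such a component (strictly an assumption on my side, not something confirmed for Aleksey's hosts) is libvirt's libvirt-guests service, which can forward the host's shutdown to every running guest so that it looks user-initiated to the engine:

# Check on the host whether libvirt-guests is active:
systemctl status libvirt-guests
# Its behavior is set in /etc/sysconfig/libvirt-guests (CentOS/RHEL):
#   ON_SHUTDOWN=shutdown   propagates a clean ACPI shutdown to each guest
#   ON_SHUTDOWN=suspend    saves the guests instead of shutting them down
#   SHUTDOWN_TIMEOUT=300   seconds to wait for guests before giving up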

So, colleagues. I tested fencing again, and now I think that my host server's power button (pressed physically or through iLO) sends a kill command to the host OS (and, as a result, to the VMs).

This is the journald log in my guest OS when I press the power button on the host:

...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping ACPI event daemon...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping User Manager for UID 1000...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Starting Unattended Upgrades Shutdown...
Sep 16 16:19:27 KOM-AD01-PBX02 snapd[2583]: 2016/09/16 16:19:27.289063 main.go:67: Exiting on terminated signal.
Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2940]: pam_unix(sshd:session): session closed for user user
Sep 16 16:19:27 KOM-AD01-PBX02 su[3015]: pam_unix(su:session): session closed for user root
Sep 16 16:19:27 KOM-AD01-PBX02 spice-vdagentd[2638]: vdagentd quiting, returning status 0
Sep 16 16:19:27 KOM-AD01-PBX02 sudo[3014]: pam_unix(sudo:session): session closed for user root
Sep 16 16:19:27 KOM-AD01-PBX02 /usr/lib/snapd/snapd[2583]: main.go:67: Exiting on terminated signal.
Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2812]: Received signal 15; terminating.
...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Unmount All Filesystems.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped target Local File Systems (Pre).
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopping Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Remount Root and Kernel File Systems.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Create Static Device Nodes in /dev.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Shutdown.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Final Step.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Starting Reboot...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Shutting down.
Sep 16 16:19:28 KOM-AD01-PBX02 kernel: [drm:qxl_enc_commit [qxl]] *ERROR* head number too large or missing monitors config: ffffc9000084a000, 0
systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd-journald[3342]: Journal stopped
-- Reboot --

Perhaps this is a peculiarity of the HP ProLiant DL 360 G5; I don't know. If I test the unavailability of a host in other ways, everything goes well.

I described my experience testing fencing, with practical examples, on my blog (in Russian): https://blog.it-kb.ru/2016/09/16/install-ovirt-4-0-part-4-about-ssh-soft-fencing-and-hard-fencing-over-hp-proliant-ilo2-power-managment-agent-and-test-of-high-availability/

Thank you all very much for your participation and support.

Michal, what kind of scenario are you talking about?

PS: Excuse me for my bad English :)
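If the goal is a hard-fencing test, one way to keep the host OS from turning a power-button press into the graceful shutdown described above is to tell systemd-logind to ignore the button. This is an untested assumption for this particular hardware:

# On the host: make a physical/iLO power-button press do nothing, so it
# can no longer trigger a clean shutdown that propagates to the guests.
sed -i 's/^#\?HandlePowerKey=.*/HandlePowerKey=ignore/' /etc/systemd/logind.conf
systemctl restart systemd-logind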

On Fri, Sep 16, 2016 at 4:02 PM, <aleksey.maksimov@it-kb.ru> wrote:
Michal, what kind of scenario are you talking about?
Basically what you just did; the question is what happens when you run 'shutdown -h now' (or press the physical button, if it is configured to trigger a soft shutdown): does it somehow propagate the shutdown action to the VMs, or does it brutally kill them? In the first case the VMs will not be restarted, regardless of their HA flags.
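
One quick way to answer that from inside the guest after it comes back up, assuming persistent journald storage is enabled there, is to look at the tail of the previous boot's journal:

# Inside the guest, after the restart (needs Storage=persistent in
# /etc/systemd/journald.conf; otherwise no previous-boot journal exists):
journalctl -b -1 -n 30
# A propagated clean shutdown ends with systemd's shutdown targets and
# SIGTERM messages, as in the log above; a brutal kill stops mid-stream.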
PS: Excuse me for my bad English :)
On Fri, Sep 16, 2016 at 3:34 PM, Michal Skrivanek < michal.skrivanek@redhat.com> wrote:
On 16 Sep 2016, at 15:31, aleksey.maksimov@it-kb.ru wrote:
Hi Simone. Exactly. Now I'll put the journald on the guest and try to understand how the guest off.
great. thanks
16.09.2016, 16:25, "Simone Tiraboschi" <stirabos@redhat.com>:
On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek < michal.skrivanek@redhat.com> wrote:
> On 16 Sep 2016, at 15:05, Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote: > > On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek < michal.skrivanek@redhat.com> wrote: >> no, that’s not how HA works today. When you log into a guest and issue “shutdown” we do not restart the VM under your hands. We can argue how it should or may work, but this is the defined behavior since the dawn of oVirt. >> >>> AFAIK that's correct, we need to be able >>> shutdown HA VM >>> >>> without being it immediately restarted on different host. We want to restart HA VM only if host, where HA VM is running, is non-responsive. >> >> we try to restart it in all other cases other than user initiated shutdown, e.g. a QEMU process crash on an otherwise-healthy host > Hi, just another question in case HA is not configured at all.
by “HA configured” I expect you’re referring to the “Highly Available” checkbox in Edit VM dialog.
> If I run the "shutdown -h now" command on an host where some VMs are running, what is the expected behavior? > Clean VM shutdown (with or without timeout in case it doesn't complete?) or crash of their related QEMU processes?
expectation is that you won’t do that. That’s why there is the Maintenance host state. But if you do that regardless, with VMs running, all the processes will be terminated in a regular system way, i.e. all QEMU processes get SIGTERM. From the perspective of each guest this is not a clean shutdown and it would just get killed
Aleksey is reporting that he started a shutdown on his host by power management and the VM processes didn't get roughly killed but smoothly shut down and so they didn't restarted regardless of their HA flag and so this
16.09.2016, 16:37, "Simone Tiraboschi" <stirabos@redhat.com>: thread.
Gianluca talks about “shutdown -h now”, you talk about power management
action, those are two different things. The current idea is that systemd or some other component just propagates the action to the guest and if that guest is configured to handle it as a shutdown it starts it itself as well so it looks like a user-initiated one. Even though this mostly makes sense it is not ok for current HA logic
Aleksey, can you please also test this scenario?
Thanks,
michal
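A side note on the host-shutdown path discussed above: on a plain libvirt host, orderly guest handling during a host shutdown is the job of the libvirt-guests service; on an oVirt host vdsm owns the VMs and libvirt-guests is typically not enabled, which is why the QEMU processes simply receive SIGTERM as Michal describes. A sketch of the plain-libvirt knobs, assuming the stock configuration file (oVirt does not rely on this):

# /etc/sysconfig/libvirt-guests -- read by libvirt-guests.service
# ON_SHUTDOWN=shutdown : try a graceful shutdown of each running guest
# ON_SHUTDOWN=suspend  : managed-save the guests instead
ON_SHUTDOWN=shutdown
# how many seconds to wait for a guest before giving up
SHUTDOWN_TIMEOUT=300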

Tested. If I run 'shutdown -h now' on a host with a running HA VM (not the HostedEngine VM), this event appears in the oVirt web console:

Sep 16, 2016 5:13:18 PM VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest

The HA VM is turned off and does not start on another host. This is the journald log from the HA VM guest OS:

...
Sep 16 17:06:48 KOM-AD01-PBX02 python[2637]: [100B blob data]
Sep 16 17:06:53 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for reply from 91.189.91.157:123 (ntp.ubuntu.com).
Sep 16 17:07:03 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for reply from 91.189.89.199:123 (ntp.ubuntu.com).
Sep 16 17:07:13 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for reply from 91.189.89.198:123 (ntp.ubuntu.com).
Sep 16 17:07:23 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for reply from 91.189.94.4:123 (ntp.ubuntu.com).
Sep 16 17:08:48 KOM-AD01-PBX02 python[2637]: [90B blob data]
Sep 16 17:08:49 KOM-AD01-PBX02 python[2637]: [155B blob data]
Sep 16 17:08:49 KOM-AD01-PBX02 python[2637]: [100B blob data]
Sep 16 17:10:49 KOM-AD01-PBX02 python[2637]: [90B blob data]
Sep 16 17:10:50 KOM-AD01-PBX02 python[2637]: [155B blob data]
Sep 16 17:10:50 KOM-AD01-PBX02 python[2637]: [100B blob data]
-- Reboot --
...

Before the shutdown there are no termination procedures in the log. It looks like a rough poweroff of the VM.

16.09.2016, 17:08, "Simone Tiraboschi" <stirabos@redhat.com>:
On Fri, Sep 16, 2016 at 4:02 PM, <aleksey.maksimov@it-kb.ru> wrote:
So, colleagues. I tested the fencing again, and now I think that my host server's power button (pressed physically or through iLO) sends a KILL command to the host OS (and, as a result, to the VMs). This is the journald log in my guest OS when I press the power button on the host:

...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping ACPI event daemon...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping User Manager for UID 1000...
Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Starting Unattended Upgrades Shutdown...
Sep 16 16:19:27 KOM-AD01-PBX02 snapd[2583]: 2016/09/16 16:19:27.289063 main.go:67: Exiting on terminated signal.
Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2940]: pam_unix(sshd:session): session closed for user user
Sep 16 16:19:27 KOM-AD01-PBX02 su[3015]: pam_unix(su:session): session closed for user root
Sep 16 16:19:27 KOM-AD01-PBX02 spice-vdagentd[2638]: vdagentd quiting, returning status 0
Sep 16 16:19:27 KOM-AD01-PBX02 sudo[3014]: pam_unix(sudo:session): session closed for user root
Sep 16 16:19:27 KOM-AD01-PBX02 /usr/lib/snapd/snapd[2583]: main.go:67: Exiting on terminated signal.
Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2812]: Received signal 15; terminating.
...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Unmount All Filesystems.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped target Local File Systems (Pre).
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopping Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Remount Root and Kernel File Systems.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Create Static Device Nodes in /dev.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Shutdown.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Final Step.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Starting Reboot...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Shutting down.
Sep 16 16:19:28 KOM-AD01-PBX02 kernel: [drm:qxl_enc_commit [qxl]] *ERROR* head number too large or missing monitors config: ffffc9000084a000, 0systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Sep 16 16:19:28 KOM-AD01-PBX02 systemd-journald[3342]: Journal stopped
-- Reboot --

Perhaps this is a feature of the HP ProLiant DL 360 G5. I don't know.
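One way to see what the iLO actually does with such a request is to drive the fence agent by hand from the peer host. A sketch with hypothetical credentials (fence_ilo2 and these generic fence-agent options exist; whether "off" maps to a hard power cut or a momentary ACPI press depends on the iLO configuration):

# from KOM-AD01-VM32: query and control KOM-AD01-VM31 power via iLO2
fence_ilo2 -a KOM-AD01-ILO31.holding.com -l admin -p secret -o status
fence_ilo2 -a KOM-AD01-ILO31.holding.com -l admin -p secret -o off

If the guest journal still shows a clean shutdown after the "off" action, the management processor is effectively pressing the button rather than cutting power.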

On 16 Sep 2016, at 16:34, aleksey.maksimov@it-kb.ru wrote:
Tested.
If I run 'shutdown -h now' on a host with a running HA VM (not the HostedEngine VM)...
...this event appears in the oVirt web console:
Sep 16, 2016 5:13:18 PM VM KOM-AD01-PBX02 is down. Exit message: User shut down from within the guest
That would be another bug. It should be properly recognized as a “kill”. Can you please share the host logs from this attempt as well?
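For reference, the logs usually wanted for this kind of analysis live on both sides, and the guest journal only survives the reboot if journald storage is persistent (Ubuntu 16.04 keeps it in memory by default). A sketch using default oVirt paths and standard journald behaviour:

# on the host that ran the VM
less /var/log/vdsm/vdsm.log
# on the engine VM
less /var/log/ovirt-engine/engine.log
# or collect everything from the engine machine in one archive
ovirt-log-collector

# inside the guest, beforehand: make the journal persistent,
# then after the incident read the previous boot
mkdir -p /var/log/journal && systemctl restart systemd-journald
journalctl --list-boots
journalctl -b -1 -n 200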
The HA VM is turned off and does not start on another host.
This journald log from the HA VM guest OS:
[... same journald log as quoted above, ending with '-- Reboot --' ...]
Before the shutdown there are no termination procedures in the log. It looks like a rough poweroff of the VM.
Yep, that is expected. But it should be properly detected as such, and the HA VM should restart. Somehow vdsm misidentifies the reason for the shutdown.
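To check how vdsm classified the exit, the exit fields it records for the VM are the place to look. A sketch only; the exact field names can differ between vdsm versions, and vdsm usually identifies VMs by UUID rather than by name:

# on the host: how did vdsm record the VM's exit?
grep -E 'exitCode|exitMessage|exitReason' /var/log/vdsm/vdsm.log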

On 16 Sep 2016, at 16:02, aleksey.maksimov@it-kb.ru wrote:
So, colleagues. I tested the fencing again, and now I think that my host server's power button (pressed physically or through iLO) sends a KILL command to the host OS (and, as a result, to the VMs).
Thanks for the confirmation; then it is indeed https://bugzilla.redhat.com/show_bug.cgi?id=1341106
I'm not sure if there is any good workaround. You can always reconfigure (disable) ACPI handling in the guest; then the HA logic would work OK, but it also means there is no graceful shutdown and your VM would be killed uncleanly.
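The guest-side part of that workaround amounts to making the guest ignore the ACPI power-button event. On an Ubuntu 16.04 guest that event is handled by systemd-logind, so a sketch could look like this (an assumption about this particular guest, and note it means every non-graceful stop becomes a hard kill):

# inside the guest, in /etc/systemd/logind.conf:
HandlePowerKey=ignore

# then apply the change
systemctl restart systemd-logind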

"your VM would be killed uncleanly." This is not a good idea, I think 16.09.2016, 17:14, "Michal Skrivanek" <michal.skrivanek@redhat.com>:
thanks for confirmation, then it is indeed https://bugzilla.redhat.com/show_bug.cgi?id=1341106
I'm not sure if there is any good workaround. You can always reconfigure (disable) ACPI handling in the guest; then the HA logic would work OK, but it also means there is no graceful shutdown and your VM would be killed uncleanly.
participants (5)
- aleksey.maksimov@it-kb.ru
- Gianluca Cecchi
- Martin Perina
- Michal Skrivanek
- Simone Tiraboschi