On 05/19/2014 05:13 PM, Eli Mesika wrote:
----- Original Message -----
> From: "Yuriy Demchenko" <demchenko.ya(a)gmail.com>
> To: "Eli Mesika" <emesika(a)redhat.com>
> Cc: users(a)ovirt.org
> Sent: Monday, May 19, 2014 4:01:04 PM
> Subject: Re: [ovirt-users] power outage: HA vms not restarted
>
> On 05/19/2014 04:56 PM, Eli Mesika wrote:
>>>> but shouldn't engine restart the corresponding vms after the host holding
>>>> them came up? (without manual fence)
>>>> because they are up - so engine can query them about running/not running vms
>>>> and get actual state of vms - running or not
>>>> the only host that was down at that point is srv5, which held only 1 vm -
>>>> and it was correctly put in 'unknown' state, other vms were just 'down'
>>>> until we manually started them
>> Are you sure that those VMs are defined as Highly Available VMs ???
>>
> yes, i'm sure. double checked in webinterface, plus log entries like:
May this be related? I think that in your case the host came up very fast while the fencing
operation had already started ....
https://bugzilla.redhat.com/show_bug.cgi?id=1064860

doesn't seem so, as the vm wasn't put into 'unknown' state and srv19 was
already up when engine booted, so no fence attempt was ever made for it
> 2014-05-17 00:23:10,565 INFO
> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
> (DefaultQuartzScheduler_Worker-14) vm prod.gui running in db and not
> running in vds - add to rerun treatment. vds srv19
> 2014-05-17 00:23:10,909 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler_Worker-14) [2989840c] Correlation ID: null, Call
> Stack: null, Custom Event ID: -1, Message: Highly Available VM prod.gui
> failed. It will be restarted automatically.
> 2014-05-17 00:23:10,911 INFO
> [org.ovirt.engine.core.bll.VdsEventListener]
> (DefaultQuartzScheduler_Worker-14) [2989840c] Highly Available VM went
> down. Attempting to restart. VM Name: prod.gui, VM
> Id:bbb7a605-d511-461d-99d2-c5a5bf8d9958
>
>