Hi Bernardo

I would like to suggest a workaround for this problem. Could you please check it:

We have a configuration value named FenceQuietTimeBetweenOperationsInSec.
It controls the minimum time to wait between fence operations (stop, start).
It currently defaults to 180 seconds. The key is not exposed to engine-config, so I would suggest the following:

1) Change this key's value to 900 by running the following from the psql prompt:

update vdc_options set option_value = '900' where option_name = 'FenceQuietTimeBetweenOperationsInSec';

2) Restart the engine

3) Repeat the scenario

Now the engine will require 15 minutes between fencing operations, so your host can come up again without being fenced a second time.
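
As a rough sanity check on these numbers, here is a minimal Python sketch. It assumes a simplified model in which a host risks being fenced again if it is still booting when the quiet time expires; the real engine logic is more involved. The function name is hypothetical, and the 8-minute boot time is taken from the original report in this thread.

```python
# Simplified model (an assumption, not the engine's actual logic):
# a host risks being fenced again if it is still booting when the
# quiet time between fence operations expires.

def quiet_time_covers_boot(quiet_time_sec: int, boot_time_sec: int) -> bool:
    """Return True if the host finishes booting within the quiet time."""
    return boot_time_sec <= quiet_time_sec

BOOT_TIME_SEC = 8 * 60      # about 8 minutes, as reported in this thread
DEFAULT_QUIET_SEC = 180     # current default of FenceQuietTimeBetweenOperationsInSec
PROPOSED_QUIET_SEC = 900    # value suggested in step 1 (15 minutes)

for label, quiet in (("default", DEFAULT_QUIET_SEC), ("proposed", PROPOSED_QUIET_SEC)):
    # Print whether each setting leaves enough time for the host to boot.
    print(label, quiet, quiet_time_covers_boot(quiet, BOOT_TIME_SEC))
```

Under this model the default of 180 seconds is shorter than the boot time, while 900 seconds leaves several minutes of margin.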

Please let me know if this workaround works for you.

Thanks

Eli

On Tue, Sep 5, 2017 at 4:20 PM, Bernardo Juanicó <bjuanico@gmail.com> wrote:
Martin, thanks for your reply. I was aware of bug [1] and the implemented solution; changing ServerRebootTimeout to 1200 didn't change a thing...
Now I know about [2], and I'll test the fix once it is released.

Regards,

Bernardo


2017-09-05 8:23 GMT-03:00 Martin Perina <mperina@redhat.com>:
Hi Bernardo,

we have added a timeout to wait until the host is booted [1] in oVirt 4.1.2. This timeout is 5 minutes by default, but it can be extended using the following command:

   engine-config -s ServerRebootTimeout=NNN

where NNN is the number of seconds you want to wait until the host has booted up.
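
Since NNN is given in seconds while boot times are usually quoted in minutes, a small Python sketch of the conversion may help. The helper name and the optional margin are hypothetical; only the `engine-config -s ServerRebootTimeout=NNN` form comes from this message, and the sketch only builds the command string without running it.

```python
def reboot_timeout_command(boot_minutes: int, margin_minutes: int = 0) -> str:
    """Build the engine-config command line for a boot time given in minutes."""
    # Convert minutes (plus an optional safety margin) to seconds.
    seconds = (boot_minutes + margin_minutes) * 60
    return f"engine-config -s ServerRebootTimeout={seconds}"

# A 20-minute timeout corresponds to NNN = 1200.
print(reboot_timeout_command(20))
```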

But be aware that you may be affected by [2], which we are currently trying to fix.

Regards

Martin Perina


On Fri, Sep 1, 2017 at 7:54 PM, Bernardo Juanicó <bjuanico@gmail.com> wrote:
Hi everyone, 

I installed 2 hosts on a new cluster, and the servers take a really long time to boot up (about 8 minutes).

When a host crashes or is powered off, the ovirt-manager starts it via power management. Since the servers take so long to boot up, the ovirt-manager thinks the start failed and proceeds to reboot the host several times before giving up. The server is finally up about 20 minutes after the failure.

I changed some engine variables with engine-config, trying to set a higher timeout, but the problem persists.

Any ideas??


Regards,
Bernardo


