Re: [ovirt-users] What recovers a VM from pause?

Monday, 30 May 2016

Hi,

You are aware that EQL sync replication is only about replication, with
no high availability whatsoever? I am not even sure it does IP failover
by itself, so better plan for minutes of interruption rather than
seconds.

Anyway, don't count on oVirt's pause/unpause. There's a real chance that
it will go horribly wrong. A scheduled maintenance window where
everything gets shut down would be best practice.

Juergen

On 5/30/2016 at 3:07 PM, Nicolas Ecarnot wrote:
...
 Hello,

 We're planning a move from our old building towards a new one a few
 meters away.

 Like Martijn
 (https://www.mail-archive.com/users@ovirt.org/msg33182.html), I have
 maintenance planned on our storage side.

 Say an oVirt DC is using a SAN's LUN via iSCSI (EqualLogic).
 This SAN allows me to set up block replication between two SANs, seen by
 oVirt as one (Dell calls it SyncRep).
 I can then switch all iSCSI access to the replicated LUN.

 When doing this, the iSCSI stack of each oVirt host notices the
 disconnection, tries to reconnect, and succeeds.
 Across our hosts, this takes between 4 and 15 seconds.

 When this happens fast enough, oVirt engine and the VMs don't even
 notice, and they keep running happily.

 When this takes more than 4 seconds, there are two cases:

 1 - The hosts and/or oVirt and/or the SPM (I actually don't know which)
 notice that there is a storage failure, and pause the VMs.
 When the iSCSI stack reconnects, the VMs are automatically recovered
 from pause, and this all takes less than 30 seconds. That is very
 acceptable for us, as this action is extremely rare.

 2 - Same storage failure, VMs paused, but some VMs stay paused
 forever.
 A manual "run" action is then mandatory.
 Once that is done, everything recovers correctly.
 This is also quite acceptable, but here come my questions:

 My questions: (!)
 - *WHAT* process, piece of code, or oVirt component is responsible for
 deciding when to UN-pause a VM, and under what conditions?
 That would help me understand why some cases work more smoothly than
 others.
 - Are there related timeouts I could play with in engine-config options?
 - [a bit off-topic] Is it safe to increase some iSCSI timeouts or
 buffer sizes in the hope that this kind of disconnection would go
 unnoticed?
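
 The outcomes described in the message above can be summarised as a small
 decision sketch. This is only an illustration of the reported
 observations, not of how oVirt decides anything: the names
 `NOTICE_THRESHOLD_S`, `Outcome` and `classify` are hypothetical, the
 4-second threshold is simply the figure quoted in the message, and
 whether a VM resumes on its own is taken as an observed input because
 the message leaves exactly that question open.

```python
from enum import Enum

# Threshold quoted in the message: slower reconnects were noticed.
# This is an observed figure, not a documented oVirt setting.
NOTICE_THRESHOLD_S = 4.0


class Outcome(Enum):
    UNNOTICED = "VMs keep running"
    AUTO_RESUMED = "VMs paused, then recovered automatically"
    STUCK_PAUSED = "VMs paused until a manual run action"


def classify(reconnect_s: float, resumed_automatically: bool) -> Outcome:
    """Map one observed iSCSI reconnect to the cases listed in the message."""
    if reconnect_s <= NOTICE_THRESHOLD_S:
        # Fast enough: engine and VMs do not notice the disconnection.
        return Outcome.UNNOTICED
    # Slower than the threshold: the VMs were paused; what happened next
    # is an observation supplied by the caller, not something predicted.
    if resumed_automatically:
        return Outcome.AUTO_RESUMED
    return Outcome.STUCK_PAUSED
```

 Tallying `classify(...)` results per host during a test switch would
 show which hosts fall into the second case most often.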
