[ovirt-users] What recovers a VM from pause?
InterNetX - Juergen Gotteswinter
jg at internetx.com
Mon May 30 13:30:33 UTC 2016
Hi,
are you aware that EqualLogic (EQL) sync replication is only about
replication and provides no high availability at all? I am not even sure
that it handles IP failover by itself, so better plan for minutes of
interruption rather than seconds.
Anyway, don't count on oVirt's pause/unpause. There is a real chance that
it will go horribly wrong. A scheduled maintenance window in which
everything gets shut down would be best practice.
Juergen
On 5/30/2016 at 3:07 PM, Nicolas Ecarnot wrote:
> Hello,
>
> We're planning a move from our old building towards a new one a few
> meters away.
>
> Like Martijn
> (https://www.mail-archive.com/users@ovirt.org/msg33182.html), I have
> maintenance planned on our storage side.
>
> Say an oVirt DC is using a SAN's LUN via iSCSI (EqualLogic).
> This SAN allows me to set up block replication between two SANs, seen by
> oVirt as one (Dell calls it SyncRep).
> I would then switch all iSCSI access to the replicated LUN.
>
> When doing this, the iSCSI stack of each oVirt host notices the
> disconnection, tries to reconnect, and succeeds.
> Across our hosts, this takes between 4 and 15 seconds.
>
> When this happens fast enough, oVirt engine and the VMs don't even
> notice, and they keep running happily.
>
> When this takes more than 4 seconds, there are two cases:
>
> 1 - The hosts and/or oVirt and/or the SPM (I actually don't know which)
> notice that there is a storage failure and pause the VMs.
> When the iSCSI stack reconnects, the VMs are automatically recovered
> from pause, and this all takes less than 30 seconds. That is very
> acceptable for us, as this action is extremely rare.
>
> 2 - Same storage failure, VMs paused, but some VMs stay in pause mode
> forever.
> A manual "run" action is then required.
> Once that is done, everything recovers correctly.
> This is also quite acceptable, but it raises some questions:
>
> My questions (!):
> - *WHAT* process, piece of code, or part of oVirt is responsible for
> deciding when to UN-pause a VM, and under what conditions?
> That would help me understand why some cases go more smoothly than
> others.
> - Are there related timeouts I could play with in engine-config options?
> - [a bit off-topic] Is it safe to increase some iSCSI timeouts or
> buffer sizes in the hope that this kind of disconnection would go
> unnoticed?
>
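On the iSCSI timeout question quoted above: on hosts using the open-iscsi
initiator, the settings usually involved are the session replacement
timeout and the NOP-Out interval/timeout in iscsid.conf. The sketch below
only reads those keys from the text of such a file so the current values
can be compared across hosts. It is a minimal illustration, not a tuning
recommendation: the helper name read_iscsi_timeouts and the sample values
are made up for the example, the usual file location
(/etc/iscsi/iscsid.conf) is an assumption about the hosts, and multipath
settings, which may matter just as much, are not looked at.

```python
# open-iscsi keys that govern how long a lost session is tolerated.
# Which values suit a SyncRep switch is NOT something this sketch decides.
TIMEOUT_KEYS = (
    "node.session.timeo.replacement_timeout",
    "node.conn[0].timeo.noop_out_interval",
    "node.conn[0].timeo.noop_out_timeout",
)


def read_iscsi_timeouts(conf_text):
    """Return {key: seconds} for the timeout keys set in iscsid.conf text.

    Hypothetical helper for illustration; commented-out lines are ignored.
    """
    found = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in TIMEOUT_KEYS:
            found[key] = int(value.strip())
    return found


# Sample content with illustrative values only (not a recommendation).
# On a real host, read the text from /etc/iscsi/iscsid.conf instead.
SAMPLE = """\
# iscsid.conf excerpt
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.cmds_max = 128
"""

for key, seconds in read_iscsi_timeouts(SAMPLE).items():
    print(f"{key} = {seconds}s")
```

Note that iscsid.conf provides defaults for newly discovered nodes, so
sessions that already exist may be running with different values.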