[ovirt-users] What recovers a VM from pause?

Benjamin Marzinski bmarzins at redhat.com
Tue May 31 14:51:42 UTC 2016


On Mon, May 30, 2016 at 10:09:25PM +0300, Nir Soffer wrote:
> On Mon, May 30, 2016 at 4:07 PM, Nicolas Ecarnot <nicolas at ecarnot.net> wrote:
> > Hello,
> >
> > We're planning a move from our old building towards a new one a few meters
> > away.
> >
> >
> >
> > Like Martijn
> > (https://www.mail-archive.com/users@ovirt.org/msg33182.html), I have
> > maintenance planned on our storage side.
> >
> > Say an oVirt DC is using a SAN's LUN via iSCSI (Equallogic).
> > This SAN allows me to set up block replication between two SANs, seen by
> > oVirt as one (Dell calls it SyncRep).
> > I can then switch all the iSCSI accesses to the replicated LUN.
> >
> > When doing this, the iSCSI stack of each oVirt host notices the
> > disconnection, tries to reconnect, and succeeds.
> > Across our hosts, this takes between 4 and 15 seconds.
> >
> > When this happens fast enough, oVirt engine and the VMs don't even notice,
> > and they keep running happily.
> >
> > When this takes more than 4 seconds, there are 2 cases :
> >
> > 1 - The hosts and/or oVirt and/or the SPM (I actually don't know) notices
> > that there is a storage failure, and pauses the VMs.
> > When the iSCSI stack reconnects, the VMs are automatically recovered from
> > pause, and this all takes less than 30 seconds. That is very acceptable for
> > us, as this action is extremely rare.
> >
> > 2 - Same storage failure, VMs paused, and some VMs stay in pause mode
> > forever.
> > Manual "run" action is mandatory.
> > When done, everything recovers correctly.
> > This is also quite acceptable, but here come my questions :
> >
> > My questions : (!)
> > - *WHAT* process, piece of code, or oVirt component is responsible for
> > deciding when to UN-pause a VM, and under what conditions?
> 
> VMs get paused by qemu when it gets ENOSPC or some other IO error.
> This probably happens when a VM is writing to storage and all paths to
> storage are faulty. With the current configuration, the scsi layer will
> fail after 5 seconds, and if no path is available, the write will fail.
> 
> If the vdsm storage monitoring system detects the issue, the storage domain
> becomes invalid. When the storage domain becomes valid again, we
> try to resume all VMs paused because of IO errors.
> 
> Storage monitoring is done every 10 seconds in normal conditions, but in
> the current release there can be delays of up to a couple of minutes in
> extreme conditions, for example with 50 storage domains and a lot of io.
> So the storage domain monitor may miss an error on storage. In that case
> the domain never becomes invalid, so it never becomes valid again, and
> the VM has to be resumed manually.
> See https://bugzilla.redhat.com/1081962
> 
> In ovirt 4.0 monitoring should be improved, and will always monitor
> storage every 10 seconds, but even this cannot guarantee that we will
> detect all storage errors, for example if the storage outage is shorter
> than 10 seconds. But I guess the chance that a storage outage is shorter
> than 10 seconds, yet long enough to cause a VM to pause, is very low.
> 
> > That would help me to understand why some cases are working even more
> > smoothly than others.
> > - Are there related timeouts I could play with in engine-config options?
> 
> Nothing on the engine side...
> 
> > - [a bit off-topic] Is it safe to increase some iSCSI timeouts or
> > buffer sizes in the hope this kind of disconnection would go unnoticed?
> 
> But you may modify multipath configuration on the host.
> 
> We now use this multipath configuration (/etc/multipath.conf):
> 
> # VDSM REVISION 1.3
> 
> defaults {
>     polling_interval            5
>     no_path_retry               fail
>     user_friendly_names         no
>     flush_on_last_del           yes
>     fast_io_fail_tmo            5
>     dev_loss_tmo                30
>     max_fds                     4096
>     deferred_remove             yes
> }
> 
> devices {
>     device {
>         all_devs                yes
>         no_path_retry           fail
>     }
> }
> 
> This forces io requests to fail on devices that by default would queue such
> requests for a long or unlimited time. Queuing requests is very bad for
> vdsm: it causes various commands to block for minutes during a storage
> outage, failing various flows in vdsm and the ui.
> See https://bugzilla.redhat.com/880738
> 
> However, in your case, using queuing may be the best way to do the switch
> from one storage to another in the smoothest way.
> 
> You may try this setting:
> 
> devices {
>     device {
>         all_devs                yes
>         no_path_retry           30
>     }
> }
> 
> This will queue io requests for 30 seconds before failing.
> Using this normally would be a bad idea with vdsm, since during a storage
> outage vdsm may block for 30 seconds when no path is available, and it is
> not designed for this behavior, but blocking from time to time for a short
> time should be ok.
> 
> I think that modifying the configuration and reloading the multipathd
> service should be enough to use the new settings, but I'm not sure if this
> changes existing sessions or open devices.
> 
> Adding Ben to add more info about this.

Reloading the multipathd service will update this setting on all
existing devices. Outside of vdsm, multipath settings like this are
fairly common. So, from multipath's point of view this is completely
reasonable.

-Ben
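
A rough way to reason about the no_path_retry values discussed in the quoted
text: multipath.conf(5) describes a numeric no_path_retry as a number of
retries rather than a number of seconds. Assuming multipathd retries once per
polling_interval, the queueing window is approximately the product of the two,
so with polling_interval 5 a value of 30 may queue for noticeably longer than
30 seconds. The helper name below is hypothetical and the arithmetic is only
an estimate under that assumption, not a statement of exact multipathd timing.

```python
def queue_window_seconds(no_path_retry, polling_interval=5):
    """Rough time in seconds that I/O stays queued once every path is down.

    Assumption: a numeric no_path_retry is a retry count, and multipathd
    retries once per polling_interval. Returns None for "queue" (no limit).
    """
    if no_path_retry == "fail":
        return 0      # fail I/O immediately, as in the vdsm default above
    if no_path_retry == "queue":
        return None   # keep queueing until a path comes back
    return int(no_path_retry) * polling_interval


# Compare the vdsm default, two numeric settings, and unlimited queueing.
for setting in ("fail", 6, 30, "queue"):
    print(setting, queue_window_seconds(setting))
```

Under the same assumption, a target window of about 30 seconds with
polling_interval 5 would correspond to a retry count of 6. It is worth
checking the effective values on the host after reloading multipathd.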

> 
> Nir
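
Nir's point about the monitoring interval can be illustrated with a small
model. This is only a sketch: it assumes the monitor performs an instantaneous
check at t = 0, interval, 2*interval, ... and that an outage is detected only
if a check lands inside it. The real vdsm monitor is more involved, and the
function name is hypothetical.

```python
import math


def monitor_sees_outage(start, duration, interval=10.0):
    """True if a periodic check falls inside [start, start + duration)."""
    # First check time at or after the start of the outage.
    first_check = math.ceil(start / interval) * interval
    return first_check < start + duration


# A 6 second outage starting at t=12 falls between the checks at 10 and 20.
print(monitor_sees_outage(12, 6))
# A 9 second outage starting at t=12 is still in progress at the t=20 check.
print(monitor_sees_outage(12, 9))
# If checks are delayed to every 120 seconds, a 15 second outage can be missed.
print(monitor_sees_outage(12, 15, interval=120))
```

In this model any outage at least as long as the interval is always seen,
which matches the observation that only outages shorter than 10 seconds can
slip past a monitor that reliably runs every 10 seconds.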


