On Mon, May 30, 2016 at 4:07 PM, Nicolas Ecarnot <nicolas(a)ecarnot.net> wrote:
> Hello,
> We're planning a move from our old building to a new one a few meters
> away.
> Similarly to Martijn
> (https://www.mail-archive.com/users@ovirt.org/msg33182.html), I have
> maintenance planned on our storage side.
> Say an oVirt DC is using a SAN's LUN via iSCSI (EqualLogic).
> This SAN allows me to set up block replication between two SANs, seen by
> oVirt as one (Dell calls it SyncRep).
> Then I switch all the iSCSI accesses to the replicated LUN.
> When doing this, the iSCSI stack of each oVirt host notices the
> disconnection, tries to reconnect, and succeeds.
> Across our hosts, this takes between 4 and 15 seconds.
> When this happens fast enough, the oVirt engine and the VMs don't even
> notice, and they keep running happily.
> When this takes more than 4 seconds, there are two cases:
> 1 - The hosts and/or oVirt and/or the SPM (I actually don't know which)
> notice that there is a storage failure, and pause the VMs.
> When the iSCSI stack reconnects, the VMs are automatically resumed from
> pause, and this all takes less than 30 seconds. That is very acceptable
> for us, as this action is extremely rare.
> 2 - Same storage failure, VMs paused, but some VMs stay paused forever.
> A manual "run" action is required.
> Once done, everything recovers correctly.
> This is also quite acceptable, but here come my questions:
> My questions: (!)
> - *WHAT* process, piece of code, or part of oVirt is responsible for
> deciding when to UN-pause a VM, and under what conditions?
VMs get paused by QEMU when they get ENOSPC or some other I/O error.
This probably happens when a VM is writing to storage and all paths to
storage are faulty. With the current configuration, the SCSI layer will fail
after 5 seconds, and if no path is available, the write will fail.
If the vdsm storage monitoring system detects the issue, the storage domain
becomes invalid. When the storage domain becomes valid again, we try to
resume all VMs that were paused because of I/O errors.
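The resume rule above can be modeled as a small state machine: a VM paused on an I/O error is resumed only on the domain's invalid-to-valid transition, so if the monitor never saw the domain go invalid, nothing resumes the VM. The classes below are a hypothetical sketch of that rule (the names `Vm`, `DomainMonitor` and `report` are made up for illustration), not vdsm's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Vm:
    name: str
    paused_on_eio: bool = False  # True if QEMU paused the VM after an I/O error


@dataclass
class DomainMonitor:
    """Hypothetical model of one storage domain monitor (not vdsm's real class)."""
    vms: list
    valid: bool = True
    resumed: list = field(default_factory=list)

    def report(self, check_ok: bool) -> None:
        """Record one monitoring result for the domain."""
        was_valid = self.valid
        self.valid = check_ok
        # Only an invalid -> valid transition triggers the resume attempt.
        if not was_valid and self.valid:
            for vm in self.vms:
                if vm.paused_on_eio:
                    vm.paused_on_eio = False
                    self.resumed.append(vm.name)


# Case 1: the monitor catches the outage, so the VM is resumed automatically.
vm1 = Vm("vm1", paused_on_eio=True)
seen = DomainMonitor(vms=[vm1])
seen.report(False)  # outage detected: domain becomes invalid
seen.report(True)   # storage back: domain valid again, paused VMs resumed

# Case 2: the outage fell between two checks; the domain never became
# invalid, so there is no transition and the VM stays paused.
vm2 = Vm("vm2", paused_on_eio=True)
missed = DomainMonitor(vms=[vm2])
missed.report(True)
missed.report(True)

print(seen.resumed, vm2.paused_on_eio)  # ['vm1'] True
```

The second case is the "stays paused forever" situation: without a detected failure there is no recovery event to react to.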
Storage monitoring is done every 10 seconds under normal conditions, but in
the current release there can be delays of up to a couple of minutes in
extreme conditions, for example with 50 storage domains doing a lot of I/O.
So the storage domain monitor may miss a storage error and never mark the
domain invalid; the domain then never becomes valid again, and the VM has
to be resumed manually.
See
https://bugzilla.redhat.com/1081962
In oVirt 4.0 monitoring should be improved and will always check storage
every 10 seconds, but even this cannot guarantee that we detect all storage
errors, for example if the storage outage is shorter than 10 seconds. But I
guess the chance that a storage outage was shorter than 10 seconds, yet long
enough to cause a VM to pause, is very low.
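To see how a short outage can slip past a periodic monitor, here is a simplified timing model. It assumes checks run exactly every `interval` seconds and take no time, which real monitoring does not guarantee (see the delays described above); the function names are hypothetical and this is not vdsm code. The `interval=120.0` case stands in, as an assumption, for the "couple of minutes" delay under heavy load.

```python
import math


def monitor_sees_outage(start: float, duration: float,
                        interval: float = 10.0, first_check: float = 0.0) -> bool:
    """True if a periodic check lands inside the outage window [start, start + duration).

    Simplified model: checks run exactly every `interval` seconds from
    `first_check` and take no time.
    """
    # Index of the first check at or after the moment the outage starts.
    k = max(0, math.ceil((start - first_check) / interval))
    next_check = first_check + k * interval
    return next_check < start + duration


def miss_probability(duration: float, interval: float = 10.0) -> float:
    """Chance that an outage of `duration` seconds falls entirely between two
    checks, if its start is uniformly random relative to the check schedule."""
    return max(0.0, 1.0 - duration / interval)


print(monitor_sees_outage(start=12.0, duration=6.0))  # False: next check at t=20
print(monitor_sees_outage(start=16.0, duration=6.0))  # True: check at t=20 is inside
print(miss_probability(6.0))                          # 0.4
print(miss_probability(15.0, interval=120.0))         # 0.875
```

An outage at least as long as the check interval is always seen; shorter ones are missed with a probability that shrinks as the outage gets longer.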
> That would help me to understand why some cases are working even more
> smoothly than others.
> - Are there related timeouts I could play with in engine-config options?
Nothing on the engine side...
> - [a bit off-topic] Is it safe to increase some iSCSI timeouts or
> buffer sizes in the hope that this kind of disconnection would go unnoticed?
But you may modify the multipath configuration on the host.
We currently use this multipath configuration (/etc/multipath.conf):
# VDSM REVISION 1.3
defaults {
    polling_interval 5
    no_path_retry fail
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
    deferred_remove yes
}
devices {
    device {
        all_devs yes
        no_path_retry fail
    }
}
This forces I/O requests to fail on devices that by default would queue such
requests for a long or unlimited time. Queuing requests is very bad for vdsm:
it causes various commands to block for minutes during a storage outage,
failing various flows in vdsm and the UI.
See
https://bugzilla.redhat.com/880738
However, in your case, queuing may be the best way to switch from one
storage to another as smoothly as possible.
You may try this setting:
devices {
    device {
        all_devs yes
        no_path_retry 30
    }
}
This will keep queuing I/O requests for 30 path-checker retries before
failing; with polling_interval 5, that is roughly 30 × 5 = 150 seconds.
Using this normally would be a bad idea with vdsm, since during a storage
outage vdsm may block for that long when no path is available, and it is not
designed for this behavior; but blocking occasionally for a short time should
be OK.
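To compare the two settings against the 4 to 15 second reconnect times reported in this thread, here is a rough Python model. It assumes the multipath.conf(5) meaning of no_path_retry (a retry count, with one path-checker retry every polling_interval seconds) and takes the SCSI layer's ~5 second failure time from the text above; the function names and the simple additive timing are illustrative simplifications, not how multipathd is implemented.

```python
import math


def queue_window(no_path_retry, polling_interval: int = 5) -> float:
    """Approximate seconds multipath keeps queuing I/O once all paths are down.

    Assumes the multipath.conf(5) meaning of no_path_retry: "fail" disables
    queuing, "queue" queues forever, and a number N means N path-checker
    retries, one every `polling_interval` seconds.
    """
    if no_path_retry == "fail":
        return 0
    if no_path_retry == "queue":
        return math.inf
    return int(no_path_retry) * polling_interval


def guest_sees_io_error(reconnect_s: float, no_path_retry,
                        polling_interval: int = 5, scsi_fail_s: float = 5) -> bool:
    """Rough model: QEMU gets an I/O error (and pauses the VM) only if paths are
    still down after the SCSI layer gives up (scsi_fail_s, ~5 s per the text)
    plus the multipath queuing window."""
    return reconnect_s > scsi_fail_s + queue_window(no_path_retry, polling_interval)


# Reconnect times reported in this thread: 4 to 15 seconds.
for retry in ("fail", 30):
    print(retry, queue_window(retry),
          [guest_sees_io_error(t, retry) for t in (4, 15)])
# fail 0 [False, True]
# 30 150 [False, False]
```

Under these assumptions, the current `fail` setting lets a 15 second switch reach the guest as an I/O error, while `no_path_retry 30` would absorb it; the price is that vdsm can block for the whole queuing window if the storage does not come back.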
I think that modifying the configuration and reloading the multipathd
service should be enough to use the new settings, but I'm not sure whether
this affects existing sessions or open devices.
Adding Ben, who can add more info about this.
Nir