Sorry for the slow reply; I was out sick at the end of last week.
Thank you, Nir! You have been very helpful in getting a grasp on this issue.
I have gone ahead and opened an RFE for resuming VMs on a Direct LUN:
https://bugzilla.redhat.com/show_bug.cgi?id=1610459
Thanks again!
Regards,
Ryan
On Tue, Jul 24, 2018 at 12:30 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
On Tue, Jul 24, 2018 at 8:30 PM Ryan Bullock <rrb3942(a)gmail.com> wrote:
...
>> Vdsm does monitor multipath events for all LUNs, but they are used only
>> for reporting purposes, see:
>>
>> https://ovirt.org/develop/release-management/features/storage/multipath-events/
>>
>> We could use the events to resume VMs using multipath devices that
>> became available again. This functionality will be even more important in
>> the next version, since we plan to move to a LUN-per-disk model.
>>
>>
>
> I will look at doing this. At the very least, I feel that the
> differences and limitations between storage back-ends/methods should be
> documented, so users don't run into any surprises.
>
You can file a bug for documenting this issue.
...
>>> My other question is: how can I keep my VMs with Direct LUNs from pausing
>>> during short outages? Can I put configurations in my multipath.conf for
>>> just the WWIDs of my Direct LUNs to increase ‘no_path_retry’ and prevent
>>> the VMs from pausing in the first place? I know that in general you don’t
>>> want to increase ‘no_path_retry’ because it can cause timeout issues with
>>> VDSM and SPM operations (LVM changes, etc.). But in the case of a Direct
>>> LUN, would it cause any problems?
>>>
>>
>> You can add a drop-in multipath configuration that will change
>> no_path_retry for a specific device or multipath.
>>
>> Increasing no_path_retry will cause longer delays when vdsm tries to
>> access the LUNs via lvm commands, but the delay should happen only on
>> the first access when a LUN is not available.
>>
>>
> Would that increased delay cause any sort of issues for oVirt (e.g.
> thinking a node is offline/unresponsive) if set globally in multipath.conf?
> Since a Direct LUN doesn't use LVM, would this even be a consideration if
> the increased delay were limited to the Direct LUN only?
>
Vdsm scans all LUNs to discover oVirt volumes, so it will be affected even by
multipath configuration applied only to direct LUNs.
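A minimal sketch of the per-LUN drop-in discussed above, assuming multipath-tools reads extra *.conf files from its default `config_dir` (`/etc/multipath/conf.d`) and that you know the WWIDs of the Direct LUNs. The helper name `render_direct_lun_dropin`, the file name `direct-luns.conf`, the sample WWID and the value 24 are hypothetical placeholders, not oVirt defaults. It uses the `multipaths` section because entries there take precedence over the `devices` and `defaults` sections; vdsm will still scan these LUNs, so the longer queueing still applies to its flows.

```python
"""Hypothetical helper: render a multipath.conf drop-in that raises
no_path_retry only for the given WWIDs (for example, Direct LUNs)."""

from pathlib import Path

# Assumption: multipath-tools' default config_dir; check the config_dir
# setting in your own /etc/multipath.conf before relying on it.
DROPIN_DIR = Path("/etc/multipath/conf.d")


def render_direct_lun_dropin(wwids, no_path_retry):
    """Return multipath.conf text with one multipaths/multipath entry per WWID.

    Only numeric no_path_retry values are handled; multipath also accepts
    "queue" and "fail", which this sketch deliberately rejects.
    """
    if not wwids:
        raise ValueError("at least one WWID is required")
    if (isinstance(no_path_retry, bool)
            or not isinstance(no_path_retry, int)
            or no_path_retry <= 0):
        raise ValueError("no_path_retry must be a positive integer")
    lines = ["multipaths {"]
    for wwid in wwids:
        # Attributes set here override the devices and defaults sections.
        lines += [
            "    multipath {",
            f"        wwid {wwid}",
            f"        no_path_retry {no_path_retry}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical WWID; substitute the WWIDs of your Direct LUNs
    # (for example as listed by `multipath -ll`).
    text = render_direct_lun_dropin(["36001405abcdef0123456789abcdef012"], 24)
    print(text, end="")
    # To install it (as root), something like:
    #   (DROPIN_DIR / "direct-luns.conf").write_text(text)
    # then have multipathd reload its configuration
    # (for example `systemctl reload multipathd`).
```

Writing a separate drop-in file, rather than editing the vdsm-managed multipath.conf, keeps the per-LUN override isolated from the host-wide settings.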
Increasing no_path_retry for any LUN will increase the chance of delaying some
vdsm flows that access LUNs (e.g. updating the lvm cache, scsi rescan, listing
devices).
But the delay happens only once, when the multipath device loses all paths.
The benefit is a smaller chance that a VM will pause or restart because of a
short outage.
Nir
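To put rough numbers on this trade-off: once all paths are gone, multipath keeps queueing I/O for roughly `no_path_retry` × `polling_interval` seconds (one path-checker run per retry) before failing it. The sketch below assumes a baseline of `polling_interval 5` and `no_path_retry 4`, assumed here to match the vdsm-provided multipath.conf of this period; check your own /etc/multipath.conf. The function name `queue_window_seconds` and the candidate values 12 and 24 are hypothetical.

```python
"""Rough estimate of how long multipath keeps queueing I/O after all paths
fail, i.e. how short an outage a VM could ride out without I/O errors."""


def queue_window_seconds(no_path_retry, polling_interval):
    """Approximate queueing time: one path-checker run per retry."""
    if no_path_retry < 0 or polling_interval <= 0:
        raise ValueError("expected no_path_retry >= 0 and polling_interval > 0")
    return no_path_retry * polling_interval


# Assumed baseline values; check your own multipath.conf.
POLLING_INTERVAL = 5
BASELINE_RETRY = 4

for retry in (BASELINE_RETRY, 12, 24):
    window = queue_window_seconds(retry, POLLING_INTERVAL)
    print(f"no_path_retry={retry}: about {window} s of queueing")
```

With these assumptions the loop reports about 20, 60 and 120 seconds. An outage shorter than the window should leave the VM's I/O held rather than failed; the cost is that vdsm flows touching that LUN can block for up to the same window, once per loss of all paths.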