Hello,
In oVirt, we have a disk-level property, propagate_error, that decides
how an error is propagated to the VM. The value is maintained in the
database, with the default set to Off. With the default setting (Off),
the resulting policy pauses the VM rather than propagating the error to
it. The UI currently has no provision for configuring this property for
disks (images or LUNs), so there is no easy way to set it. Further,
even if the value is manually set to "On" in the database, the UI
overwrites it every time some other property is updated, as described
here -
https://bugzilla.redhat.com/show_bug.cgi?id=1669367
Leaving the value at "Off" is not ideal for multipath devices, where a
single path failure causes the VM to pause. This seriously restricts
DR scenarios: unlike VMware and Hyper-V, oVirt is not able to support
the DR functionality -
https://bugzilla.redhat.com/show_bug.cgi?id=1314160
While we wait for the RFE, the proposal here is to revise the
out-of-the-box behavior for LUNs. For LUNs, we should propagate errors
to the VM rather than pausing it directly. This will allow us to
handle short-term multipath outages and improve availability. It is a
simple change in behavior, but it should have a clear positive impact.
I would like feedback on this to make sure that everyone is OK with the
proposal.
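To make the proposed default concrete, here is a rough sketch in Python.
It is only an illustration: the names DiskStorageType, PropagateErrors
and default_propagate_errors are hypothetical and are not taken from the
engine code, and the real change would live wherever the engine assigns
the default today.

```python
from enum import Enum


class DiskStorageType(Enum):
    """Hypothetical disk kinds, matching the two cases named above."""
    IMAGE = "image"
    LUN = "lun"


class PropagateErrors(Enum):
    """Hypothetical mirror of the On/Off value kept in the database."""
    OFF = "Off"  # the VM is paused on a disk error
    ON = "On"    # the error is propagated to the VM


def default_propagate_errors(storage_type: DiskStorageType) -> PropagateErrors:
    """Return the proposed out-of-the-box value for a new disk.

    Assumption: only LUNs change; image disks keep today's default (Off).
    """
    if storage_type is DiskStorageType.LUN:
        return PropagateErrors.ON
    return PropagateErrors.OFF
```

With this default, a new LUN disk would start with "On", while image
disks would keep today's behavior.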
Thanks,
Shubha