[ovirt-devel] Re: Improving VM behavior in case of IO errors

11 Aug 2020

On 11.08.2020 at 17:44, Nir Soffer wrote:
...
On Mon, Aug 10, 2020 at 11:53 AM Kevin Wolf <kwolf@redhat.com> wrote:
...
On 09.08.2020 at 23:50, Nir Soffer wrote:
...
On Wed, Jul 29, 2020 at 2:30 PM Shubha Kulkarni
<shubha.kulkarni@oracle.com> wrote:
...
Thanks for the feedback Nir.
I agree in general that having an additional engine config for disk
level error handling default would be the right way. It would be good
to decide the granularity. Would it make sense to have this for a specific
disk type like lun or would you prefer to make it generic for all
types?
This must be for a specific disk type, since for thin images on block
storage we cannot support propagating errors to the guest. This will
break thin provisioning.
Is werror=enospc not enough for thin provisioning to work? This will
still stop the guest for any other kinds of I/O errors.
Right, this should work, and it is what we actually use now for propagating
errors for anything but cdrom.
Hm, wait, the options you quote below are all either 'stop' or 'report',
but never 'enospc'. Is 'enospc' used for yet another kind of disk?
...
For LUN, using werror=enospc,rerror=enospc seems wrong, but we have done
this for many years.
This is how we handle cdrom:
-device
ide-cd,bus=ide.2,id=ua-346e176c-f983-4510-af4b-786b368efdd6,bootindex=2,werror=report,rerror=report
Makes sense to me. This is read-only and removable media. Stopping the
guest usually makes sense so that it won't assume the disk is broken,
but if it happens with removable media, you can just eject and re-insert
the same image and it's fixed.
...
Image:
-device
virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.5,addr=0x0,drive=libvirt-2-format,id=ua-1d93fa9e-1665-40d7-9ffc-770513242795,bootindex=1,write-cache=on,serial=1d93fa9e-1665-40d7-9ffc-770513242795,werror=stop,rerror=stop
I assume this is the one that could use 'enospc'?
...
LUN:
-device
virtio-blk-pci,iothread=iothread2,scsi=off,bus=pci.7,addr=0x0,drive=libvirt-1-format,id=ua-19b06845-2c54-422d-921b-6ec0ee2e935b,write-cache=on,werror=stop,rerror=stop
Kevin, any reason not to use werror=report,rerror=report for LUN when
we want to propagate errors to the guest?
If you want to propagate errors, then 'report' is the right setting.

What does "LUN" mean exactly? It doesn't seem to be passthrough, so is
it just that you have some restriction like that it's always raw? Maybe
I would use 'enospc' for consistency even though you never expect this
error to happen. But 'report' is fine, too.

Of course, if you ever get an I/O error (e.g. network temporarily down),
propagating errors to the guest means that it will give up on the disk.
Whether this is the desired behaviour should probably be configured by
the user.
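
To make the settings discussed above concrete, here is a minimal sketch of
appending werror/rerror to a -device option string. The helper name and the
shortened device string are hypothetical, and the set of policies is limited
to the three values named in this thread; QEMU may accept other values or
reject some combinations.

```python
# Policy names taken from this thread only; this is not QEMU's full list.
KNOWN_POLICIES = {"stop", "report", "enospc"}


def with_error_policy(device_opts: str, werror: str, rerror: str) -> str:
    """Append werror/rerror to a comma-separated -device option string."""
    for name, value in (("werror", werror), ("rerror", rerror)):
        if value not in KNOWN_POLICIES:
            raise ValueError(f"unknown {name} policy: {value!r}")
    return f"{device_opts},werror={werror},rerror={rerror}"


# Hypothetical, shortened device string (not a real oVirt command line).
lun_device = with_error_policy(
    "virtio-blk-pci,drive=libvirt-1-format", "report", "report"
)
print(lun_device)
```

With 'report' the guest sees the I/O error itself; with 'stop' the VM is
paused instead, which is the trade-off described above.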

Kevin
...
...
Kevin
...
Handling the LUN use case first seems like the best way, since in this
case we don't manage the LUN and we don't support resuming paused VMs
using LUNs yet, so propagating the error may be more useful.
Managed Block Storage (cinderlib based disks) is very much like
direct LUN. In this case we do manage the disks on the server, but
otherwise we don't support anything on the host (e.g. monitoring,
resuming paused VMs), so propagating the error like direct LUNs may be
more useful.
Images are a bigger problem since thin disks cannot support
propagating errors but preallocated disks can. But once you create a
snapshot, preallocated disks behave exactly like thin disks because
they are the same.
Snapshots are also created automatically for preallocated images,
for example during live storage migration, and deleted automatically
after the migration. So you cannot assume that having only preallocated
disks is good for propagating errors.
Even if you limit this option to file based storage, this is going to
break when you migrate the disks to block storage.
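
The reasoning above could be sketched as a default-selection helper. This is
only an illustration of the argument, not engine code: the Disk class, the
kind labels, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    # Hypothetical labels: "lun", "managed_block", or "image".
    kind: str


def default_propagate_errors(disk: Disk) -> bool:
    """Return True if propagating I/O errors to the guest is a sane default."""
    if disk.kind in ("lun", "managed_block"):
        # Not monitored or resumed on the host, so reporting the error
        # to the guest may be more useful than pausing the VM.
        return True
    if disk.kind == "image":
        # Thin images need the VM paused on ENOSPC, and a preallocated
        # image can become thin-like at any time (snapshot, live storage
        # migration), so allocation type is deliberately ignored here.
        return False
    raise ValueError(f"unknown disk kind: {disk.kind!r}")


for kind in ("lun", "managed_block", "image"):
    print(kind, default_propagate_errors(Disk(kind)))
```

The image branch ignores preallocation on purpose, since an automatic
snapshot can change the disk's behaviour without the user asking for it.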
Nir
...
Thanks,
Shubha
On 7/28/2020 2:03 PM, Nir Soffer wrote:
...
On Tue, Jul 28, 2020 at 4:58 AM Shubha Kulkarni
<shubha.kulkarni@oracle.com> wrote:
...
Hello,
In oVirt, we have a property propagate_error at the disk level that
decides how an error is propagated to the VM.
This value is maintained in the database table with the default value
set as Off. The default setting (Off) results in a policy that ends up
pausing the VM rather than propagating the errors to the VM. There is no
provision in the UI currently to configure this property for disks
(images or LUNs), so there is no easy way to set this value. Further,
even if the value is manually set to "On" in the db, it gets overwritten
by the UI every time some other property is updated, as described here -
https://bugzilla.redhat.com/show_bug.cgi?id=1669367
Setting the value to "Off" is not ideal for multipath devices where a
single path failure causes the VM to pause.
A single path failure should be transparent to qemu: multipath will
fail over the I/O to another path. The I/O will fail only if all paths
are down and (with the default configuration) the multipath path
checkers have failed 4 times.
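
As a rough illustration of the timing involved, the sketch below estimates
how long I/O is held once all paths are down. It assumes the "4 times" above
corresponds to multipath's no_path_retry setting and that retries are spaced
by polling_interval; the 5 second interval is an assumed example value, not
something stated in this thread.

```python
def seconds_until_io_fails(no_path_retry: int, polling_interval: int) -> int:
    """Rough estimate of how long I/O is queued after all paths go down.

    Assumes one path-checker retry per polling interval.
    """
    if no_path_retry < 0 or polling_interval <= 0:
        raise ValueError("expected no_path_retry >= 0 and polling_interval > 0")
    return no_path_retry * polling_interval


# 4 retries (from the text above) at an assumed 5 second polling interval.
print(seconds_until_io_fails(no_path_retry=4, polling_interval=5))  # prints 20
```

Only when this window expires does qemu see the error, and the
werror/rerror policy decides what happens next.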
...
It puts serious restrictions on the DR situation and, unlike VMware and
Hyper-V, oVirt is not able to support the DR functionality -
https://bugzilla.redhat.com/show_bug.cgi?id=1314160
Although in this bug we see that a failover that looks successful from
the multipath and vdsm point of view ended in a paused VM:
https://bugzilla.redhat.com/1860377
Maybe Ben can explain how this can happen.
I hope that qemu will provide more info on errors in the future. If
we had a log about the failed I/O it could be helpful.
...
While we wait for RFE, the proposal here is to revise the out of the box
behavior for LUNs. For LUNs, we should propagate the errors to the VM
rather than directly stopping it. This will allow us to handle short-term
multipath outages and improve availability. This is a simple change in
behavior but will have a good positive impact. I would like to seek
feedback about this to make sure that everyone is ok with the
proposal.
...
...
I think it makes sense, but this is just a default, and it cannot work
for all cases.
This can end in a broken VM with a read-only file system that must be
rebooted, while with error_policy="stop", failover may be transparent
to the VM even if it was paused for a short time.
I would start by making engine defaults configurable using engine
config, so different oVirt distributions can use different defaults.
Nir