[ovirt-users] qemu-kvm images corruption

Yaniv Kaul ykaul at redhat.com
Tue Feb 6 09:35:28 UTC 2018


On Feb 6, 2018 11:09 AM, "Nicolas Ecarnot" <nicolas at ecarnot.net> wrote:

Hello,

On our two 3.6 DCs, we're still facing qcow2 corruption, even on freshly
installed VMs (CentOS 7, win2012, win2008...).


Please provide complete information on the issue. When, how often, which
storage, etc.


(We are still hoping to find some time to migrate all this to 4.2, but it's
a big job and our one-person team - me - is overwhelmed.)


Understood. Note that we have some scripts that can assist somewhat.


My workaround is described in my previous thread below, but it's just a
workaround.

Reading further, I found this:

https://forum.proxmox.com/threads/qcow2-corruption-after-snapshot-or-heavy-disk-i-o.32865/page-2

There are many things I don't know or understand, and I'd like your opinion:

- Is "virtio" is synonym of "virtio-blk"?


Yes.

- Is it true that development of virtio-scsi is active while development of
virtio has stopped?


No.

- People on the Proxmox forum seem to say that no qcow2 corruption occurs
when using IDE (not an option for me) or virtio-scsi.


Anecdotal evidence or properly reproduced?
Have they filed an issue?

Have any Red Hat people ever heard of this?


I'm not aware of an existing corruption issue.

- Is converting all my VMs to virtio-scsi a guarantee against further
corruption?


No.

- Which driver, even unofficially, do the oVirt devs recommend in terms of
future development and stability?


Depends. I like virtio-scsi for its features (DISCARD mainly), but in some
workloads virtio-blk may be somewhat faster (supposedly lower overhead).
Both interfaces are stable.
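
For reference, a minimal sketch of how the two interfaces differ at the qemu
command line; all ids and paths below are placeholders, not taken from this
setup:

  # sketch only: placeholder ids/paths, trimmed to the disk-related flags
  #
  # virtio-blk ("virtio"): the disk itself is a PCI device
  qemu-kvm ... -drive file=/path/disk.qcow2,format=qcow2,if=none,id=d0 \
               -device virtio-blk-pci,drive=d0
  #
  # virtio-scsi: one controller, disks hang off it; discard=unmap lets
  # guest DISCARD/TRIM reach the storage (the feature mentioned above)
  qemu-kvm ... -device virtio-scsi-pci,id=scsi0 \
               -drive file=/path/disk.qcow2,format=qcow2,if=none,id=d1,discard=unmap \
               -device scsi-hd,drive=d1,bus=scsi0.0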

We should focus on properly reporting the issue so the qemu folks can look
at this.
Y.


Regards,

-- 
Nicolas ECARNOT


On 15/09/2017 at 14:06, Nicolas Ecarnot wrote:

> TL;DR:
> How to avoid image corruption?
>
>
> Hello,
>
> On two of our old 3.6 DCs, a recent series of VM migrations led to some
> issues:
> - I'm putting a host into maintenance mode
> - most of the VMs migrate nicely
> - one remaining VM never migrates, and the logs show:
>
> * engine.log : "...VM has been paused due to I/O error..."
> * vdsm.log : "...Improbable extension request for volume..."
>
> After digging amongst the RH BZ tickets, I saved the day by:
> - stopping the VM
> - running lvchange -ay on the adequate /dev/...
> - running qemu-img check [-r all] on /rhev/blahblah
> - running lvchange -an again...
> - booting the VM
> - enjoy!
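>
> (A rough sketch of those commands, assuming a block storage domain where
> the VG is named after the storage domain UUID and the LV after the volume
> UUID; the names below are placeholders, not real IDs:)
>
>   # sketch only: placeholder VG/LV names; run with the VM powered off
>   lv=/dev/<storage-domain-uuid>/<volume-uuid>
>   lvchange -ay "$lv"           # activate the LV so qemu-img can open it
>   qemu-img check "$lv"         # dry run first: list the errors
>   qemu-img check -r all "$lv"  # then attempt the repairs
>   lvchange -an "$lv"           # deactivate again before booting the VM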
>
> Yesterday this worked for a VM where only one error occurred on the qemu
> image, and the repair was easily done by qemu-img.
>
> Today, facing the same issue on another VM, the repair failed because the
> errors were very numerous, and also because of this message:
>
> [...]
> Rebuilding refcount structure
> ERROR writing refblock: No space left on device
> qemu-img: Check failed: No space left on device
> [...]
>
> The PV/VG/LV are far from full, so I don't know where to look.
> I tried many ways to solve it, but I'm not at all comfortable with qemu
> images, corruption and repair, so I ended up exporting this VM (to an NFS
> export domain) and importing it into another DC: this had the side effect
> of running qemu-img convert from qcow2 to qcow2, and (maybe?) of fixing
> some errors.
> I also copied it into another qcow2 file the same way, with qemu-img
> convert, and this too produced a clean qcow2 image without errors.
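>
> (Two hedged notes, sketches rather than verified fixes: on oVirt block
> storage the qcow2 sits on a thin LV that vdsm extends on demand while the
> VM runs, so an offline "qemu-img check -r all" has nothing growing the LV
> for it, and the refcount rebuild can hit the end of the device even though
> the VG has plenty of free space. Growing the LV slightly before retrying
> might help; and "qemu-img convert" rewrites every cluster into fresh
> metadata, which would explain why the export/import produced a clean
> image. Placeholder names again:)
>
>   # assumption: the ENOSPC comes from the fixed-size LV, not the VG
>   lvextend -L +1G /dev/<sd-uuid>/<vol-uuid>        # grow the LV a little
>   qemu-img check -r all /dev/<sd-uuid>/<vol-uuid>  # retry the repair
>
>   # rewriting into a new image leaves the damaged metadata behind
>   qemu-img convert -O qcow2 /dev/<sd-uuid>/<vol-uuid> /export/clean.qcow2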
>
> I saw that some VM migration bugs are fixed in 4.x, but this is not the
> point here.
> I checked my SANs, my network layers, my blades, and my hosts' OS
> (CentOS 7.2), but I see nothing special.
>
> The real reason behind my message is not to learn how to repair anything,
> but rather to understand what could have led to this situation.
> Where to keep a keen eye?
>
>

-- 
Nicolas ECARNOT
_______________________________________________
Users mailing list
Users at ovirt.org
http://lists.ovirt.org/mailman/listinfo/users