On Feb 6, 2018 11:09 AM, "Nicolas Ecarnot" <nicolas(a)ecarnot.net> wrote:
On our two 3.6 DCs, we're still facing qcow2 corruptions, even on freshly
installed VMs (CentOS7, win2012, win2008...).
Please provide complete information on the issue. When, how often, which
(We are still hoping to find some time to migrate all this to 4.2, but it's
a big job and our one-person team - me - is overwhelmed.)
Understood. Note that we have some scripts that can assist somewhat.
My workaround is described in my previous thread below, but it's just a
Reading further, I found that :
There are many things I don't know or understand, and I'd like your opinion
- Is "virtio" a synonym of "virtio-blk"?
- Is it true that development of virtio-scsi is active while that of
virtio has stopped?
- People in the proxmox forum seem to say that no qcow2 corruption occurs
when using IDE (not an option for me) or virtio-scsi.
Anecdotal evidence or properly reproduced?
Have they filed an issue?
Has anyone at Red Hat ever heard of this?
I'm not aware of an existing corruption issue.
- Is converting all my VMs to use virtio-scsi a guarantee against further
- Which driver do oVirt devs unofficially recommend in terms of future
development and stability?
Depends. I like virtio-scsi for its features (DISCARD mainly), but in some
workloads virtio-blk may be somewhat faster (supposedly lower overhead).
Both interfaces are stable.
We should focus on properly reporting the issue so the qemu folks can look
On 15/09/2017 at 14:06, Nicolas Ecarnot wrote:
How to avoid images corruption?
On two of our old 3.6 DCs, a recent series of VM migrations led to some
- I'm putting a host into maintenance mode
- most of the VMs are migrating nicely
- one remaining VM never migrates, and the logs are showing :
* engine.log : "...VM has been paused due to I/O error..."
* vdsm.log : "...Improbable extension request for volume..."
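
To see how often these two symptoms show up, the logs can be scanned for the message fragments quoted above. This is only a minimal sketch: the two substrings are copied from the excerpts above (the rest of each log line is elided there), and the log file paths are whatever you pass on the command line.

```python
from collections import Counter
from typing import Iterable

# Substrings taken from the log excerpts quoted in this message.
PATTERNS = {
    "paused_io_error": "VM has been paused due to I/O error",
    "improbable_extension": "Improbable extension request for volume",
}


def count_symptoms(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each known symptom substring."""
    counts = Counter()
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python scan_logs.py <engine.log> <vdsm.log>
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            print(path, dict(count_symptoms(fh)))
```

Counts per host and per day would be a reasonable starting point for a bug report.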
After digging amongst the RH BZ tickets, I saved the day by :
- stopping the VM
- lvchange -ay on the appropriate /dev/...
- qemu-img check [-r all] /rhev/blahblah
- lvchange -an...
- boot the VM
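
The steps above can be sketched as a small dry-run helper. This is an illustration under assumptions, not a supported procedure: `repair_plan` and `run_plan` are hypothetical names, the two paths are placeholders you must supply yourself, and the VM has to be stopped before anything is executed.

```python
import shlex
import subprocess
from typing import List, Optional


def repair_plan(lv_path: str, image_path: str, repair: bool = False) -> List[List[str]]:
    """Build the activate / check / deactivate commands described above.

    lv_path and image_path are placeholders supplied by the caller.
    """
    check = ["qemu-img", "check"]
    if repair:
        check += ["-r", "all"]  # only repair once a read-only check has been reviewed
    check.append(image_path)
    return [
        ["lvchange", "-ay", lv_path],  # activate the logical volume
        check,
        ["lvchange", "-an", lv_path],  # deactivate it again
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> Optional[int]:
    """Print the commands; execute them only when dry_run is False.

    Returns the qemu-img exit code, or None in dry-run mode.
    """
    activate, check, deactivate = plan
    for argv in plan:
        print(shlex.join(argv))
    if dry_run:
        return None
    subprocess.run(activate, check=True)
    try:
        # qemu-img check exits non-zero when it finds problems, so do not raise here.
        return subprocess.run(check).returncode
    finally:
        subprocess.run(deactivate, check=True)
```

Running the plan first without `repair=True` leaves the image untouched and shows what qemu-img would complain about.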
Yesterday this worked for a VM where only one error occurred on the qemu
image, and the repair was easily done by qemu-img.
Today, facing the same issue on another VM, it failed because the errors
were very numerous, and also because of this message :
Rebuilding refcount structure
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device
The PV/VG/LV are far from being full, so I guess I don't know where to look.
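
One hypothesis that could be checked here (an assumption on my part, not something confirmed in this thread) is that the qcow2 data already ends close to the end of the block device holding it, leaving no room to write new refcount blocks. A rough sketch: feed the output of `qemu-img check --output=json` and the device size in bytes (for example from `blockdev --getsize64`) to a hypothetical helper like the one below. If the image is too damaged for the check to produce JSON, this will not help.

```python
import json
from typing import Optional


def summarize_check(check_json: str, device_size: Optional[int] = None) -> dict:
    """Summarize `qemu-img check --output=json` output.

    device_size is the size in bytes of the device holding the image;
    it is supplied by the caller and is not read from the JSON.
    """
    report = json.loads(check_json)
    summary = {
        "corruptions": report.get("corruptions", 0),
        "leaks": report.get("leaks", 0),
        "check_errors": report.get("check-errors", 0),
        "image_end_offset": report.get("image-end-offset"),
        "headroom_bytes": None,
    }
    end = summary["image_end_offset"]
    if device_size is not None and end is not None:
        # Space left on the device after the last byte used by the image.
        summary["headroom_bytes"] = device_size - end
    return summary
```

A small or negative headroom would at least be consistent with the "No space left on device" message; a large one would rule this explanation out.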
I tried many ways to solve it, but I'm not at all comfortable with qemu
images, their corruption and repair, so I ended up exporting this VM (to an
NFS export domain) and importing it into another DC: this had the side effect
of running qemu-img convert from qcow2 to qcow2, and (maybe?) of solving some
I also copied it into another qcow2 file using the same qemu-img convert
method, and this too produced a clean qcow2 image without errors.
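
For reference, the manual copy amounts to something like the commands built below. This is a minimal sketch: `convert_plan` is a hypothetical helper, both paths are placeholders, and it only builds the argument lists without running anything.

```python
from typing import List


def convert_plan(src: str, dst: str) -> List[List[str]]:
    """Build a qcow2-to-qcow2 copy followed by a read-only check of the copy."""
    return [
        ["qemu-img", "convert", "-f", "qcow2", "-O", "qcow2", src, dst],
        ["qemu-img", "check", dst],
    ]


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    for argv in convert_plan("source.qcow2", "copy.qcow2"):
        print(" ".join(argv))
```

Since convert writes a fresh image with newly built metadata, a clean check on the copy does not by itself show that all guest data was intact in the source.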
I saw that some bugs related to VM migrations are fixed in 4.x, but that is
not the point here.
I checked my SANs, my network layers, my blades, the OS (CentOS 7.2) of my
hosts, but I see nothing special.
The real reason behind my message is not to learn how to repair anything,
but rather to understand what could have led to this situation.
Where should I keep a keen eye?
Users mailing list