
On Tue, Sep 26, 2017 at 04:31:22PM +0200, Nicolas Ecarnot wrote:
On 21/09/2017 at 16:31, Stefan Hajnoczi wrote:
On Tue, Sep 19, 2017 at 12:09:06PM +0200, Nicolas Ecarnot wrote:
Hello,
First post here, so maybe I should introduce myself:
- I have been a sysadmin for decades and currently manage 4 oVirt clusters, made up of tens of hypervisors, all CentOS 7.2+ based.
- I'm very happy with the solution we chose, especially because it is based on qemu-kvm (open source, reliable, documented).
On one VM, we experienced the following:
- oVirt/vdsm detected an issue on the image
- following these hints https://access.redhat.com/solutions/1173623, I managed to detect one error and fix it
- the VM is now running perfectly
On two other VMs, we experienced a similar situation, except the check stage is showing something like 14000+ errors, and the relevant logs are :
Repairing refcount block 14 is outside image
ERROR could not resize image: Invalid argument
ERROR cluster 425984 refcount=0 reference=1
ERROR cluster 425985 refcount=0 reference=1
[... repeating the previous line 7000+ times...]
ERROR cluster 457166 refcount=0 reference=1
Rebuilding refcount structure
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device
Please run strace qemu-img info /the/relevant/logical/volume/path. It will print all the syscalls that qemu-img makes. That way we'll be able to verify that the ENOSPC error is coming from a pwritev syscall.
Sorry, "qemu-img info" should be your "qemu-img check" command.
I did, but I'm not skilled enough to determine where the ENOSPC error is coming from.
Does your question mean that the reads and/or the writes may come from or go to places outside the expected boundaries?
I was interested in the syscall (probably pwritev or similar) related to the following output from qemu-img check:
ERROR writing refblock: No space left on device
Feel free to post your strace log so we can analyze it.
You surely know that oVirt/RHEV stores its qcow2 images in dedicated logical volumes.
pvs/vgs/lvs are all showing there is plenty of space available, so I understand that I don't understand what "No space left on device" means.
After you have the strace data you can look at the file offset from the failing pwritev syscall and check that it's really within the LV.
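The offset check described above could be sketched in Python roughly as follows. This is only an illustration, not something taken from qemu or oVirt: the function names are made up, the regular expression assumes strace's usual one-line-per-syscall output for pwritev and pwrite64 (where the file offset is the last argument), and the LV size in bytes is assumed to be obtained separately, for example with `blockdev --getsize64` on the LV path. Only the start offset of each failed write is compared against the LV size.

```python
import re

# Assumed strace line shape for a failed write, e.g.:
#   pwritev(10, [{iov_base="..."..., iov_len=65536}], 1, 30064771072) = -1 ENOSPC (No space left on device)
# For both pwritev and pwrite64 the file offset is the last argument.
FAILED_WRITE = re.compile(
    r"(?P<call>pwritev|pwrite64)\((?P<fd>\d+),.*,\s*(?P<offset>\d+)\)"
    r"\s*=\s*-1\s+ENOSPC"
)


def failing_write_offsets(strace_lines):
    """Return (syscall, fd, offset) for every write that failed with ENOSPC."""
    found = []
    for line in strace_lines:
        match = FAILED_WRITE.search(line)
        if match:
            found.append(
                (match.group("call"), int(match.group("fd")), int(match.group("offset")))
            )
    return found


def offsets_outside_lv(strace_lines, lv_size_bytes):
    """Keep only the failed writes whose start offset lies at or past the LV end."""
    return [
        (call, fd, offset)
        for call, fd, offset in failing_write_offsets(strace_lines)
        if offset >= lv_size_bytes
    ]
```

If `offsets_outside_lv` returns entries, the failing write starts beyond the end of the LV; if it returns nothing while `failing_write_offsets` does, the ENOSPC is happening for an offset inside the LV and the cause lies elsewhere.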
I think there is no fancy thin provisioning going on at the LVM level with oVirt, but if there is then perhaps a write within the LV could still result in an ENOSPC error. It would be worth confirming that these are classic "thick" LVs.
I think there is no such thin provisioning at the LVM level, but I wouldn't swear to it. Do you mind if I forward your question to the oVirt mailing list?
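One way to confirm this could be to look at the lv_attr field reported by lvs, whose first character is "V" for a thin volume and "t" for a thin pool. The sketch below is a hypothetical helper (the function name and the sample volume names are invented); it assumes its input is the text printed by `lvs --noheadings -o lv_name,lv_attr` for the volume group in question.

```python
def thin_lvs(lvs_output):
    """Return names of LVs whose lv_attr marks them as thin volumes or thin pools.

    Expects the text output of `lvs --noheadings -o lv_name,lv_attr`,
    i.e. one "name attr" pair per line.
    """
    thin = []
    for row in lvs_output.splitlines():
        fields = row.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, attr = fields
        # First lv_attr character: 'V' = thin volume, 't' = thin pool.
        if attr[0] in ("V", "t"):
            thin.append(name)
    return thin


# Invented sample output, for illustration only.
sample = """
  image-a   -wi-ao----
  pool0     twi-aotz--
  image-b   Vwi-a-tz--
"""
print(thin_lvs(sample))
```

An empty result would suggest that all listed LVs are ordinary thick volumes.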
Sure, feel free to CC other mailing lists. I have added oVirt devel.
Stefan