On Tue, Sep 26, 2017 at 04:31:22PM +0200, Nicolas Ecarnot wrote:
On 21/09/2017 at 16:31, Stefan Hajnoczi wrote:
> On Tue, Sep 19, 2017 at 12:09:06PM +0200, Nicolas Ecarnot wrote:
> > Hello,
> >
> > First post here, so maybe I should introduce myself:
> > - I've been a sysadmin for decades and currently manage 4 oVirt clusters, made
> > up of tens of hypervisors, all CentOS 7.2+ based.
> > - I'm very happy with this solution we chose, especially because it is based
> > on qemu-kvm (open source, reliable, documented).
> >
> > On one VM, we experienced the following:
> > - oVirt/vdsm detected an issue on the image
> > - following the hints at https://access.redhat.com/solutions/1173623, I
> > managed to detect one error and fix it
> > - the VM is now running perfectly
> >
> > On two other VMs, we experienced a similar situation, except the check stage
> > is showing something like 14000+ errors, and the relevant logs are :
> >
> > Repairing refcount block 14 is outside image
> > ERROR could not resize image: Invalid argument
> > ERROR cluster 425984 refcount=0 reference=1
> > ERROR cluster 425985 refcount=0 reference=1
> > [... repeating the previous line 7000+ times...]
> > ERROR cluster 457166 refcount=0 reference=1
> > Rebuilding refcount structure
> > ERROR writing refblock: No space left on device
> > qemu-img: Check failed: No space left on device
>
> Please run strace qemu-img info /the/relevant/logical/volume/path. It
Sorry, "qemu-img info" should be your "qemu-img check" command.
> will print all the syscalls that qemu-img makes. That way we'll be able
> to verify that the ENOSPC error is coming from a pwritev syscall.
I did, but I'm not skilled enough to determine where the ENOSPC error is coming
from.
Does your question mean that the reads and/or writes may come from or go to
places outside the expected boundaries?
I was interested in the syscall (probably pwritev or similar) related to
the following output from qemu-img check:
ERROR writing refblock: No space left on device
Feel free to post your strace log so we can analyze it.
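
In case it helps while going through the log, here is a rough Python sketch
that scans strace output for calls that failed with ENOSPC and pulls out the
file offset. It assumes the log was saved to a file (for example with
strace's -o option) and that the lines follow strace's default layout. The
function name find_enospc_calls and the log file name in the usage note are
made up for illustration.

```python
import re

# Matches a failed call such as:
#   pwritev(9, [{iov_base="..."..., iov_len=65536}], 1, 30064771072) = -1 ENOSPC (No space left on device)
# The line layout is an assumption about strace's default output format.
ENOSPC_CALL = re.compile(r"(?P<name>\w+)\((?P<args>.*)\)\s+= -1 ENOSPC\b")


def find_enospc_calls(lines):
    """Return (syscall_name, offset_or_None, raw_line) for each call that failed with ENOSPC."""
    hits = []
    for line in lines:
        m = ENOSPC_CALL.search(line)
        if not m:
            continue
        name = m.group("name")
        offset = None
        if name in ("pwrite64", "pwritev"):
            # For these two syscalls the file offset is the last argument.
            last = m.group("args").rsplit(",", 1)[-1].strip()
            if last.isdigit():
                offset = int(last)
        hits.append((name, offset, line.rstrip("\n")))
    return hits
```

Usage would be something like find_enospc_calls(open("qemu-img.strace")),
where qemu-img.strace is whatever file the strace output was written to. If
the failing call is not pwrite64 or pwritev, the offset comes back as None
and the raw line has to be read by hand.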
> > You surely know that oVirt/RHEV is storing its qcow2 images in dedicated
> > logical volumes.
> >
> > pvs/vgs/lvs are all showing there is plenty of space available, so I
> > understand that I don't understand what "No space left on device" means.
>
> After you have the strace data you can look at the file offset from the
> failing pwritev syscall and check that it's really within the LV.
>
> I think there is no fancy thin provisioning going on at the LVM level
> with oVirt, but if there is then perhaps a write within the LV could
> still result in an ENOSPC error. It would be worth confirming that
> these are classic "thick" LVs.
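
To make the offset check mentioned above concrete, here is a minimal Python
sketch. It assumes you already have the offset and length of the failing
write from the strace log, and that the LV can be opened read-only. The
helper names device_size and write_fits are hypothetical, and the size is
obtained by seeking to the end of the device, which is one common way to do
it on Linux rather than anything oVirt-specific.

```python
import os


def device_size(path):
    """Size in bytes of a regular file or block device, found by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def write_fits(offset, length, size):
    """True if the byte range [offset, offset + length) lies entirely within size bytes."""
    return offset >= 0 and length >= 0 and offset + length <= size
```

If write_fits(offset, length, device_size(lv_path)) is False for the failing
write, the write was aimed past the end of the LV, which would fit with the
"refcount block 14 is outside image" message. If it is True, the write was
inside the LV, and the ENOSPC would have to come from somewhere below it,
such as thin provisioning.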
I think there is no such thin provisioning at the LVM level, but I wouldn't swear to it.
Do you mind if I forward your question to the oVirt mailing list?
Sure, feel free to CC other mailing lists. I have added oVirt devel.
Stefan