So,
I did more digging and now I know how to reproduce it.
I created a VM and added a disk on a local SSD using the scratchpad hook,
then formatted and mounted this scratch disk.
Now, when I run heavy IO on this scratch disk on the local SSD, for example

  dd if=/dev/zero of=/mnt/scratchdisk/test bs=1M count=10000

qemu pauses the VM.
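Roughly, the in-guest preparation was along these lines (assuming the disk
shows up as /dev/sdb and using ext4 as the filesystem; both will vary per
setup):

  mkfs.ext4 /dev/sdb                # format the scratch disk
  mkdir -p /mnt/scratchdisk
  mount /dev/sdb /mnt/scratchdisk   # mount it

and the dd above is what triggers the pause.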
The libvirt debug logs show:
2021-09-23 11:04:32.765+0000: 463319: debug : virThreadJobSet:94 : Thread 463319 (rpc-worker) is now running job remoteDispatchNodeGetFreePages
2021-09-23 11:04:32.765+0000: 463319: debug : virNodeGetFreePages:1614 : conn=0x7f8620018ba0, npages=3, pages=0x7f8670009960, startCell=4294967295, cellCount=1, counts=0x7f8670007db0, flags=0x0
2021-09-23 11:04:32.765+0000: 463319: debug : virThreadJobClear:119 : Thread 463319 (rpc-worker) finished job remoteDispatchNodeGetFreePages with ret=0
2021-09-23 11:04:34.235+0000: 488774: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"timestamp": {"seconds": 1632395074, "microseconds": 235454}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-3-format", "reason": "Input/output error", "operation": "write", "action": "stop"}}]
2021-09-23 11:04:34.235+0000: 488774: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7f860c14b700 event={"timestamp": {"seconds": 1632395074, "microseconds": 235454}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-3-format", "reason": "Input/output error", "operation": "write", "action": "stop"}}
2021-09-23 11:04:34.235+0000: 488774: debug : qemuMonitorJSONIOProcessEvent:181 : mon=0x7f860c14b700 obj=0x7f860c0e7450
2021-09-23 11:04:34.235+0000: 488774: debug : qemuMonitorEmitEvent:1166 : mon=0x7f860c14b700 event=BLOCK_IO_ERROR
2021-09-23 11:04:34.235+0000: 488774: debug : qemuProcessHandleEvent:581 : vm=0x7f86201d6df0
2021-09-23 11:04:34.235+0000: 488774: debug : virObjectEventNew:624 : obj=0x7f860c0d82f0
2021-09-23 11:04:34.235+0000: 488774: debug : qemuMonitorJSONIOProcessEvent:206 : handle BLOCK_IO_ERROR handler=0x7f8639c77a90 data=0x7f860c0661c0
To confirm that the local SSD is fine and has enough free space where the
scratch disk is located, I ran the same dd on the host without any issues.
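For reference, the host-side check was along these lines (the path is a
placeholder for wherever the scratch disk actually lives on the host):

  df -h /path/to/scratch-dir
  dd if=/dev/zero of=/path/to/scratch-dir/ddtest bs=1M count=10000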
This happens on other storage types as well, so it looks like an issue with
qemu when heavy IO is happening on a disk.
On Thu, Sep 23, 2021 at 7:19 AM Tommy Sway <sz_cuitao(a)163.com> wrote:
>
> Another option (still tech preview) is Managed Block Storage (Cinder based
> storage).
>
> Is it still tech preview in 4.4?
>
> -----Original Message-----
> From: users-bounces(a)ovirt.org <users-bounces(a)ovirt.org> On Behalf Of Nir Soffer
> Sent: Wednesday, August 11, 2021 4:26 AM
> To: Shantur Rathore <shantur.rathore(a)gmail.com>
> Cc: users <users(a)ovirt.org>; Roman Bednar <rbednar(a)redhat.com>
> Subject: [ovirt-users] Re: Sparse VMs from Templates - Storage issues
>
> On Tue, Aug 10, 2021 at 4:24 PM Shantur Rathore <shantur.rathore(a)gmail.com> wrote:
> >
> > Hi all,
> >
> > I have a setup as detailed below
> >
> > - iSCSI Storage Domain
> > - Template with Thin QCOW2 disk
> > - Multiple VMs from Template with Thin disk
>
> Note that a single template disk used by many vms can become a performance
> bottleneck, and is a single point of failure. Cloning the template when
> creating vms avoids such issues.
>
> > oVirt Node 4.4.4
>
> 4.4.4 is old, you should upgrade to 4.4.7.
>
> > When the VM boots up it downloads some data, and that leads to an increase
> > in volume size.
> > I see that every few seconds the VM gets paused with
> >
> > "VM X has been paused due to no Storage space error."
> >
> > and then after few seconds
> >
> > "VM X has recovered from paused back to up"
>
> This is normal operation when a vm writes too quickly and oVirt cannot extend
> the disk quickly enough. To mitigate this, you can increase the volume chunk
> size.
>
> Create this configuration drop-in file:
>
> # cat /etc/vdsm/vdsm.conf.d/99-local.conf
> [irs]
> volume_utilization_percent = 25
> volume_utilization_chunk_mb = 2048
>
> And restart vdsm.
>
> With this setting, when free space in a disk is 1.5g, the disk will be
> extended by 2g. With the default setting, when free space is 0.5g the disk is
> extended by 1g.
>
> If this does not eliminate the pauses, try a larger chunk size like 4096.
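>
> For example, a 4g chunk would look like this (a sketch reusing the same
> drop-in file as above):
>
> # cat /etc/vdsm/vdsm.conf.d/99-local.conf
> [irs]
> volume_utilization_percent = 25
> volume_utilization_chunk_mb = 4096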
>
> > Sometimes after many pauses and recoveries the VM dies with
> >
> > "VM X is down with error. Exit message: Lost connection with qemu
process."
>
> This means qemu has crashed. You can find more info in the vm log at:
> /var/log/libvirt/qemu/vm-name.log
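>
> For example (vm-name is a placeholder for the actual VM name):
>
> # tail -n 50 /var/log/libvirt/qemu/vm-name.log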
>
> We know about bugs in qemu that cause such crashes when a vm disk is extended.
> I think the latest bug was fixed in 4.4.6, so upgrading to 4.4.7 will fix this
> issue.
>
> Even with these settings, if you have very bursty io in the vm, it may become
> paused. The only way to completely avoid these pauses is to use a preallocated
> disk, or use file storage (e.g. NFS). A preallocated disk can be thin
> provisioned on the server side, so it does not mean you need more storage, but
> you will not be able to use shared templates in the way you use them now. You
> can create a vm from a template, but the template is cloned to the new vm.
>
> Another option (still tech preview) is Managed Block Storage (Cinder based
> storage). If your storage server is supported by Cinder, we can manage it
> using cinderlib. In this setup every disk is a LUN, which may be thin
> provisioned on the storage server. This can also offload storage operations to
> the server, like cloning disks, which may be much faster and more efficient.
>
> Nir