VMs randomly pause due to unknown storage error, unable to resume

Hi,
VM xxx has been paused due to unknown storage error. Migration failed due to a failed validation: [Migrating a VM in paused status due to I/O error is not supported.] (VM: xxx, Source: yyy).
Up until recently oVirt 4.5.4 has been running fine on our RHEL 8 hypervisors with primarily (and a few appliances) Linux guests. We started to add Windows 2019 VMs to the cluster with the guest agent installed. They seem to run fine at first, but some of the Windows VMs may randomly pause due to an unknown storage error. The VM cannot be resumed through the UI or virsh. The paused VM cannot be migrated to another Hypervisor. The GlusterFS storage volumes seem fine. Resetting the VM seems to work, but eventually it'll become paused again. The only thing that came to my mind is the virtual hard disks are thin provisioned. Is a preallocated disk necessary for Windows VMs? Any helpful hints on where to look next is greatly appreciated. Thank you, Jon

On 11/10/2023 21:05, Jon Sattelberger wrote:
Hi,
VM xxx has been paused due to unknown storage error. Migration failed due to a failed validation: [Migrating a VM in paused status due to I/O error is not supported.] (VM: xxx, Source: yyy). Up until recently oVirt 4.5.4 has been running fine on our RHEL 8 hypervisors with primarily (and a few appliances) Linux guests. We started to add Windows 2019 VMs to the cluster with the guest agent installed. They seem to run fine at first, but some of the Windows VMs may randomly pause due to an unknown storage error. The VM cannot be resumed through the UI or virsh. The paused VM cannot be migrated to another Hypervisor. The GlusterFS storage volumes seem fine. Resetting the VM seems to work, but eventually it'll become paused again. The only thing that came to my mind is the virtual hard disks are thin provisioned. Is a preallocated disk necessary for Windows VMs? Any helpful hints on where to look next is greatly appreciated.
What error do you receive? Do you have discard enabled on the disks? I would have a look at the vdsm/libvirt logs on the hypervisor to see if it gives any additional information.
Thank you,
Jon _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/7MYLWC6YN2TY2O...

Hi Jean-Louis, Sorry for the delay. I was waiting for it to occur again. While waiting for it to occur again I've updated it to the latest oVirt 4.5.5 snapshot sometime late last week.
Do you have discard enabled on the disks?
No. Discard is not enabled. This is a thin provisioned disk, however. Here is what I found: Event log: Oct 23, 2023, 1:00:00 PM VM win-guest-vm1 has been paused due to unknown storage error. Virsh: ~]$ sudo /usr/bin/virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf domblkerror win-guest-vm1 sda: unspecified error Log files: File: vdsm.log 2023-10-23 13:00:00,787-0400 INFO (libvirt/events) [virt.vm] (vmId='9bc1b5cd-7a28-463a-bb47-e315d16aa02b') abnormal vm stop device ua-bf610b0d-fffc-4569-bf31-9ab76d95e5c0 error (vm:5192) 2023-10-23 13:00:00,787-0400 INFO (libvirt/events) [virt.vm] (vmId='9bc1b5cd-7a28-463a-bb47-e315d16aa02b') CPU stopped: onIOError (vm:6073) 2023-10-23 13:00:00,790-0400 INFO (libvirt/events) [virt.vm] (vmId='9bc1b5cd-7a28-463a-bb47-e315d16aa02b') CPU stopped: onSuspend (vm:6073) 2023-10-23 13:00:00,807-0400 WARN (libvirt/events) [virt.vm] (vmId='9bc1b5cd-7a28-463a-bb47-e315d16aa02b') device sda reported I/O error (vm:3990) ... snip ... 2023-10-23 14:02:04,886-0400 WARN (jsonrpc/2) [virt.vm] (vmId='9bc1b5cd-7a28-463a-bb47-e315d16aa02b') monitor became unresponsive (command timeout, age=3731.5355101790046) (vm:6086) File mom.log 2023-10-23 13:00:01,929 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:01,929 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:07,645 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:07,646 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:09,092 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:09,093 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:11,165 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:11,165 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:12,367 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:12,368 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:14,785 - mom.Collectors.GuestMemory - WARNING - getVmMemoryStats(): The ovirt-guest-agent is not active 2023-10-23 13:00:19,349 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:19,365 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:21,372 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:21,373 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:22,763 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:22,763 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:27,380 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:27,381 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:29,597 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:29,598 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:29,798 - mom.GuestMonitor.Thread - INFO - GuestMonitor-win-guest-vm1 ending 2023-10-23 13:00:31,546 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:31,547 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:37,886 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:37,887 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:39,808 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:39,808 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:41,718 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:41,719 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:42,379 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:42,379 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:49,996 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:49,997 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:51,898 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:51,899 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:53,015 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:53,016 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:57,390 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:57,391 - mom.RPCServer - INFO - getStatistics() 2023-10-23 13:00:59,244 - mom.RPCServer - INFO - ping() 2023-10-23 13:00:59,245 - mom.RPCServer - INFO - getStatistics() File: rhev-data-center-mnt-glusterSD-hypervisor-0.gluster.stor.prd.lab:_vmstore.log [2023-10-23 16:53:02.685763 +0000] I [MSGID: 133022] [shard.c:3838:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=c3b441d4-e635-4791-9f19-029e2314c4f5 from backend [2023-10-23 16:58:02.985859 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-vmstore-client-4: remote operation failed. [{errno=22}, {error=Invalid argument}] [2023-10-23 16:58:02.985889 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-vmstore-client-3: remote operation failed. [{errno=22}, {error=Invalid argument}] [2023-10-23 16:58:02.985905 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-vmstore-client-5: remote operation failed. [{errno=22}, {error=Invalid argument}] [2023-10-23 16:58:03.986544 +0000] W [fuse-bridge.c:3067:fuse_writev_cbk] 0-glusterfs-fuse: 38146917: WRITE => -1 gfid=b1fc43e0-2348-42ed-a7e7-cf832bc4f2ed fd=0x56486713f738 (Invalid argume nt) [2023-10-23 16:58:03.994009 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-0: remote operation failed. [{path=<gfid:b1fc43e0-2348-42ed-a7e7-cf832 bc4f2ed>}, {gfid=b1fc43e0-2348-42ed-a7e7-cf832bc4f2ed}, {errno=2}, {error=No such file or directory}] [2023-10-23 16:58:03.994035 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-1: remote operation failed. [{path=<gfid:b1fc43e0-2348-42ed-a7e7-cf832 bc4f2ed>}, {gfid=b1fc43e0-2348-42ed-a7e7-cf832bc4f2ed}, {errno=2}, {error=No such file or directory}] [2023-10-23 16:58:03.994067 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-3: remote operation failed. [{path=<gfid:b1fc43e0-2348-42ed-a7e7-cf832 bc4f2ed>}, {gfid=b1fc43e0-2348-42ed-a7e7-cf832bc4f2ed}, {errno=2}, {error=No such file or directory}] [2023-10-23 16:58:03.994067 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-2: remote operation failed. [{path=<gfid:b1fc43e0-2348-42ed-a7e7-cf832 bc4f2ed>}, {gfid=b1fc43e0-2348-42ed-a7e7-cf832bc4f2ed}, {errno=2}, {error=No such file or directory}] [2023-10-23 16:58:03.994092 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-4: remote operation failed. [{path=<gfid:b1fc43e0-2348-42ed-a7e7-cf832 bc4f2ed>}, {gfid=b1fc43e0-2348-42ed-a7e7-cf832bc4f2ed}, {errno=2}, {error=No such file or directory}] [2023-10-23 16:58:03.994102 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-5: remote operation failed. [{path=<gfid:b1fc43e0-2348-42ed-a7e7-cf832 bc4f2ed>}, {gfid=b1fc43e0-2348-42ed-a7e7-cf832bc4f2ed}, {errno=2}, {error=No such file or directory}] [2023-10-23 16:58:03.996116 +0000] I [MSGID: 133022] [shard.c:3838:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=b1fc43e0-2348-42ed-a7e7-cf832bc4f2ed from backend [2023-10-23 17:00:00.781887 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2683:client4_0_readv_cbk] 0-vmstore-client-5: remote operation failed. [{errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.782585 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1277:client4_0_fstat_cbk] 0-vmstore-client-5: remote operation failed. [{errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.782602 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1277:client4_0_fstat_cbk] 0-vmstore-client-3: remote operation failed. [{errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.782641 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1277:client4_0_fstat_cbk] 0-vmstore-client-4: remote operation failed. [{errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784639 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-0: remote operation failed. [{path=/.shard/6990a191-bd6c-4078-876c-77cdf853bc89.263}, {gfid=2a686544-d463-4c8d-adfc-540fd26d5889}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784640 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-1: remote operation failed. [{path=/.shard/6990a191-bd6c-4078-876c-77cdf853bc89.263}, {gfid=2a686544-d463-4c8d-adfc-540fd26d5889}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784667 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-2: remote operation failed. [{path=/.shard/6990a191-bd6c-4078-876c-77cdf853bc89.263}, {gfid=2a686544-d463-4c8d-adfc-540fd26d5889}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784694 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-4: remote operation failed. [{path=/.shard/6990a191-bd6c-4078-876c-77cdf853bc89.263}, {gfid=2a686544-d463-4c8d-adfc-540fd26d5889}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784704 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-3: remote operation failed. [{path=/.shard/6990a191-bd6c-4078-876c-77cdf853bc89.263}, {gfid=2a686544-d463-4c8d-adfc-540fd26d5889}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784765 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-5: remote operation failed. [{path=/.shard/6990a191-bd6c-4078-876c-77cdf853bc89.263}, {gfid=2a686544-d463-4c8d-adfc-540fd26d5889}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784801 +0000] E [MSGID: 109038] [dht-helper.c:1367:dht_migration_complete_check_task] 0-vmstore-dht: failed to lookup the file [{tmp=2a686544-d463-4c8d-adfc-540fd26d5889}, {name=vmstore-dht}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784838 +0000] W [fuse-bridge.c:2979:fuse_readv_cbk] 0-glusterfs-fuse: 38149792: READ => -1 gfid=6990a191-bd6c-4078-876c-77cdf853bc89 fd=0x5648663a7698 (No such file or directory) [2023-10-23 17:00:00.784062 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1277:client4_0_fstat_cbk] 0-vmstore-client-4: remote operation failed. [{errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784145 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1277:client4_0_fstat_cbk] 0-vmstore-client-3: remote operation failed. [{errno=2}, {error=No such file or directory}] [2023-10-23 17:00:00.784340 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1277:client4_0_fstat_cbk] 0-vmstore-client-5: remote operation failed. [{errno=2}, {error=No such file or directory}] [2023-10-23 17:03:04.290996 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-vmstore-client-4: remote operation failed. [{errno=22}, {error=Invalid argument}] [2023-10-23 17:03:04.291057 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-vmstore-client-5: remote operation failed. [{errno=22}, {error=Invalid argument}] [2023-10-23 17:03:04.291076 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:670:client4_0_writev_cbk] 0-vmstore-client-3: remote operation failed. [{errno=22}, {error=Invalid argument}] [2023-10-23 17:03:05.291822 +0000] W [fuse-bridge.c:3067:fuse_writev_cbk] 0-glusterfs-fuse: 38167016: WRITE => -1 gfid=8810a5ea-6bd6-4da8-8901-abee03e8a210 fd=0x564867504798 (Invalid argument) [2023-10-23 17:03:05.298181 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-2: remote operation failed. [{path=<gfid:8810a5ea-6bd6-4da8-8901-abee03e8a210>}, {gfid=8810a5ea-6bd6-4da8-8901-abee03e8a210}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:03:05.298204 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-0: remote operation failed. [{path=<gfid:8810a5ea-6bd6-4da8-8901-abee03e8a210>}, {gfid=8810a5ea-6bd6-4da8-8901-abee03e8a210}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:03:05.298207 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-3: remote operation failed. [{path=<gfid:8810a5ea-6bd6-4da8-8901-abee03e8a210>}, {gfid=8810a5ea-6bd6-4da8-8901-abee03e8a210}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:03:05.298216 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-1: remote operation failed. [{path=<gfid:8810a5ea-6bd6-4da8-8901-abee03e8a210>}, {gfid=8810a5ea-6bd6-4da8-8901-abee03e8a210}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:03:05.298223 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-4: remote operation failed. [{path=<gfid:8810a5ea-6bd6-4da8-8901-abee03e8a210>}, {gfid=8810a5ea-6bd6-4da8-8901-abee03e8a210}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:03:05.298241 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-vmstore-client-5: remote operation failed. [{path=<gfid:8810a5ea-6bd6-4da8-8901-abee03e8a210>}, {gfid=8810a5ea-6bd6-4da8-8901-abee03e8a210}, {errno=2}, {error=No such file or directory}] [2023-10-23 17:03:05.299577 +0000] I [MSGID: 133022] [shard.c:3838:shard_delete_shards] 0-vmstore-shard: Deleted shards of gfid=8810a5ea-6bd6-4da8-8901-abee03e8a210 from backend

Unfortunately my suggestion to use a preallocated disk image was not the solution. I see a Windows VM has been "paused due to unknown storage error" as of this morning. Discard is not enabled. Similar message for virsh and messages in the vdsm/gluster logs.
participants (2)
-
Jean-Louis Dupond
-
Jon Sattelberger