Hi All,
We have an odd setup in our environment: each storage data center has one host and one
storage domain. We had an issue with the storage domain attached to one of the hosts, and
after rebooting the host I am seeing the vmrecovery messages below repeating over and over
in the vdsm log:
2023-06-09 21:01:30,419+0000 INFO (periodic/2) [vdsm.api] START repoStats(domains=()) from=internal, task_id=40f5b198-cb82-4ba2-8c20-b8cee34a7f47 (api:48)
2023-06-09 21:01:30,420+0000 INFO (periodic/2) [vdsm.api] FINISH repoStats return={} from=internal, task_id=40f5b198-cb82-4ba2-8c20-b8cee34a7f47 (api:54)
2023-06-09 21:01:30,810+0000 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=74b1a1cf-fab1-4918-b0da-b3fd152d9d1a (api:48)
2023-06-09 21:01:30,811+0000 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=74b1a1cf-fab1-4918-b0da-b3fd152d9d1a (api:54)
2023-06-09 21:01:30,811+0000 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
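In case it helps, this is how I have been checking the pool/domain state directly on the
host. I am assuming vdsm-client is available on this vdsm version; on older python2-era
vdsm like this host, the vdsClient equivalents below should work instead:

# ask vdsm which storage pools it believes it is connected to
vdsm-client Host getConnectedStoragePools
# and which storage domains it can see
vdsm-client Host getStorageDomains

# equivalents on older vdsm
vdsClient -s 0 getConnectedStoragePoolsList
vdsClient -s 0 getStorageDomainsList

If these also come back with an empty pool list, that matches the {'poollist': []} in the
log above.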
I've also checked the firewall (it is still disabled), and both libvirtd and vdsmd are up:
systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/libvirtd.service.d
└─unlimited-core.conf
Active: active (running) since Fri 2023-06-09 20:51:11 UTC; 16min ago
Docs: man:libvirtd(8)
https://libvirt.org
Main PID: 4984 (libvirtd)
Tasks: 17 (limit: 32768)
Memory: 39.7M
CGroup: /system.slice/libvirtd.service
└─4984 /usr/sbin/libvirtd --listen
Jun 09 20:51:11 hlkvm01 systemd[1]: Starting Virtualization daemon...
Jun 09 20:51:11 hlkvm01 systemd[1]: Started Virtualization daemon.
systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2023-06-09 20:53:11 UTC; 14min ago
  Process: 10496 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 10587 (vdsmd)
Tasks: 39
Memory: 79.5M
CGroup: /system.slice/vdsmd.service
└─10587 /usr/bin/python2 /usr/share/vdsm/vdsmd
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|596001c3-33e7-44a4-bdf9-0b53ab1dd810' args={'596001c3-33e7-44a4-bdf9-0b53ab1dd810': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-2283890943663580625', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '596001c3-33e7-44a4-bdf9-0b53ab1dd810', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893978', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}

The same "WARN Not ready yet, ignoring event" line is logged at 20:53:14 for nine more VMs,
all with status 'Down', exitReason 1, exitCode 1, and exitMessage 'VM terminated with
error':

87155499-1e10-4228-aa69-7c487007746e
0ec7a66d-fac2-4a4a-a939-e05fc7b097b7
9c8802c3-c7c9-473c-bbfb-abb0bd0f8fdb
f799c326-9969-4892-8d67-3b1229baf0ef
e9311d9f-d770-458b-b5ad-cdc2eb35f1bd
1fc4ddad-203f-4cdf-9cb3-c3d66fb97c87
321183ed-b0a6-42c7-bbee-2ad46a5f37ae
731a11e8-62ba-4639-bdee-8c44b5790d82
411a97e6-41c7-473e-819b-04aa10bc2bf0
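Since vdsm is just waiting for the storage pool, I want to rule out the storage layer
itself. These are the host-side checks I know of, depending on the domain type (a rough
list, not specific to any one setup):

# NFS/gluster domains: is the domain still mounted under /rhev/data-center/mnt?
mount | grep rhev
# iSCSI domains: are the sessions still logged in?
iscsiadm -m session
# FC/iSCSI block domains: does multipath still see the LUNs?
multipath -ll
# is sanlock holding, or failing to acquire, leases on the domain?
sanlock client status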
The vmrecovery loop has been going on for hours. On the management VM I am seeing the
following repeated over and over:
2023-06-09 13:59:25,129-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM hlkvm01 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
2023-06-09 13:59:25,129-07 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
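Because the engine keeps reporting message timeouts, I can at least confirm basic
reachability of vdsm from the engine VM (vdsm listens on TCP 54321 by default):

# basic TCP reachability of the vdsm port
nc -zv hlkvm01 54321
# TLS handshake against vdsm; this should print the host certificate
openssl s_client -connect hlkvm01:54321 </dev/null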
I am able to restart the host from the management VM (web console). When I try to put the
host into maintenance mode I get: "Error while executing action. Cannot switch Host to
Maintenance mode. Host still has running VMs on it and is in Non Responsive state." If I
try "Confirm 'Host has been Rebooted'" I get an error saying that another power management
action is already in progress. Can someone please help me out here? Is there a way to
force the state of all of these VMs to Down? And is there anything I can do to get the
storage domain back up?
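For forcing the VMs to Down, the only approach I have found is older list posts that
suggest marking them down directly in the engine database. Below is my sketch of that
workaround (table and column names as I understand the engine schema; 0 is the Down value
in the engine's VMStatus enum). I would stop ovirt-engine and take a backup first, and I
would really appreciate confirmation before running anything like this:

# on the engine VM: back up the engine and its database first
engine-backup --mode=backup --file=engine.backup --log=engine-backup.log
systemctl stop ovirt-engine

# then in psql as the postgres user (su - postgres; psql engine):
-- mark every VM the engine thinks is running on hlkvm01 as Down
UPDATE vm_dynamic
   SET status = 0, run_on_vds = NULL
 WHERE run_on_vds = (SELECT vds_id FROM vds_static WHERE vds_name = 'hlkvm01');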
Thanks