Had an issue with storage and now the storage domain won't mount; VMs are in Unknown status

Hi All,

We have an odd setup in our environment: each storage data center has one host and one storage domain. We had an issue with the storage domain attached to one of the hosts. After the host reboot I am seeing the following in the vdsm logs over and over again (vmrecovery):

2023-06-09 21:01:30,419+0000 INFO (periodic/2) [vdsm.api] START repoStats(domains=()) from=internal, task_id=40f5b198-cb82-4ba2-8c20-b8cee34a7f47 (api:48)
2023-06-09 21:01:30,420+0000 INFO (periodic/2) [vdsm.api] FINISH repoStats return={} from=internal, task_id=40f5b198-cb82-4ba2-8c20-b8cee34a7f47 (api:54)
2023-06-09 21:01:30,810+0000 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=74b1a1cf-fab1-4918-b0da-b3fd152d9d1a (api:48)
2023-06-09 21:01:30,811+0000 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=74b1a1cf-fab1-4918-b0da-b3fd152d9d1a (api:54)
2023-06-09 21:01:30,811+0000 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)

I've also checked the firewall and it is still disabled.

systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: active (running) since Fri 2023-06-09 20:51:11 UTC; 16min ago
     Docs: man:libvirtd(8)
           https://libvirt.org
 Main PID: 4984 (libvirtd)
    Tasks: 17 (limit: 32768)
   Memory: 39.7M
   CGroup: /system.slice/libvirtd.service
           └─4984 /usr/sbin/libvirtd --listen

Jun 09 20:51:11 hlkvm01 systemd[1]: Starting Virtualization daemon...
Jun 09 20:51:11 hlkvm01 systemd[1]: Started Virtualization daemon.

● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2023-06-09 20:53:11 UTC; 14min ago
  Process: 10496 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 10587 (vdsmd)
    Tasks: 39
   Memory: 79.5M
   CGroup: /system.slice/vdsmd.service
           └─10587 /usr/bin/python2 /usr/share/vdsm/vdsmd

Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|596001c3-33e7-44a4-bdf9-0b53ab1dd810' args={'596001c3-33e7-44a4-bdf9-0b53ab1dd810': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-2283890943663580625', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '596001c3-33e7-44a4-bdf9-0b53ab1dd810', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893978', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|87155499-1e10-4228-aa69-7c487007746e' args={'87155499-1e10-4228-aa69-7c487007746e': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-5453960159391982695', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '87155499-1e10-4228-aa69-7c487007746e', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893973', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|0ec7a66d-fac2-4a4a-a939-e05fc7b097b7' args={'0ec7a66d-fac2-4a4a-a939-e05fc7b097b7': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-1793949836195780752', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '0ec7a66d-fac2-4a4a-a939-e05fc7b097b7', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893976', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|9c8802c3-c7c9-473c-bbfb-abb0bd0f8fdb' args={'9c8802c3-c7c9-473c-bbfb-abb0bd0f8fdb': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-1144924804541449415', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '9c8802c3-c7c9-473c-bbfb-abb0bd0f8fdb', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893971', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|f799c326-9969-4892-8d67-3b1229baf0ef' args={'f799c326-9969-4892-8d67-3b1229baf0ef': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '5564598485369155833', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': 'f799c326-9969-4892-8d67-3b1229baf0ef', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893980', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|e9311d9f-d770-458b-b5ad-cdc2eb35f1bd' args={'e9311d9f-d770-458b-b5ad-cdc2eb35f1bd': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-5622951617346770490', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': 'e9311d9f-d770-458b-b5ad-cdc2eb35f1bd', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893972', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|1fc4ddad-203f-4cdf-9cb3-c3d66fb97c87' args={'1fc4ddad-203f-4cdf-9cb3-c3d66fb97c87': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-1397731328049024241', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '1fc4ddad-203f-4cdf-9cb3-c3d66fb97c87', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893981', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|321183ed-b0a6-42c7-bbee-2ad46a5f37ae' args={'321183ed-b0a6-42c7-bbee-2ad46a5f37ae': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '4398712824561987912', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '321183ed-b0a6-42c7-bbee-2ad46a5f37ae', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893970', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|731a11e8-62ba-4639-bdee-8c44b5790d82' args={'731a11e8-62ba-4639-bdee-8c44b5790d82': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-1278467655696539707', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '731a11e8-62ba-4639-bdee-8c44b5790d82', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893977', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}
Jun 09 20:53:14 hlkvm01 vdsm[10587]: WARN Not ready yet, ignoring event '|virt|VM_status|411a97e6-41c7-473e-819b-04aa10bc2bf0' args={'411a97e6-41c7-473e-819b-04aa10bc2bf0': {'status': 'Down', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': '-1'}], 'hash': '-11964682092647781', 'exitMessage': 'VM terminated with error', 'cpuUser': '0.00', 'monitorResponse': '0', 'vmId': '411a97e6-41c7-473e-819b-04aa10bc2bf0', 'exitReason': 1, 'cpuUsage': '0.00', 'elapsedTime': '893975', 'cpuSys': '0.00', 'timeOffset': '0', 'clientIp': '', 'exitCode': 1}}

This has been going on for hours. On the management VM I am seeing the following over and over again:

2023-06-09 13:59:25,129-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM hlkvm01 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
2023-06-09 13:59:25,129-07 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

I am able to restart the host from the management VM (Web Console). When I try to put the host in maintenance mode I get "Error while executing action. Cannot switch Host to Maintenance mode. Host still has running VMs on it and is in Non Responsive state". If I try to use "Confirm host has been rebooted" I get an error saying that another power management action is already in progress.

Can someone please help me out here? Is there a way to manually set the status of all the VMs to Down? Is there anything I can do to get the storage domain back up?

Thanks
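
P.S. In case it helps with triage, this is roughly what I am looking at on the host while it loops. Treat it as a sketch only: the vdsm-client verb is written from memory, and <nfs-server> is just a placeholder since our storage details are not included here.

# Does VDSM see any connected storage pool? (empty here, matching the
# getConnectedStoragePoolsList return={'poollist': []} log line above)
vdsm-client Host getConnectedStoragePools
# Core storage-related services on the host
systemctl status vdsmd supervdsmd sanlock
# If the domain is NFS: is the export still reachable from the host? (<nfs-server> is a placeholder)
showmount -e <nfs-server>
# If the domain is iSCSI/FC: are the sessions and multipath devices back?
iscsiadm -m session
multipath -ll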

In case someone else comes across this, here is what I did to resolve the issue:
- Restarted ovirt-engine on the Management VM.
- The restart caused the host that had the issue to show as Up, even though its storage domain was still down.
- Put the host into maintenance mode.
- Reinstalled the software on the host.
- The storage domain is now up, as is the host. Issue resolved.

Thanks
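
P.S. For completeness, the engine restart was just the standard service restart on the Management VM (this assumes a standalone engine VM, which is what we run; a self-hosted engine would need its own maintenance handling first):

# On the Management VM
systemctl restart ovirt-engine
systemctl status ovirt-engine

The Maintenance and Reinstall steps were done from the host's menus in the Administration Portal (Web Console).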