How to restore ovirt nodes to UP from NonResponsive and VMs executing

Hello,

The node (ovirt2), however, is having consistent problems. The following sequence of events is reproducible and causes the host to enter a "NonOperational" state on the cluster:

* Host ovirt2 installed
* VDSM ovirt2 command ConnectStorageServerVDS failed: Message timeout which can be caused by communication issues
* Host ovirt2 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
* Host ovirt2 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center DataCenter1. Setting Host state to Non-Operational. (5/27/19 12:43:22 PM)
* (Banner appears in GUI) Failed Activating Host ovirt2.witsconsult.com
* Failed to connect Host ovirt2 to Storage Pool DataCenter1 (5/27/19 12:47:07 PM)
* Host ovirt2 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center DataCenter1. Setting Host state to Non-Operational. (5/27/19 12:47:07 PM)
* Host ovirt2 is not responding. Host cannot be fenced automatically because power management for the host is disabled. (5/27/19 12:47:07 PM)
* VDSM ovirt2 command ConnectStorageServerVDS failed: Message timeout which can be caused by communication issues (5/27/19 12:47:07 PM)

I can then re-activate ovirt2, which appears green for approximately 5 minutes, after which all of the above issues repeat. What can I do to troubleshoot this?

Hello Carlos, Can I ask what kind of shared storage you have for these nodes? If iSCSI, are your targets all configured to allow multi-initiator access? If you're on iSCSI and don't allow multi-initiator, then only one node at a time can be connected. Regards, Mark

Hello Carlos,

Based on the sequence of events you provided, it appears to be related to communication and storage connectivity problems. To troubleshoot this issue, here are a few steps you can take:

* Verify network connectivity: Ensure that the network configuration on the ovirt2 node is correct and that it can communicate with other nodes and the storage domain. Check for any network issues or misconfigurations that could be causing the communication problems.
* Check storage connectivity: Verify that the ovirt2 node can access the Storage Domain(s) attached to the Data Center. Ensure that the storage configuration is correct and that there are no connectivity issues between the node and the storage devices.
* Review power management settings: As mentioned in the logs, power management for the ovirt2 host is disabled. Consider enabling power management and configuring it properly to allow for automatic fencing of non-responsive hosts.
* Investigate communication issues: The timeout errors could indicate communication problems between the ovirt2 node and other components. Check for any firewall rules, network restrictions, or DNS issues that could be causing communication disruptions.
* Monitor resource usage: Keep an eye on the resource usage of the ovirt2 node, such as CPU, memory, and storage. High resource utilization or potential bottlenecks can impact the node's performance and stability.
* Check system logs: Review the system logs on the ovirt2 node, including VDSM logs and relevant log files, for any error messages or warnings that might shed light on the issue.

I hope this helps in resolving the issue.
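As a starting point for the network and storage connectivity checks above, here is a minimal sketch that could be run on ovirt2 to see whether the storage server's TCP ports can be reached at all. The host name `storage.example.com` is a placeholder (the thread does not say what the storage server is called), and the ports are only the conventional defaults for iSCSI (3260) and NFS (2049); adjust both to the actual setup. A successful TCP connect only shows that the port is reachable, not that the storage domain itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_targets(targets, timeout: float = 5.0) -> dict:
    """Map each "host:port" string to whether it was reachable."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in targets
    }


# Hypothetical targets: replace with the real storage server and ports.
TARGETS = [
    ("storage.example.com", 3260),  # conventional iSCSI port (assumption)
    ("storage.example.com", 2049),  # conventional NFS port (assumption)
]

# Example use:
#   for target, ok in check_targets(TARGETS).items():
#       print(target, "reachable" if ok else "UNREACHABLE")
```

Running the check repeatedly while the host is green, and again once it has dropped to Non-Operational, may help show whether the failure coincides with a loss of basic reachability or happens at a higher layer.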
participants (3)
- Arlo Hawthorne
- carlos.mendes@mgo.cv
- Mark R