Non-responsive vm's due to crashed host and hosted vm liveliness check fails

Dear Community:
The local drive on a host running ovirt-node-ng-4.1.9 in a three-node cluster failed. I had production JIRA and Postgres running on it at the time, not in HA, just simple VMs. Storage is via NFS on a Synology NAS. The Hosted Engine was on a different host, but the JIRA and Postgres VMs showed as non-responsive.
I tried different things, but then stupidly thought that upgrading my Hosted Engine would reinitialize the VMs that were on the failed host. The update of the HE seemed to go well (output below), but now my Hosted Engine, while up, fails the liveliness check and the web management console is unavailable. I cannot console into the HE from the host it is running on. Below are the results of my attempts to console into the Hosted Engine.
Please help! I have searched forums, lists and Google but have not been able to fix this. My coworkers and manager are anxious.
---
When I try "hosted-engine --console" after setting the console password I get:
The engine VM is running on this host
Connected to domain HostedEngine
Escape character is ^]
_
The prompt is non-responsive except for the escape character key combo.
---
"virsh -r list" gives ID 3, Name: HostedEngine, State: running
"virsh -r console HostedEngine" gives:
Connected to domain HostedEngine
Escape character is ^]
error: operation forbidden: read only access prevents virDomainOpenConsole
"virsh -r vncdisplay HostedEngine" gives "0:0" and returns me to the prompt.
---
I am SSHed into the host running my Hosted Engine from a CentOS7 minimal install with packages xorg-x11-server-Xorg, xorg-x11-xauth and xorg-x11-apps installed. The result of "grep -i X11Forwarding /etc/ssh/sshd_config" shows it set to "Yes". I SSH into the host using "ssh -Y root@xxx.xxx.xxx.xxx". I am logged into the CentOS7 minimal install as root. I know root is poor practice, but I was trying to minimize anything that could be causing an issue.
---
Below are the results of my attempt to update Hosted Engine (slightly redacted to remove personal info):
--== CONFIGURATION PREVIEW ==--
Default SAN wipe after delete : False
Firewall manager : firewalld
Update Firewall : True
Host FQDN : ovengineint.xdomainx.tld
Upgrade packages : True
Engine database secured connection : False
Engine database user name : engine
Engine database name : engine
Engine database host : localhost
Engine database port : 5432
Engine database host name validation : False
Engine installation : True
PKI organization : xdomainx.tld
Set up ovirt-provider-ovn : True
Configure WebSocket Proxy : True
DWH installation : True
DWH database secured connection : False
DWH database host : localhost
DWH database user name : ovirt_engine_history
DWH database name : ovirt_engine_history
DWH database port : 5432
DWH database host name validation : False
Configure Image I/O Proxy : True
Configure VMConsole Proxy : True
--== SUMMARY ==--
[ INFO ] Restarting httpd
Web access is enabled at:
http://ovengineint.xdomainx.tld:80/ovirt-engine
https://ovengineint.xdomainx.tld:443/ovirt-engine
Internal CA XX:XX:XX:XX...
SSH fingerprint: SHA256:xxxxxxxxxx...
--== END OF SUMMARY ==--
[ INFO ] Stage: Clean up
Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20180502165652-88pkpi.log
[ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20180502170149-setup.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ INFO ] Execution of setup completed successfully
Again, thank you so very much for any suggestions! I have found many answers on this mailing list archive to be of great insight and help.
Respectfully, Charles

Hi, I am still working to resolve my issue - is there any further detail or clarification I can provide that might help? I really appreciate your time. Thank you, Charles

Hi,
You could try accessing the engine VM using VNC. First, set the VNC password using 'hosted-engine --add-console-password', and then connect to the host using a VNC viewer, for example: 'remote-viewer vnc://HOST_IP:5900'
The liveliness check just checks if the web UI is running and reachable. If VNC works, try checking if the 'ovirt-engine' service is running on the VM and that the network is configured properly.
Andrej
On 30 May 2018 at 18:31, <clam2718@gmail.com> wrote:
Hi,
I am still working to resolve my issue - is there any further detail or clarification I can provide that might help? I really appreciate your time.
Thank you, Charles
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/XE37L37BGTWIWXOYP5PTYM64I2NNKRO5/
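
The two checks Andrej describes (is the web UI reachable, is the service running) can be sketched in shell. This is only a sketch under assumptions, not the HA agent's actual implementation: the health page path /ovirt-engine/services/health and the helper names health_url and engine_is_live are not given in the thread, and the default FQDN is the one shown in the setup output of the first message.

```shell
#!/bin/sh
# Sketch only: the health path and the helper names are assumptions,
# not taken from the thread.
ENGINE_FQDN="${ENGINE_FQDN:-ovengineint.xdomainx.tld}"

# Build the URL that a liveliness probe would fetch for a given engine FQDN.
health_url() {
    printf 'http://%s/ovirt-engine/services/health\n' "$1"
}

# Return 0 if the engine web application answers within 10 seconds,
# non-zero otherwise. Nothing is printed.
engine_is_live() {
    curl --silent --fail --max-time 10 "$(health_url "$1")" >/dev/null
}

# Once a console on the engine VM works, the service and network state
# can be inspected there with:
#   systemctl status ovirt-engine
#   ip addr show
```

From a host that can reach the engine network, something like engine_is_live "$ENGINE_FQDN" && echo live || echo "not reachable" would then give a quick yes/no answer.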

Thank you very much, Andrej. When I do as you suggest I am still unable to console in. I receive "Gtk-WARNING **: cannot open display:"
Any other ideas? Thanks, Charles

This is an error from the VNC client, probably related to X forwarding. Try setting the DISPLAY environment variable before running remote-viewer: 'export DISPLAY=0:0'
Alternatively, you could set up port forwarding when connecting to the host, and then run the VNC client on your local machine:
ssh -L 5900:localhost:5900 root@xxx.xxx.xxx.xxx
remote-viewer vnc://localhost:5900
Andrej
On 1 June 2018 at 05:51, <clam2718@gmail.com> wrote:
Thank you very much Andrej. When I do as you suggest I am still unable to console in - I receive "Gtk-WARNING **: cannot open display:"
Any other ideas?
Thanks, Charles
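
The port in the forwarding suggestion follows from the VNC display number: display N listens on TCP port 5900 + N, so the display 0 reported by "virsh -r vncdisplay HostedEngine" maps to port 5900. A minimal sketch of that arithmetic follows; the helper names vnc_port and tunnel_command are hypothetical, and the second helper only prints the ssh command rather than running it.

```shell
#!/bin/sh
# Convert a VNC display string (":0", "0:0", "127.0.0.1:1") to its TCP port.
# VNC display N listens on port 5900 + N.
vnc_port() {
    display="${1##*:}"        # keep what follows the last colon
    display="${display%%.*}"  # drop an optional ".screen" suffix
    echo $((5900 + display))
}

# Print (do not run) the ssh command that forwards the same port number
# from the local machine to the VNC port on the host.
# Arguments: host address, port.
tunnel_command() {
    printf 'ssh -L %s:localhost:%s root@%s\n' "$2" "$2" "$1"
}
```

With the tunnel open in one terminal, the viewer is pointed at the local end, for example remote-viewer vnc://localhost:5900, so no X forwarding is involved.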

Andrej, thank you so much!!! I knew I was having issues with X11, and was trying to troubleshoot that here. Your suggestion to set up port forwarding, with some tweaks, worked!!! Thank you thank you.
I am now able to see that HostedEngine is not booting. It states "cannot allocate kernel buffer", which I was able to find referenced in this forum post: https://lists.ovirt.org/pipermail/users/2017-December/085631.html
Apparently /var/run/ovirt-hosted-engine-ha/vm.conf got changed to boot HostedEngine with only 4MB of RAM, not 4GB. I am now trying to get HostedEngine to boot with 4GB. I am sure I will need some more assistance before I am through here!
Thanks again and I will repost with an update.
Respectfully, Charles
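
The memSize value described above is in MB, so memSize=4 means 4 MB and 4 GB would be memSize=4096. A minimal sketch of a check for this condition follows, assuming vm.conf is a plain file of key=value lines; the helper names vm_mem_mb and check_mem and the 1024 MB default threshold are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Print the memSize value (in MB) from a vm.conf-style file of key=value lines.
# Prints nothing if no memSize line is present.
vm_mem_mb() {
    sed -n 's/^memSize=\([0-9][0-9]*\)$/\1/p' "$1" | head -n 1
}

# Report whether the configured memory is implausibly small.
# Arguments: path to the file, optional minimum in MB (default 1024,
# an arbitrary threshold for illustration).
check_mem() {
    min_mb="${2:-1024}"
    mem="$(vm_mem_mb "$1")"
    if [ -z "$mem" ]; then
        echo "memSize not found"
        return 2
    elif [ "$mem" -lt "$min_mb" ]; then
        echo "memSize=$mem MB is below $min_mb MB"
        return 1
    fi
    echo "memSize=$mem MB looks plausible"
}
```

Running check_mem /var/run/ovirt-hosted-engine-ha/vm.conf on the host only reads the file, so it is safe to use while the engine VM is down.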

I cannot get /var/run/ovirt-hosted-engine-ha/vm.conf to keep my edit of "memSize=4" to "memSize=4096" --> it keeps reverting. I have confirmed that HE is down --> suggestions? Thanks, Charles

OK, I think the inability to change /var/run/ovirt-hosted-engine-ha/vm.conf might have something to do with the HE configuration having been moved to shared storage? (https://ovirt.org/develop/release-management/features/sla/hosted-engine-conf...)
I am using NFS shares on a Synology cluster.
Any ideas how to fix memSize=4 and successfully boot HE?
Thanks yet again, Charles

The local vm.conf is overridden by the configuration on the shared storage, but the bug with the VM having 4 MB of RAM instead of 4 GB is already fixed: https://bugzilla.redhat.com/1524331
Try upgrading the ovirt-hosted-engine-ha and ovirt-hosted-engine-setup packages on the ovirt-node. Specifically, the fix is in ovirt-hosted-engine-ha version 2.1.9.
# yum upgrade ovirt-hosted-engine-ha ovirt-hosted-engine-setup
I'm not sure if version 2.1.9 is available in the ovirt-4.1 repositories. If not, you could install the ovirt-4.1-snapshot repositories on the ovirt-node:
# yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release41-snapshot.rpm
The snapshot repository may be a little unstable, so I would recommend upgrading the hosts to 4.2 once the engine is running.
Andrej
On 2 June 2018 at 20:06, <clam2718@gmail.com> wrote:
OK, I think the inability to change /var/run/ovirt-hosted-engine-ha/vm.conf might have something to do with HE configuration having been moved to shared storage? (https://ovirt.org/develop/release-management/features/sla/hosted-engine-configuration-on-shared-storage/)
I am using NFS shares on a Synology cluster
Any ideas how to fix memSize=4 and successfully boot HE?
Thanks yet again, Charles
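
Whether the installed ovirt-hosted-engine-ha package already contains the fix (version 2.1.9 or later, per Andrej's message) can be checked before and after the upgrade. The following is a minimal sketch, assuming GNU sort with the -V option is available on the node; the helper names version_ge and installed_ha_version are hypothetical.

```shell
#!/bin/sh
# Return 0 when version $1 is greater than or equal to version $2.
# Relies on the version-sort option (-V) of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed version of ovirt-hosted-engine-ha.
# rpm exits non-zero and prints a message if the package is not installed.
installed_ha_version() {
    rpm -q --queryformat '%{VERSION}\n' ovirt-hosted-engine-ha
}
```

On the node this could be used as: v="$(installed_ha_version)" && version_ge "$v" 2.1.9 && echo "fix present" || echo "upgrade needed".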

Thank you again so very much Andrej! I am going to try this right now... Respectfully, Charles

Hi all,
we decided to issue an updated package to the oVirt 4.1 repository that should fix this for all users. We still consider 4.1 an EOL release, but we think this upgrade path should be fixed anyway. The metadata are refreshing as we speak.
You can also download the package manually from the repositories. For example, the URL for a CentOS 7 based installation: http://resources.ovirt.org/pub/ovirt-4.1/rpm/el7/noarch/ovirt-hosted-engine-...
Best regards
--
Martin Sivak
SLA / oVirt
On Mon, Jun 4, 2018 at 2:45 PM, <clam2718@gmail.com> wrote:
Thank you again so very much Andrej! I am going to try this right now...
Respectfully, Charles

Dear Andrej and Martin: I just wanted to follow up and thank you both so very much. I was able to update to the latest 4.1.9 and that resolved my issues: the engine started just fine and all VMs are up. My apologies for the delayed update, I had back surgery and was out a while. Again, I really appreciate everyone's responses - most helpful!!! Sincerely, Charles
participants (3)
- Andrej Krejcir
- clam2718@gmail.com
- Martin Sivak