
I am trying to get the whole 3-node cluster back to fully working and clear out all the errors. I noted that the HCI wizard, I think, SHOULD have deployed the hosted engine on all the nodes, but this is not the case: only thor, the first node in the cluster, has the hosted engine. I tried to redeploy it via the Cockpit wizard to add the engine to the host, but I think this may not have been the correct repair path. Now the node shows all bricks green in the cluster (so after a reboot it detects that the host is back up and working), but Hosts lists it with a "red triangle" and the error "unregistered". On the third node I also tried "Edit" -> Hosted Engine -> choosing "Deploy" from the drop-down. The only event logged is "Host medusa configuration was updated by admin@internal-authz. 9/24/20 8:49:24 PM", and nothing changes. I then ran ovirt-hosted-engine-cleanup on odin (the node with the error), but there was no change:

################
[root@odin ~]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.
Are you sure you want to proceed? [y/n]
y
-=== Destroy hosted-engine VM ===-
You must run deploy first
error: failed to get domain 'HostedEngine'
-=== Stop HA services ===-
-=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
-=== Disconnecting the hosted-engine storage domain ===-
You must run deploy first
-=== De-configure VDSM networks ===-
ovirtmgmt
ovirtmgmt
A previously configured management bridge has been found on the system, this will try to de-configure it. Under certain circumstances you can loose network connection.
Caution, this operation should be used with care.
Are you sure you want to proceed? [y/n]
y
-=== Stop other services ===-
Warning: Stopping libvirtd.service, but it can still be activated by:
  libvirtd-ro.socket
  libvirtd.socket
  libvirtd-admin.socket
-=== De-configure external daemons ===-
Removing database file /var/lib/vdsm/storage/managedvolume.db
-=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
? /etc/ovirt-hosted-engine/answers.conf already missing
? /etc/ovirt-hosted-engine/hosted-engine.conf already missing
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-migrate/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-migrate/server-cert.pem
- removing /etc/pki/vdsm/libvirt-migrate/server-key.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/vdsm/libvirt-vnc/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
- removing /var/tmp/localvm69i1jxnd
- removing /var/tmp/localvmfyg59713
- removing /var/tmp/localvmmg5y6g52
-=== Removing IP Rules ===-
[root@odin ~]#
[root@odin ~]#
################

Any ideas on how to repair the engine install issues on the nodes?

--
penguinpages

After googling around, it seemed that with the "reinstall" error noted for the server running the engine, it was just time to try that. I ran "ovirt-hosted-engine-cleanup" on all three nodes, made sure the gluster bricks were happy, and reran the oVirt engine setup wizard from Cockpit. After a few gluster option adjustments, the deployment completed. It came up and noted that I had two more nodes to set up, which it had detected via gluster. I tried that wizard, but it just hung, so I closed it. I then set the cluster CPU type (one of my systems is an older Sandy Bridge, so the cluster is set to that) and the other cluster parameters.

When I went to add the node(s), I got this error: "Error while executing action: Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details."

I tested SSH between all nodes and it works without a password. I told the UI to "fetch" the SSH key, and for "Enter host fingerprint or fetch manually from host" it returned "Error in fetching fingerprint". I googled around and found mentions of OVN key issues. Any ideas on where this key issue comes from, and how to root-cause why I can't re-add the nodes?
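For the manual route in that dialog, here is a minimal sketch (Python 3, standard library only) that computes the OpenSSH-style SHA256 fingerprint of a host public key, i.e. the value `ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub` prints on the host. The helper name is hypothetical, and it is an assumption that the oVirt dialog expects this particular key file and fingerprint format; compare against what the engine displays before relying on it.

```python
import base64
import hashlib


def openssh_sha256_fingerprint(pubkey_line: str) -> str:
    """Hypothetical helper: OpenSSH-style SHA256 fingerprint of a public key.

    pubkey_line is one line of a *.pub file such as
    /etc/ssh/ssh_host_rsa_key.pub: "<key-type> <base64-blob> [comment]".
    """
    parts = pubkey_line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    # The fingerprint is computed over the decoded key blob, not the text.
    blob = base64.b64decode(parts[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH shows the digest as base64 with the '=' padding stripped.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

For example, `print(openssh_sha256_fingerprint(open("/etc/ssh/ssh_host_rsa_key.pub").read()))` run on odin should agree with `ssh-keygen -lf` for the same file.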

No root cause yet, but I found a workaround. The hosts to be added are all in /etc/hosts, their names resolve, and passwordless SSH login works fine:

[root@thor ~]# ping odin
PING odin.penguinpages.local (172.16.100.102) 56(84) bytes of data.
64 bytes from odin.penguinpages.local (172.16.100.102): icmp_seq=1 ttl=64 time=0.085 ms
^C
--- odin.penguinpages.local ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.085/0.085/0.085/0.000 ms
[root@thor ~]# ping odin.penguinpages.local
PING odin.penguinpages.local (172.16.100.102) 56(84) bytes of data.
64 bytes from odin.penguinpages.local (172.16.100.102): icmp_seq=1 ttl=64 time=0.083 ms
^C
--- odin.penguinpages.local ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.083/0.083/0.083/0.000 ms

Changing the target system to the IP address allows the host to be added. Hmm... my guess is that something is hard-coded to the IP and not right here. I will post if I find more.
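A minimal sketch of the name-versus-IP comparison, assuming Python 3.7+ and that it is run on the machine that actually opens the SSH connection (with a hosted engine, that is the HostedEngine VM, not thor). It resolves the name through the local resolver, which normally consults /etc/hosts via nsswitch.conf, and tries a TCP connection to port 22 by name and by IP. The helper names are hypothetical, and an open port only shows that something is listening, not that SSH authentication will succeed.

```python
from __future__ import annotations

import socket


def resolve(name: str) -> list[str]:
    """Addresses the local resolver returns for name, in order; [] if none."""
    try:
        infos = socket.getaddrinfo(name, None, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    addrs: list[str] = []
    for info in infos:
        addr = info[4][0]  # sockaddr tuple: (address, port, ...)
        if addr not in addrs:
            addrs.append(addr)
    return addrs


def ssh_port_open(target: str, port: int = 22, timeout: float = 3.0) -> bool:
    """True if a TCP connection to target:port succeeds within timeout."""
    try:
        with socket.create_connection((target, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolvable
        return False


def compare(hostname: str, ip: str, port: int = 22) -> dict:
    """Hypothetical helper contrasting the by-name and by-IP paths."""
    addrs = resolve(hostname)
    return {
        "hostname": hostname,
        "resolves_to": addrs,
        "matches_expected_ip": ip in addrs,
        "port_open_via_hostname": bool(addrs) and ssh_port_open(hostname, port),
        "port_open_via_ip": ssh_port_open(ip, port),
    }
```

For example, `print(compare("odin.penguinpages.local", "172.16.100.102"))`. An empty `resolves_to` together with `port_open_via_ip: True` would fit the symptom above, where only the IP works, and would point at name resolution on that machine rather than on the hosts.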

> "Error while executing action: Cannot add Host. Connecting to host via SSH
> has failed, verify that the host is reachable (IP address, routable address
> etc.) You may refer to the engine.log file for further details."
> Tested SSH between all nodes and works without password.

The engine is not running on the host; it runs in a VM called HostedEngine, and that VM has to be able to reach the host over SSH.

Did you do any SSH hardening?

Best Regards,
Strahil Nikolov
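To follow up on the hardening question, a rough sketch (Python 3.7+) that scans sshd_config text for global settings that could stop the engine VM from logging in, assuming the engine connects as root, which is the default user in the Add Host dialog. It is deliberately simplified: it stops at the first Match block, does not follow Include, treats every keyword as first-value-wins, and does not expand wildcards in AllowUsers/DenyUsers. The function names are hypothetical; `sshd -T` on the host remains the authoritative view of the effective configuration.

```python
from __future__ import annotations

import re


def global_sshd_options(config_text: str) -> dict[str, str]:
    """First value of each global sshd_config keyword (keys lowercased)."""
    options: dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # A keyword is separated from its arguments by whitespace or '='.
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        key = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "match":
            break  # everything after this is conditional
        options.setdefault(key, value)
    return options


def _names(patterns: str) -> list[str]:
    # "root@10.0.0.1 admin" -> ["root", "admin"]; wildcards are not expanded.
    return [p.split("@", 1)[0] for p in patterns.split()]


def root_login_concerns(options: dict[str, str]) -> list[str]:
    """Settings that could block a root SSH login from the engine VM."""
    concerns = []
    prl = options.get("permitrootlogin", "").lower()
    if prl in ("no", "forced-commands-only"):
        concerns.append(f"PermitRootLogin {prl}")
    elif prl in ("prohibit-password", "without-password"):
        concerns.append(f"PermitRootLogin {prl} (blocks root password auth)")
    if "allowusers" in options and not ({"root", "*"} & set(_names(options["allowusers"]))):
        concerns.append("AllowUsers does not list root")
    if "allowgroups" in options:
        concerns.append("AllowGroups set (check that root is a member)")
    if "root" in _names(options.get("denyusers", "")):
        concerns.append("DenyUsers lists root")
    if options.get("port", "22") != "22":
        concerns.append(f"Port {options['port']} (not 22)")
    return concerns
```

For example, `print(root_login_concerns(global_sshd_options(open("/etc/ssh/sshd_config").read())))` on each node; an empty list means none of these particular settings stood out, not that SSH is guaranteed to work.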
participants (3)
- Jeremey Wise
- penguin pages
- Strahil Nikolov