On Wed, May 23, 2018 at 6:06 AM, Eitan Raviv <eraviv@redhat.com> wrote:
If the engine manages to 'reboot' the host as you say, it seems that
the engine can communicate with the host.

So it might be that you have a bit of a chicken and egg problem: the
host is missing a required network and therefore goes back into
non-responsive.
If you have other (non mgmt) 'required' networks on the host, set the
networks to non-required under Compute | Clusters | <cluster> |
Logical Networks | Manage Networks temporarily.

In Compute | Hosts | <host> | networks | setup networks
can you attach 'ovirtmgmt' to the host in its current condition?

Try to reboot the host.

HTH


The configuration has only 2 logical networks: ovirtmgmt and another one that is already marked as not required.
I just noticed that I had misconfigured the first DNS server address for this host. This could be a source of timeouts/problems.
So I set the correct value for it.

NOTE: the Cockpit web interface did not seem to keep the change; it kept reverting to the wrong first DNS server entry. I had to connect to the host via ssh and execute

nmcli con mod "Bond connection 1" ipv4.dns "172.16.1.2,172.16.1.20"
nmcli con down "Bond connection 1"; nmcli con up "Bond connection 1"

and the resolv.conf configuration also survived a reboot of the node.
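
To double-check that the change really stuck, something like the following could be run on the host. It is only a sketch: the helper name list_nameservers is made up for illustration, and the connection name is the one used above. The -g option of nmcli prints just the value of the requested field.

```shell
# Print the nameserver addresses found in a resolv.conf-style file.
# list_nameservers is a hypothetical helper; the default path is an assumption.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# What NetworkManager has stored for the connection (skipped if nmcli is absent).
if command -v nmcli >/dev/null 2>&1; then
    nmcli -g ipv4.dns connection show "Bond connection 1" || true
fi

# What the resolver actually sees.
if [ -r /etc/resolv.conf ]; then
    list_nameservers /etc/resolv.conf
fi
```

If the two outputs still agree after a reboot, the setting is persistent.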

Then at this point I was able to put the host into maintenance without errors, but not to activate it (acceptable, because it doesn't have ovirtmgmt configured yet) or, better, to reinstall it, because I now got an "invalid fingerprint" event for the node.
Indeed, the ng-node OS had been re-installed from scratch with the same parameters, but I had no backup of the ssh keys generated during the first install.
But at least now I could remove the host (the "Remove" button is no longer grayed out as before) and install it as a new host with the same name/IP without problems.
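
For next time, a backup of the ssh host keys taken before re-installing might avoid the fingerprint mismatch, assuming the keys are restored on the new install before the engine contacts the host again (I have not verified this on oVirt). A minimal sketch; the function names, arguments and default paths are hypothetical:

```shell
# Archive the SSH host key pairs found in a directory into a gzipped tarball.
# backup_host_keys and its default paths are illustrative assumptions.
backup_host_keys() {
    src="${1:-/etc/ssh}"
    dest="${2:-/root/ssh_host_keys.tar.gz}"
    # The subshell changes directory so the archive holds bare file names;
    # the redirection is resolved in the caller's working directory.
    ( cd "$src" && tar -czf - ssh_host_*_key ssh_host_*_key.pub ) > "$dest"
}

# Print the fingerprint of every public host key in a directory,
# to compare with what the engine reports.
show_host_fingerprints() {
    for pub in "${1:-/etc/ssh}"/ssh_host_*_key.pub; do
        ssh-keygen -l -f "$pub"
    done
}
```

For example, backup_host_keys /etc/ssh /root/ssh_host_keys.tar.gz before the re-install, and show_host_fingerprints afterwards to see whether the keys match.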

The question remains what to do if a node fails and then stays inaccessible: if I'm not wrong, the "Remove" button is not available in that case.
I can try to reproduce and verify.

Gianluca