Info about reinstalling a dead node

Hello, I have a dead host in a 4.1.9 environment (a broken RAID controller compromised both internal disks, don't ask me how that could happen...). I have replaced the controller and disks and reinstalled ng-node on the same hardware with the same parameters and the same version (but the ovirtmgmt bridge is not present yet, only a bond0 that is supposed to back the ovirtmgmt bridge once installed).

In the web admin GUI the host had been set to non-responsive when it failed. Now I cannot reinstall it as new. If I try to put it into maintenance it is correctly rebooted, but then it remains non-responsive. I think I have to "force remove" the previous instance of this node in some way, but the option is greyed out... How can I clean the node and then try to install it as new again?

Thanks, Gianluca

On Wed, May 23, 2018 at 6:06 AM, Eitan Raviv <eraviv@redhat.com> wrote:
If the engine manages to 'reboot' the host as you say, it seems that the engine can communicate with the host.
So it might be that you have a bit of a chicken-and-egg problem: the host is missing a required network and therefore goes back to non-responsive. If you have other (non-mgmt) 'required' networks on the host, temporarily set them to non-required under Compute | Clusters | <cluster> | Logical Networks | Manage Networks.
In Compute | Hosts | <host> | Networks | Setup Networks, can you attach 'ovirtmgmt' to the host in its current condition?
Try to reboot the host.
HTH
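
In case anyone wants to script the steps above, here is a minimal sketch using the ovirt-engine-sdk4 Python bindings. The engine URL, credentials, 'mycluster'/'myhost' names and the DHCP boot protocol are placeholders, and the cluster-network update call is my reading of the SDK's generated services, so verify it against your version before relying on it:

# Sketch only: placeholder URL/credentials/names; assumes
# ovirt-engine-sdk4 is installed and the engine is reachable.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)
system_service = connection.system_service()

# Mark every non-mgmt required network in the cluster as not
# required, so the host can become operational without them.
clusters_service = system_service.clusters_service()
cluster = clusters_service.list(search='name=mycluster')[0]
cluster_nets = clusters_service.cluster_service(cluster.id).networks_service()
for net in cluster_nets.list():
    if net.name != 'ovirtmgmt' and net.required:
        cluster_nets.network_service(net.id).update(
            types.Network(required=False),
        )

# Attach ovirtmgmt to the host's bond0 (the API equivalent of the
# Setup Networks dialog), then persist the host network config so
# it survives a reboot.
hosts_service = system_service.hosts_service()
host = hosts_service.list(search='name=myhost')[0]
host_service = hosts_service.host_service(host.id)
host_service.setup_networks(
    modified_network_attachments=[
        types.NetworkAttachment(
            network=types.Network(name='ovirtmgmt'),
            host_nic=types.HostNic(name='bond0'),
            ip_address_assignments=[
                types.IpAddressAssignment(
                    assignment_method=types.BootProtocol.DHCP,
                ),
            ],
        ),
    ],
    check_connectivity=True,
)
host_service.commit_net_config()
connection.close()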
The configuration has only two logical networks: ovirtmgmt and another one that is already marked as not required.

I just noticed that I had configured the first DNS server address for this host incorrectly, which could be a source of timeouts/problems, so I set the correct value. NOTE: from the Cockpit web interface the change didn't seem to stick; it kept reverting to the wrong first DNS server entry. I had to connect to the host via SSH and execute:

nmcli con mod "Bond connection 1" ipv4.dns "172.16.1.2,172.16.1.20"
nmcli con down "Bond connection 1"; nmcli con up "Bond connection 1"

and the resolv.conf configuration then survived even a reboot of the node.

At that point I was able to put the host into maintenance without errors, but not to activate it (acceptable, because it doesn't have ovirtmgmt configured yet) or, better, to reinstall it, because I now got an "invalid fingerprint" event for the node. Indeed, the ng-node OS was reinstalled from scratch with the same parameters, but I didn't have a backup of the SSH keys generated during the first install. But anyway, at least now I could remove the host (the "Remove" button is no longer grayed out as before) and install it as a new host with the same name/IP without problems.

The question remains: if a node fails and then stays inaccessible, if I'm not wrong the "Remove" button is not available. I can try to reproduce and verify.

Gianluca
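
On that closing question: besides the admin GUI, removal and re-registration can also be attempted through the API. Here is a minimal sketch, again with the ovirt-engine-sdk4 Python bindings and placeholder names/credentials; whether the remove call is accepted for a host that is still non-responsive is something to verify, as the engine may require the host to be in a removable state:

# Sketch only: placeholder URL/credentials/names; assumes
# ovirt-engine-sdk4 is installed.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

hosts_service = connection.system_service().hosts_service()
host = hosts_service.list(search='name=myhost')[0]

# Remove the stale entry; as noted above, the engine may only
# allow this once the host is in a removable state.
hosts_service.host_service(host.id).remove()

# Re-add the reinstalled node under the same name/address; the
# engine re-learns the node's regenerated SSH fingerprint during
# deployment, avoiding the "invalid fingerprint" event.
hosts_service.add(
    types.Host(
        name='myhost',
        address='myhost.example.com',
        root_password='secret',
        cluster=types.Cluster(name='mycluster'),
    ),
)
connection.close()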