
Good afternoon all,

oVirt version: 4.14.4.10.7-1.el8
CentOS version: Linux version 4.18.0-365.el8.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-10) (GCC)) #1 SMP Thu Feb 10 16:11:23 UTC 2022

Background:

We had a motherboard fail in our storage device. I was able to migrate the storage domain to the backup device before it failed completely, and we have been running on the backup device for several weeks while we purchased a replacement main storage.

Today I shut everything down cleanly, replaced the main storage, and restarted the cluster. We did disconnect and reconnect the network on all of the devices as we shuffled equipment in the rack.

One of the hosts in the cluster refuses to come back up. I am able to connect to the host via PuTTY.

oVirt GUI reporting:

Setting Host ovirt-host-03.maxisinc.net to Non-Operational mode. Completed: Jun 11, 2022, 4:59:57 PM
Activating Host ovirt-host-03.maxisinc.net Completed: Jun 11, 2022, 4:59:57 PM
Invoking Activate Host ovirt-host-03.maxisinc.net Completed: Jun 11, 2022, 4:57:40 PM
Installing Host ovirt-host-03.maxisinc.net

Log from the host:

5:09 PM GetManagedObjects() failed: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. pulseaudio
4:55 PM bondscan-DGwC1l: option lacp_active: mode dependency failed, not supported in mode balance-alb(6) kernel
4:55 PM bondscan-DGwC1l: option arp_all_targets: invalid value (2) kernel
4:55 PM bondscan-DGwC1l: option fail_over_mac: invalid value (3) kernel
4:55 PM bondscan-DGwC1l: option primary_reselect: invalid value (3) kernel
4:55 PM bondscan-DGwC1l: option ad_select: invalid value (3) kernel
4:55 PM

Update:

I identified that the ovirtmgt network configuration was out of whack on the non-functioning host.

Problem 1: The GUI will not allow me to set the network back to its proper state.

Problem 2: I accidentally changed the IP address of the ovirtmgt network on the host that was working, so both hosts are now down. Even though I set the IP address back to its original value, the cluster manager is unable to see the host that was working. Until I rebooted, the host could ping out but didn't respond to pings. I am able to see the host console.

At this point I have no functioning hosts, so there is no access to the storage domain.

I suspect the solution to either issue will be the solution to both issues.

Thank you.

Further information:

The hosts can ping each other and the router appliance that I am VPN'ed in through.

### Other host
[root@ovirt-host-04 ~]# ping 192.168.2.18
PING 192.168.2.18 (192.168.2.18) 56(84) bytes of data.
64 bytes from 192.168.2.18: icmp_seq=1 ttl=64 time=0.283 ms
64 bytes from 192.168.2.18: icmp_seq=2 ttl=64 time=0.234 ms
64 bytes from 192.168.2.18: icmp_seq=3 ttl=64 time=0.240 ms
64 bytes from 192.168.2.18: icmp_seq=4 ttl=64 time=0.246 ms
64 bytes from 192.168.2.18: icmp_seq=5 ttl=64 time=0.251 ms
^C
--- 192.168.2.18 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4110ms
rtt min/avg/max/mdev = 0.234/0.250/0.283/0.026 ms

### router (which I am VPN'ed through)
[root@ovirt-host-04 ~]# ping 192.168.2.1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=0.225 ms
64 bytes from 192.168.2.1: icmp_seq=2 ttl=64 time=0.188 ms
64 bytes from 192.168.2.1: icmp_seq=3 ttl=64 time=0.200 ms
64 bytes from 192.168.2.1: icmp_seq=4 ttl=64 time=0.187 ms

### to ovirt controller
[root@ovirt-host-04 ~]# ping 192.168.2.10
PING 192.168.2.10 (192.168.2.10) 56(84) bytes of data.
^C
--- 192.168.2.10 ping statistics ---
15 packets transmitted, 0 received, 100% packet loss, time 14359ms

### from ovirt controller to the original host that started this thread
[root@ovirt1 ~]# ping 192.168.2.18
PING 192.168.2.18 (192.168.2.18) 56(84) bytes of data.
64 bytes from 192.168.2.18: icmp_seq=1 ttl=64 time=0.273 ms
64 bytes from 192.168.2.18: icmp_seq=2 ttl=64 time=0.240 ms
64 bytes from 192.168.2.18: icmp_seq=3 ttl=64 time=0.241 ms
64 bytes from 192.168.2.18: icmp_seq=4 ttl=64 time=0.248 ms
^C
--- 192.168.2.18 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3099ms
rtt min/avg/max/mdev = 0.240/0.250/0.273/0.020 ms

### from ovirt controller to the host that was working but now is not
[root@ovirt1 ~]# ping 192.168.2.19
PING 192.168.2.19 (192.168.2.19) 56(84) bytes of data.
^C
--- 192.168.2.19 ping statistics ---
14 packets transmitted, 0 received, 100% packet loss, time 13329ms
[root@ovirt1 ~]#
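
The ping results above are easier to compare once they are condensed into a reachability list. The following is only a minimal sketch, not part of the original post: it assumes the summary-line format printed by Linux ping as seen in the transcript, and the names `reachable`, `results`, and `down` are made up for illustration. The router ping is left out because the transcript shows no summary line for it.

```python
import re

# Matches the summary line printed by ping in the transcript, e.g.
# "5 packets transmitted, 5 received, 0% packet loss, time 4110ms"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def reachable(summary_line: str) -> bool:
    """Return True if at least one echo reply came back."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError(f"not a ping summary line: {summary_line!r}")
    return int(match.group(2)) > 0


# Summary lines copied from the transcript, keyed by (source, target).
results = {
    ("ovirt-host-04", "192.168.2.18"):
        "5 packets transmitted, 5 received, 0% packet loss, time 4110ms",
    ("ovirt-host-04", "192.168.2.10"):
        "15 packets transmitted, 0 received, 100% packet loss, time 14359ms",
    ("ovirt1", "192.168.2.18"):
        "4 packets transmitted, 4 received, 0% packet loss, time 3099ms",
    ("ovirt1", "192.168.2.19"):
        "14 packets transmitted, 0 received, 100% packet loss, time 13329ms",
}

# Pairs for which no reply was received at all.
down = [pair for pair, line in results.items() if not reachable(line)]

for source, target in down:
    print(f"{source} -> {target}: no reply")
```

Read this way, the two failing paths are ovirt-host-04 to the controller address (192.168.2.10) and the controller to 192.168.2.19.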

Ok, I know "how" to fix it, but not how to fix it.

The network definitions on the engine and the hosts are out of sync. Somehow the engine thinks the ovirtmgt network is attached to the storage network, which is a different physical network with a different IP address range. The engine is physically unable to connect to the storage network.

I have been able to "fix" it on the engines, regain communications, and get things started. However, the engine keeps updating the hosts and blowing them off the network. When I attempt to fix the network mappings via the engine, it refuses to save the mappings because it can't communicate with the host on its old mappings. The same thing happens when I try to reinstall.

I can't imagine I'm the only one who's run into this. Please advise.
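
Since the symptom described here is a management network that ends up on the wrong IP range, a quick sanity check is to test whether each address a host reports falls inside the expected management subnet. This is a hedged sketch using Python's standard `ipaddress` module, not anything from the thread: the /24 prefix is an assumption (the thread shows 192.168.2.x addresses but never states the mask), and `10.0.0.18` is a made-up stand-in for a storage-network address, because the real storage range is not given.

```python
import ipaddress

# ASSUMPTION: the management network is 192.168.2.0/24. The thread only
# shows addresses in 192.168.2.x; the prefix length is never stated.
MGMT_NET = ipaddress.ip_network("192.168.2.0/24")


def on_management_network(address: str, network=MGMT_NET) -> bool:
    """Return True if the address lies inside the management subnet."""
    return ipaddress.ip_address(address) in network


# The first two addresses come from the ping transcript in this thread.
# 10.0.0.18 is hypothetical and stands in for a storage-network address.
for addr in ["192.168.2.18", "192.168.2.19", "10.0.0.18"]:
    verdict = "management range" if on_management_network(addr) else "outside management range"
    print(f"{addr}: {verdict}")
```

An address that lands outside the management range on the interface carrying ovirtmgt would be consistent with the mismatch described above.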

I'd delete the host (in engine) and start fresh with a working network configuration. I'd even consider reinstalling the host completely and starting off with the proper network configuration from that side as well :)

Greetings
Klaas

Thank you Klaas,

I tried that, and things just got stickier. The hosts could not be removed, and the re-installs failed. I've backed up the engine database and am trying to move forward.

I am now in the middle of trying to re-install the oVirt 4.4 engine on a clean CentOS 8 (no way back at this point), and I am stuck because engine-setup expects an init.d script, which the postgres12 install on CentOS does not create. See the other thread on this list.

On Sun, Jun 12, 2022 at 12:38 PM David Johnson <djohnson@maxistechnology.com> wrote:
Greetings everyone,
My engine is well and truly hosed now. We can close this line of inquiry for the time being.
I am attempting to reload ovirt from scratch then restore to an older copy of my DB.
I'll follow up with my problems in another email chain.
participants (2)
- David Johnson
- Klaas Demter