
According to another post in the mailing list, the Engine Hosts (the ones that have ovirt-ha-agent/ovirt-ha-broker running) check http://{fqdn}/ovirt-engine/services/health. As the IP has changed, I think you need to check that URL before and after the migration.

Best Regards,
Strahil Nikolov
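You can run the same check by hand from an HA host; a minimal sketch, where engine.example.com stands in for your engine's FQDN and -k skips certificate verification:

    # Verify the engine FQDN resolves to the expected (new) address:
    getent hosts engine.example.com
    # Fetch the health page the HA agent polls; a healthy engine answers
    # "DB Up!Welcome to Health Status!":
    curl -sk https://engine.example.com/ovirt-engine/services/health

On Jul 23, 2019 16:41, Derek Atkins <derek@ihtfp.com> wrote: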
Hi,
If I understand it correctly, the HE Hosts try to ping (or SSH, or otherwise reach) the Engine host. If it reaches it, then it passes the liveness check. If it cannot reach it, then it fails. So to me this error means that there is some configuration, somewhere, that is trying to reach the engine on the old address (which fails when the engine has the new address).
I do not know where in the *host* configuration this data lives, so I cannot suggest where you need to change it.
Can 10.16.248.x reach 10.8.236.x and vice-versa?
Maybe multi-home the engine on both networks for now until you figure it out?
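A rough sketch of what multi-homing could look like on the engine VM; eth0 and the .50 host address are made-up placeholders:

    # Temporarily add a second address on the old network alongside the new one:
    ip addr add 10.8.236.50/24 dev eth0
    # (A persistent setup would go through the ifcfg files / NetworkManager instead.)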
-derek
On Tue, July 23, 2019 9:13 am, carl langlois wrote:
Hi,
We have managed to stabilize the DNS update in our network. The current situation is: I have 3 hosts that can run the engine (hosted-engine). They were all in the 10.8.236.x network. Now I have moved one of them to 10.16.248.x.
If I boot the engine on one of the hosts in the 10.8.236.x network, the engine comes up with status "good". I can access the engine UI, and I can see all my hosts, even the one in the 10.16.248.x network.
But if I boot the engine on the hosted-engine host that was switched to 10.16.248.x, the engine boots and I can ssh to it, but the status is always "fail for liveliness check". The main difference is that when I boot on the host in the 10.16.248.x network, the engine gets an address in the 248.x network.
On the engine I have this in /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log:

2019-07-23 09:05:30|MFzehi|YYTDiS|jTq2w8|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can not sample data, oVirt Engine is not updating the statistics. Please check your oVirt Engine status.|9704

The engine.log seems okay.
So I need to understand what this "liveliness check" does (or tries to do) so I can investigate why the engine status is not becoming good.
The initial deployment was done in the 10.8.236.x network. Maybe it has something to do with that.
Thanks & Regards
Carl
On Thu, Jul 18, 2019 at 8:53 AM Miguel Duarte de Mora Barroso <mdbarroso@redhat.com> wrote:
On Thu, Jul 18, 2019 at 2:50 PM Miguel Duarte de Mora Barroso <mdbarroso@redhat.com> wrote:
On Thu, Jul 18, 2019 at 1:57 PM carl langlois <crl.langlois@gmail.com> wrote:
> Hi Miguel,
> I have managed to change the config for the ovn-controller with these commands:
> ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=ssl:10.16.248.74:6642
> ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-ip=10.16.248.65
> and restarting the services.
Yes, that's what the script is supposed to do, check [0].
Not sure why running vdsm-tool didn't work for you.
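For reference, the vdsm-tool route would normally be a single command; a sketch using the same IPs as the ovs-vsctl calls above, assuming the ovirt-provider-ovn-driver package is installed on the host:

    # First argument is the OVN central (southbound DB) IP,
    # second is this host's local tunnel endpoint IP:
    vdsm-tool ovn-config 10.16.248.74 10.16.248.65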
> But even with this I still have the "fail for liveliness check" when starting the oVirt engine. But one thing I noticed with our new network is that the reverse DNS does not work (IP -> hostname). The forward is working fine. I am trying to see with our IT why it is not working.
Do you guys use OVN? If not, you could disable the provider, install the hosted-engine VM, then, if needed, re-add / re-activate it.
I'm assuming it fails for the same reason you've stated initially - i.e. ovn-controller is involved; if it is not, disregard this msg :)
[0] - https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup...
> Regards,
> Carl

Hi,

At one point we did have issues with DNS resolution (mainly the reverse lookup), but that was fixed. Yes, we can ping both networks and vice-versa. Not sure how to multi-home the engine; I will do some research on that.

I did find something in the error_log on the engine. In /etc/httpd/logs/error_log I always get these messages:

[Tue Jul 23 11:21:52.430555 2019] [proxy:error] [pid 3189] AH00959: ap_proxy_connect_backend disabling worker for (127.0.0.1) for 5s
[Tue Jul 23 11:21:52.430562 2019] [proxy_ajp:error] [pid 3189] [client 10.16.248.65:35154] AH00896: failed to make connection to backend: 127.0.0.1

The 10.16.248.65 is the new address of the host that was moved to the new network.

Thanks & Regards
Carl
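Those AH00959/AH00896 errors mean httpd cannot reach the AJP backend it proxies to, i.e. the engine itself. A quick check on the engine VM, as a sketch (8702 is the default oVirt AJP port):

    # See whether anything is listening on the AJP port httpd forwards to:
    ss -tlnp | grep '127.0.0.1:8702'
    # If nothing is listening, look at the engine service itself:
    systemctl status ovirt-engine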

If I try to access http://ovengine/ovirt-engine/services/health I always get "Service Unavailable" in the browser, and each time I reload it I get this in the error_log:

[proxy_ajp:error] [pid 1868] [client 10.8.1.76:63512] AH00896: failed to make connection to backend: 127.0.0.1
[Tue Jul 23 14:04:10.074023 2019] [proxy:error] [pid 1416] (111)Connection refused: AH00957: AJP: attempt to connect to 127.0.0.1:8702 (127.0.0.1) failed

Thanks & Regards
Carl

Hi,

carl langlois <crl.langlois@gmail.com> writes:

> [proxy_ajp:error] [pid 1868] [client 10.8.1.76:63512] AH00896: failed to make connection to backend: 127.0.0.1
> [Tue Jul 23 14:04:10.074023 2019] [proxy:error] [pid 1416] (111)Connection refused: AH00957: AJP: attempt to connect to 127.0.0.1:8702 (127.0.0.1) failed

Sounds like a service isn't running on port 8702.
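Port 8702 is the engine's AJP connector, so an empty listener there usually means the ovirt-engine service itself has stopped. A minimal way to confirm, assuming a standard engine install:

    # Is the engine service still up?
    systemctl status ovirt-engine
    # If it stopped or restarted, the application server side usually says why:
    tail -n 50 /var/log/ovirt-engine/server.log
    tail -n 50 /var/log/ovirt-engine/engine.log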
-derek

--
Derek Atkins                 617-623-3745
derek@ihtfp.com             www.ihtfp.com
Computer and Internet Security Consultant

A healthy engine should report:

[root@ovirt1 ~]# curl --cacert CA https://engine.localdomain/ovirt-engine/services/health; echo
DB Up!Welcome to Health Status!

Of course you can use the '-k' switch to verify the situation.

Best Regards,
Strahil Nikolov

Strahil, not sure what to put for the --cacert.

Yes Derek, you are right: at one point port 8702 stops listening.

tcp6       0      0 127.0.0.1:8702          :::*                    LISTEN      1607/ovirt-engine

After some time the line above disappears. I am trying to figure out why this port is being closed after a while when the engine runs on the host in the 248.x network; on the 236.x network this port stays alive the whole time. If you have any hint on why this port is closing, do not hesitate, because I am starting to run out of ideas. :-)

Thanks & Regards
Carl
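(For --cacert, the engine's CA certificate normally lives at /etc/pki/ovirt-engine/ca.pem on the engine VM; that is the stock oVirt path, not something confirmed in this thread.) One way to catch the moment the listener dies and grab the logs right away, as a rough sketch:

    # Poll until the AJP listener disappears, then note the time and dump recent logs:
    while ss -tln | grep -q '127.0.0.1:8702'; do sleep 5; done
    date
    tail -n 100 /var/log/ovirt-engine/server.log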

Hi,

carl langlois <crl.langlois@gmail.com> writes:

> Yes Derek, you are right: at one point port 8702 stops listening.
> tcp6  0  0  127.0.0.1:8702  :::*  LISTEN  1607/ovirt-engine

Can you try running 'lsof' to figure out what application has that port open? Then you can figure out why it's dying.
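For example, a sketch assuming the lsof package is installed:

    # Show the process holding the AJP port, with numeric addresses and ports:
    lsof -nP -iTCP:8702 -sTCP:LISTEN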
-derek

--
Derek Atkins                 617-623-3745
derek@ihtfp.com             www.ihtfp.com
Computer and Internet Security Consultant
participants (4)
- carl langlois
- Derek Atkins
- Strahil
- Strahil Nikolov