HostedEngine Unreachable

Hi,

Our cluster was running fine until we moved it to the new network. Looking at the agent.log file, it still pings the old gateway. Not sure if this is the reason it's failing the liveliness check. Please help.

On Thu, Feb 21, 2019 at 4:39 PM Sakhi Hadebe <sakhi@sanren.ac.za> wrote:
Hi,
I need some help. We had a working oVirt cluster in the testing environment. We have just moved it to the production environment with the same network settings; the only thing we changed is the public VLAN, as in production we're using a different subnet.
The problem is we can't get the HostedEngine up. It does come up, but it fails the liveliness check and its health status is bad. We can't even ping it, even though it is on the same subnet as the host machines (192.168.x.x/24):
*HostedEngine VM status:*
[root@garlic qemu]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : goku.sanren.ac.za
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 57b2ece9
local_conf_timestamp               : 8463
Host timestamp                     : 8463
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8463 (Thu Feb 21 16:32:29 2019)
        host-id=1
        score=3400
        vm_conf_refresh_time=8463 (Thu Feb 21 16:32:29 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False
--== Host 2 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : garlic.sanren.ac.za
Host ID                            : 2
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Powering down"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 71dc3daf
local_conf_timestamp               : 8540
Host timestamp                     : 8540
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8540 (Thu Feb 21 16:32:31 2019)
        host-id=2
        score=3400
        vm_conf_refresh_time=8540 (Thu Feb 21 16:32:31 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStop
        stopped=False
        timeout=Thu Jan 1 04:24:29 1970
--== Host 3 status ==--
conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : gohan.sanren.ac.za
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 49645620
local_conf_timestamp               : 5480
Host timestamp                     : 5480
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=5480 (Thu Feb 21 16:32:22 2019)
        host-id=3
        score=3400
        vm_conf_refresh_time=5480 (Thu Feb 21 16:32:22 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False
The services are running, but with errors:

*vdsmd.service:*

[root@garlic qemu]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-02-21 16:12:12 SAST; 3min 31s ago
  Process: 40117 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 40121 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 40224 (vdsmd)
    Tasks: 47
   CGroup: /system.slice/vdsmd.service
           ├─40224 /usr/bin/python2 /usr/share/vdsm/vdsmd
           └─40346 /usr/libexec/ioprocess --read-pipe-fd 65 --write-pipe-fd 64 --max-threads 10 --max-queued-requests 10
Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running nwfilter
Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: libvirt: Network Filter Driver error : Requested operation is not valid: nwfilter is in use
Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running dummybr
Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running tune_system
Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running test_space
Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running test_lo
Feb 21 16:12:12 garlic.sanren.ac.za systemd[1]: Started Virtual Desktop Server Manager.
Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN MOM not available.
Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN MOM not available, KSM stats will be missing.
Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN Not ready yet, ignoring event '|virt|VM_status|e2608f14-39fe-4ab6-b6be-9c60679e8c76' args={'e2608f14-39fe-4ab6-b6be-9c606..., 'type': '
Hint: Some lines were ellipsized, use -l to show in full.
*libvirtd.service:*

[root@garlic qemu]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: active (running) since Thu 2019-02-21 16:06:50 SAST; 9min ago
     Docs: man:libvirtd(8)
           https://libvirt.org
 Main PID: 38485 (libvirtd)
    Tasks: 17 (limit: 32768)
   CGroup: /system.slice/libvirtd.service
           └─38485 /usr/sbin/libvirtd --listen
Feb 21 16:06:50 garlic.sanren.ac.za systemd[1]: Starting Virtualization daemon...
Feb 21 16:06:50 garlic.sanren.ac.za systemd[1]: Started Virtualization daemon.
Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21 14:07:43.033+0000: 38485: info : libvirt version: 3.9.0, package: 14.el7_5.8 (CentOS BuildSystem <http://bugs.c...centos.org)
Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21 14:07:43.033+0000: 38485: info : hostname: garlic.sanren.ac.za
Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21 14:07:43.033+0000: 38485: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Feb 21 16:12:08 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21 14:12:08.791+0000: 38485: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Hint: Some lines were ellipsized, use -l to show in full.
*ovirt-ha-broker & ovirt-ha-agent services:*

[root@garlic qemu]# systemctl restart ovirt-ha-broker
[root@garlic qemu]# systemctl status ovirt-ha-broker
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-21 16:18:43 SAST; 30s ago
 Main PID: 41493 (ovirt-ha-broker)
    Tasks: 13
   CGroup: /system.slice/ovirt-ha-broker.service
           ├─41493 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
           ├─41688 /bin/sh /usr/sbin/hosted-engine --check-liveliness
           └─41689 python -m ovirt_hosted_engine_setup.check_liveliness
Feb 21 16:18:43 garlic.sanren.ac.za systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Feb 21 16:18:43 garlic.sanren.ac.za systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...

[root@garlic qemu]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-21 16:18:53 SAST; 25s ago
 Main PID: 41581 (ovirt-ha-agent)
    Tasks: 2
   CGroup: /system.slice/ovirt-ha-agent.service
           └─41581 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
Feb 21 16:18:53 garlic.sanren.ac.za systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Feb 21 16:18:53 garlic.sanren.ac.za systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
Attached are log files that might contain some useful information for troubleshooting.

Your assistance will be highly appreciated.
--
Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN) Competency Area, Meraka, CSIR

Tel: +27 12 841 2308
Fax: +27 12 841 4223
Cell: +27 71 331 9622
Email: sakhi@sanren.ac.za

On Mon, Feb 25, 2019 at 8:07 AM Sakhi Hadebe <sakhi@sanren.ac.za> wrote:
Hi,
Our cluster was running fine until we moved it to the new network. Looking at the agent.log file, it still pings the old gateway. Not sure if this is the reason it's failing the liveliness check.
You can update the gateway address used in the network test with something like:

  hosted-engine --set-shared-config gateway 192.168.1.1 --type=he_local
  hosted-engine --set-shared-config gateway 192.168.1.1 --type=he_shared

The first command updates the value used on the current host; the second one updates the master copy on the shared storage, which is used by default on hosts you deploy in the future.

But the engine health check doesn't really depend on that: it's a kind of application-level ping sent over HTTP to the engine. You can check it manually with:

  curl http://$(grep fqdn /etc/ovirt-hosted-engine/hosted-engine.conf | cut -d= -f2)/ovirt-engine/services/health

I'd suggest checking name resolution.
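For example, to confirm the value each host is actually using and to exercise the check end to end, something like this (a sketch, assuming your new gateway is 192.168.1.1 and the engine FQDN is engine.sanren.ac.za; adjust both to your environment):

  # Show the gateway the HA agent pings on this host
  # (--get-shared-config should be available in recent ovirt-hosted-engine-setup versions)
  hosted-engine --get-shared-config gateway --type=he_local

  # Verify this host can reach the new gateway at all
  ping -c 3 192.168.1.1

  # The liveliness check is essentially an HTTP GET against the health servlet;
  # a healthy engine typically answers "DB Up!Welcome to Health Status!"
  curl --connect-timeout 5 http://engine.sanren.ac.za/ovirt-engine/services/health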


On Mon, Feb 25, 2019 at 10:27 AM Sakhi Hadebe <sakhi@sanren.ac.za> wrote:
Hi Simone,
Thank you for your response. Executing the command below gives this:

[root@ovirt-host]# curl http://$(grep fqdn /etc/ovirt-hosted-engine/hosted-engine.conf | cut -d= -f2)/ovirt-engine/services/health
curl: (7) Failed to connect to engine.sanren.ac.za:80; No route to host
I tried to enable HTTP traffic on the oVirt host, but the error persists.
On your hosts, run getent ahosts engine.sanren.ac.za and ensure that it resolves as you expect. Fix name resolution and routing on your hosts in a coherent manner.
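For instance, on each host, something along these lines (a sketch; 192.168.1.50 is a placeholder for whatever address the engine FQDN should now resolve to):

  # What does the engine FQDN resolve to on this host?
  getent ahosts engine.sanren.ac.za

  # Which route/interface would be used to reach that address?
  # On a hosted-engine host this would normally go out via the ovirtmgmt bridge.
  ip route get 192.168.1.50

  # If resolution still returns the old subnet, check static entries and DNS
  grep -i engine /etc/hosts
  cat /etc/resolv.conf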