Hi,

Our cluster was running fine until we moved it to the new network. Looking at the agent.log file, the agent still pings the old gateway. I'm not sure if this is the reason it's failing the liveliness check.
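
In case it is relevant: as far as I understand, ovirt-ha-agent pings the gateway address stored in the hosted-engine configuration rather than the host's current default route, so it may still be pointing at the old router. Something like this should show it (assuming the default config path):

grep -i gateway /etc/ovirt-hosted-engine/hosted-engine.conf

If it still lists the old gateway, I assume it needs to be updated on every host, followed by a restart of the HA services:

systemctl restart ovirt-ha-broker ovirt-ha-agent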

Please help.

On Thu, Feb 21, 2019 at 4:39 PM Sakhi Hadebe <sakhi@sanren.ac.za> wrote:
Hi,

I need some help. We had a working oVirt cluster in the testing environment. We have just moved it to the production environment with the same network settings; the only thing we changed is the public VLAN, which uses a different subnet in production.

The problem is that we can't get the HostedEngine up. It does come up, but it fails the LIVELINESS CHECK and its health status is bad. We can't even ping it, although it is on the same subnet as the host machines: 192.168.x.x/24.
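
If the engine VM itself still carries the old network settings, one way in (assuming the standard hosted-engine tooling) should be its console, either serial:

hosted-engine --console

or VNC, after setting a temporary console password:

hosted-engine --add-console-password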

HostedEngine VM status:

[root@garlic qemu]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : goku.sanren.ac.za
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 57b2ece9
local_conf_timestamp               : 8463
Host timestamp                     : 8463
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8463 (Thu Feb 21 16:32:29 2019)
        host-id=1
        score=3400
        vm_conf_refresh_time=8463 (Thu Feb 21 16:32:29 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : garlic.sanren.ac.za
Host ID                            : 2
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Powering down"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 71dc3daf
local_conf_timestamp               : 8540
Host timestamp                     : 8540
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=8540 (Thu Feb 21 16:32:31 2019)
        host-id=2
        score=3400
        vm_conf_refresh_time=8540 (Thu Feb 21 16:32:31 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStop
        stopped=False
        timeout=Thu Jan  1 04:24:29 1970


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : gohan.sanren.ac.za
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 49645620
local_conf_timestamp               : 5480
Host timestamp                     : 5480
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=5480 (Thu Feb 21 16:32:22 2019)
        host-id=3
        score=3400
        vm_conf_refresh_time=5480 (Thu Feb 21 16:32:22 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False
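
If we end up editing the engine VM or its configuration, my understanding is that the HA agents should first be kept from restarting it by putting the cluster into global maintenance, and taken out of it again once done:

hosted-engine --set-maintenance --mode=global
hosted-engine --set-maintenance --mode=none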

The services are running, but with errors:
vdsmd.service:
[root@garlic qemu]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-02-21 16:12:12 SAST; 3min 31s ago
  Process: 40117 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 40121 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 40224 (vdsmd)
    Tasks: 47
   CGroup: /system.slice/vdsmd.service
           ├─40224 /usr/bin/python2 /usr/share/vdsm/vdsmd
           └─40346 /usr/libexec/ioprocess --read-pipe-fd 65 --write-pipe-fd 64 --max-threads 10 --max-queued-requests 10

Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running nwfilter
Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: libvirt: Network Filter Driver error : Requested operation is not valid: nwfilter is in use
Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running dummybr
Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running tune_system
Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running test_space
Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm: Running test_lo
Feb 21 16:12:12 garlic.sanren.ac.za systemd[1]: Started Virtual Desktop Server Manager.
Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN MOM not available.
Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN MOM not available, KSM stats will be missing.
Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN Not ready yet, ignoring event '|virt|VM_status|e2608f14-39fe-4ab6-b6be-9c60679e8c76' args={'e2608f14-39fe-4ab6-b6be-9c606..., 'type': '
Hint: Some lines were ellipsized, use -l to show in full.
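
Given the "MOM not available" warnings, the MOM instance vdsm talks to might be worth checking too (service name as I know it on CentOS/oVirt hosts):

systemctl status mom-vdsm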

libvirtd.service:
[root@garlic qemu]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: active (running) since Thu 2019-02-21 16:06:50 SAST; 9min ago
     Docs: man:libvirtd(8)
           https://libvirt.org
 Main PID: 38485 (libvirtd)
    Tasks: 17 (limit: 32768)
   CGroup: /system.slice/libvirtd.service
           └─38485 /usr/sbin/libvirtd --listen

Feb 21 16:06:50 garlic.sanren.ac.za systemd[1]: Starting Virtualization daemon...
Feb 21 16:06:50 garlic.sanren.ac.za systemd[1]: Started Virtualization daemon.
Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21 14:07:43.033+0000: 38485: info : libvirt version: 3.9.0, package: 14.el7_5.8 (CentOS BuildSystem <http://bugs.c...centos.org)
Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21 14:07:43.033+0000: 38485: info : hostname: garlic.sanren.ac.za
Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21 14:07:43.033+0000: 38485: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Feb 21 16:12:08 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21 14:12:08.791+0000: 38485: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Hint: Some lines were ellipsized, use -l to show in full.
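
The virNetSocketReadWire "End of file" errors look like client disconnects to me, but to double-check the VM state and its NIC wiring directly in libvirt (read-only, so it should not interfere with vdsm):

virsh -r list --all
virsh -r dumpxml HostedEngine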

ovirt-ha-broker & ovirt-ha-agent services:
[root@garlic qemu]# systemctl restart ovirt-ha-broker
[root@garlic qemu]# systemctl status ovirt-ha-broker
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-21 16:18:43 SAST; 30s ago
 Main PID: 41493 (ovirt-ha-broker)
    Tasks: 13
   CGroup: /system.slice/ovirt-ha-broker.service
           ├─41493 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
           ├─41688 /bin/sh /usr/sbin/hosted-engine --check-liveliness
           └─41689 python -m ovirt_hosted_engine_setup.check_liveliness

Feb 21 16:18:43 garlic.sanren.ac.za systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Feb 21 16:18:43 garlic.sanren.ac.za systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
[root@garlic qemu]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-21 16:18:53 SAST; 25s ago
 Main PID: 41581 (ovirt-ha-agent)
    Tasks: 2
   CGroup: /system.slice/ovirt-ha-agent.service
           └─41581 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Feb 21 16:18:53 garlic.sanren.ac.za systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Feb 21 16:18:53 garlic.sanren.ac.za systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
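
To reproduce the failing check by hand: if I read it correctly, the liveliness check just fetches the engine's health page over HTTP, so these should show the same failure (replace <engine-fqdn> with the engine VM's hostname):

hosted-engine --check-liveliness
curl http://<engine-fqdn>/ovirt-engine/services/health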

Attached are log files that might contain some useful information for troubleshooting.


Your assistance would be highly appreciated.

--
Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN) Competency Area, Meraka, CSIR
 
Tel:   +27 12 841 2308
Fax:   +27 12 841 4223
Cell:  +27 71 331 9622
Email: sakhi@sanren.ac.za

