Hi,
Our cluster was running fine until we moved it to the new network.
Looking at the agent.log file, it still pings the old gateway. I'm not sure
if this is the reason it's failing the liveliness check.
You can update the gateway address used in the network test with something
like:
hosted-engine --set-shared-config gateway 192.168.1.1 --type=he_local
hosted-engine --set-shared-config gateway 192.168.1.1 --type=he_shared
The first command updates the value used on the current host; the second
one updates the master copy on the shared storage, which is used by default
for any host you are going to deploy in the future.
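If you want to double-check what is stored before and after the change, the
same tool should be able to read the values back (assuming your build also
ships the --get-shared-config action, which recent hosted-engine versions
do as far as I remember):

hosted-engine --get-shared-config gateway --type=he_local
hosted-engine --get-shared-config gateway --type=he_shared

You will probably also need to restart ovirt-ha-agent on each host for a
changed local value to be picked up.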
But the engine health check doesn't really depend on that: it's a kind of
application-level ping sent over HTTP to the engine.
You can manually check it with:
curl http://$(grep fqdn /etc/ovirt-hosted-engine/hosted-engine.conf | cut -d= -f2)/ovirt-engine/services/health
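If that plain curl is not conclusive, a slightly more verbose variant (same
endpoint, just printing the HTTP status code as well) can help distinguish
a DNS or routing problem from an unhealthy engine:

curl -v -o /dev/null -w 'HTTP status: %{http_code}\n' http://$(grep fqdn /etc/ovirt-hosted-engine/hosted-engine.conf | cut -d= -f2)/ovirt-engine/services/health

A "Could not resolve host" error points at DNS, a connection timeout points
at routing or firewalling on the new subnet, and an HTTP error code points
at the engine itself.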
In particular, I'd suggest checking name resolution.
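For example, something along these lines on each host (a minimal sketch;
engine_fqdn is just an illustrative shell variable, the FQDN comes from the
same config file as above) should show whether the engine FQDN still
resolves to the address you expect on the new subnet:

engine_fqdn=$(grep fqdn /etc/ovirt-hosted-engine/hosted-engine.conf | cut -d= -f2)
getent hosts "$engine_fqdn"   # should return the engine IP on the new subnet
ping -c 3 "$engine_fqdn"

If getent returns the old address (or nothing at all), fix DNS or
/etc/hosts on the hosts, and inside the engine VM as well, before anything
else.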
Please help.
On Thu, Feb 21, 2019 at 4:39 PM Sakhi Hadebe <sakhi(a)sanren.ac.za> wrote:
> Hi,
>
> I need some help. We had a working oVirt cluster in the testing
> environment. We have just moved it to the production environment with the
> same network settings. The only thing we changed is the public VLAN; in
> production we're using a different subnet.
>
> The problem is we can't get the HostedEngine up. It does come up, but it
> fails the LIVELINESS CHECK and its health status is bad. We can't even
> ping it. It is on the same subnet as the host machines: 192.168.x.x/24:
>
> *HostedEngine VM status:*
>
> [root@garlic qemu]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : goku.sanren.ac.za
> Host ID : 1
> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 57b2ece9
> local_conf_timestamp : 8463
> Host timestamp : 8463
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=8463 (Thu Feb 21 16:32:29 2019)
> host-id=1
> score=3400
> vm_conf_refresh_time=8463 (Thu Feb 21 16:32:29 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineDown
> stopped=False
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : garlic.sanren.ac.za
> Host ID : 2
> Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Powering down"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 71dc3daf
> local_conf_timestamp : 8540
> Host timestamp : 8540
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=8540 (Thu Feb 21 16:32:31 2019)
> host-id=2
> score=3400
> vm_conf_refresh_time=8540 (Thu Feb 21 16:32:31 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineStop
> stopped=False
> timeout=Thu Jan 1 04:24:29 1970
>
>
> --== Host 3 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : gohan.sanren.ac.za
> Host ID : 3
> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 49645620
> local_conf_timestamp : 5480
> Host timestamp : 5480
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=5480 (Thu Feb 21 16:32:22 2019)
> host-id=3
> score=3400
> vm_conf_refresh_time=5480 (Thu Feb 21 16:32:22 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=EngineDown
> stopped=False
>
> The services are running, but with errors:
> *vdsmd.service:*
> [root@garlic qemu]# systemctl status vdsmd
> ● vdsmd.service - Virtual Desktop Server Manager
> Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor
> preset: enabled)
> Active: active (running) since Thu 2019-02-21 16:12:12 SAST; 3min 31s
> ago
> Process: 40117 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh
> --post-stop (code=exited, status=0/SUCCESS)
> Process: 40121 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh
> --pre-start (code=exited, status=0/SUCCESS)
> Main PID: 40224 (vdsmd)
> Tasks: 47
> CGroup: /system.slice/vdsmd.service
> ├─40224 /usr/bin/python2 /usr/share/vdsm/vdsmd
> └─40346 /usr/libexec/ioprocess --read-pipe-fd 65
> --write-pipe-fd 64 --max-threads 10 --max-queued-requests 10
>
> Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm:
> Running nwfilter
> Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]:
> libvirt: Network Filter Driver error : Requested operation is not valid:
> nwfilter is in use
> Feb 21 16:12:11 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm:
> Running dummybr
> Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm:
> Running tune_system
> Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm:
> Running test_space
> Feb 21 16:12:12 garlic.sanren.ac.za vdsmd_init_common.sh[40121]: vdsm:
> Running test_lo
> Feb 21 16:12:12 garlic.sanren.ac.za systemd[1]: Started Virtual Desktop
> Server Manager.
> Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN MOM not available.
> Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN MOM not available,
> KSM stats will be missing.
> Feb 21 16:12:13 garlic.sanren.ac.za vdsm[40224]: WARN Not ready yet,
> ignoring event '|virt|VM_status|e2608f14-39fe-4ab6-b6be-9c60679e8c76'
> args={'e2608f14-39fe-4ab6-b6be-9c606..., 'type': '
> Hint: Some lines were ellipsized, use -l to show in full.
>
> *libvirtd.service:*
> [root@garlic qemu]# systemctl status libvirtd
> ● libvirtd.service - Virtualization daemon
> Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled;
> vendor preset: enabled)
> Drop-In: /etc/systemd/system/libvirtd.service.d
> └─unlimited-core.conf
> Active: active (running) since Thu 2019-02-21 16:06:50 SAST; 9min ago
> Docs: man:libvirtd(8)
> https://libvirt.org
> Main PID: 38485 (libvirtd)
> Tasks: 17 (limit: 32768)
> CGroup: /system.slice/libvirtd.service
> └─38485 /usr/sbin/libvirtd --listen
>
> Feb 21 16:06:50 garlic.sanren.ac.za systemd[1]: Starting Virtualization
> daemon...
> Feb 21 16:06:50 garlic.sanren.ac.za systemd[1]: Started Virtualization
> daemon.
> Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21
> 14:07:43.033+0000: 38485: info : libvirt version: 3.9.0, package:
> 14.el7_5.8 (CentOS BuildSystem <
> http://bugs.c...centos.org)
> Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21
> 14:07:43.033+0000: 38485: info : hostname: garlic.sanren.ac.za
> Feb 21 16:07:43 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21
> 14:07:43.033+0000: 38485: error : virNetSocketReadWire:1808 : End of file
> while reading data: Input/output error
> Feb 21 16:12:08 garlic.sanren.ac.za libvirtd[38485]: 2019-02-21
> 14:12:08.791+0000: 38485: error : virNetSocketReadWire:1808 : End of file
> while reading data: Input/output error
> Hint: Some lines were ellipsized, use -l to show in full.
>
> *ovirt-ha-broker & ovirt-ha-agent services:*
> [root@garlic qemu]# systemctl restart ovirt-ha-broker
> [root@garlic qemu]# systemctl status ovirt-ha-broker
> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
> Communications Broker
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
> enabled; vendor preset: disabled)
> Active: active (running) since Thu 2019-02-21 16:18:43 SAST; 30s ago
> Main PID: 41493 (ovirt-ha-broker)
> Tasks: 13
> CGroup: /system.slice/ovirt-ha-broker.service
> ├─41493 /usr/bin/python
> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
> ├─41688 /bin/sh /usr/sbin/hosted-engine --check-liveliness
> └─41689 python -m ovirt_hosted_engine_setup.check_liveliness
>
> Feb 21 16:18:43 garlic.sanren.ac.za systemd[1]: Started oVirt Hosted
> Engine High Availability Communications Broker.
> Feb 21 16:18:43 garlic.sanren.ac.za systemd[1]: Starting oVirt Hosted
> Engine High Availability Communications Broker...
> [root@garlic qemu]# systemctl status ovirt-ha-agent
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability
> Monitoring Agent
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service;
> enabled; vendor preset: disabled)
> Active: active (running) since Thu 2019-02-21 16:18:53 SAST; 25s ago
> Main PID: 41581 (ovirt-ha-agent)
> Tasks: 2
> CGroup: /system.slice/ovirt-ha-agent.service
> └─41581 /usr/bin/python
> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>
> Feb 21 16:18:53 garlic.sanren.ac.za systemd[1]: Started oVirt Hosted
> Engine High Availability Monitoring Agent.
> Feb 21 16:18:53 garlic.sanren.ac.za systemd[1]: Starting oVirt Hosted
> Engine High Availability Monitoring Agent...
>
> Attached are log files that might contain some useful information for
> troubleshooting.
>
>
> Your assistance will be highly appreciated.
>
> --
> Regards,
> Sakhi Hadebe
>
> Engineer: South African National Research Network (SANReN) Competency Area, Meraka, CSIR
>
> Tel: +27 12 841 2308
> Fax: +27 12 841 4223
> Cell: +27 71 331 9622
> Email: sakhi(a)sanren.ac.za
>
>
--
Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN) Competency Area, Meraka, CSIR
Tel: +27 12 841 2308
Fax: +27 12 841 4223
Cell: +27 71 331 9622
Email: sakhi(a)sanren.ac.za
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Y67XZWKVPCB...