
Hi, I am having a few issues with a fresh oVirt 4.3.7 HCI setup with 3 nodes.

------------------------------------------------------------------------------------------------------------------------------
1.- vdsm is showing the following errors on HOST1 and HOST2 (HOST3 seems to be OK):
------------------------------------------------------------------------------------------------------------------------------

service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-02-11 18:50:28 PST; 28min ago
  Process: 25457 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 25549 (vdsmd)
    Tasks: 76
   CGroup: /system.slice/vdsmd.service
           ├─25549 /usr/bin/python2 /usr/share/vdsm/vdsmd
           ├─25707 /usr/libexec/ioprocess --read-pipe-fd 52 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
           ├─26314 /usr/libexec/ioprocess --read-pipe-fd 92 --write-pipe-fd 86 --max-threads 10 --max-queued-requests 10
           ├─26325 /usr/libexec/ioprocess --read-pipe-fd 96 --write-pipe-fd 93 --max-threads 10 --max-queued-requests 10
           └─26333 /usr/libexec/ioprocess --read-pipe-fd 102 --write-pipe-fd 101 --max-threads 10 --max-queued-requests 10

Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_space
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_lo
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local systemd[1]: Started Virtual Desktop Server Manager.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available, KSM stats will be missing.
Feb 11 18:51:25 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:34 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:35 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:43 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:56:32 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN ping was deprecated in favor of ping2 and confirmConnectivity
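Regarding issue 1: my understanding is that this vdsm error usually shows up when vdsm cannot talk to the hosted-engine HA broker, so on host1 and host2 I was planning to check the HA services directly along these lines (assuming the standard service and log file names, please correct me if I'm looking in the wrong place):

systemctl status ovirt-ha-agent ovirt-ha-broker
tail -f /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log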
------------------------------------------------------------------------------------------------------------------------------
2.- "gluster vol heal engine info" shows the following two entries, and the heal never finishes:
------------------------------------------------------------------------------------------------------------------------------

[root@host2 ~]# gluster vol heal engine info
Brick host1:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2

Brick host2:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2

Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

------------------------------------------------------------------------------------------------------------------------------
3.- Every hour I see the following entries/errors:
------------------------------------------------------------------------------------------------------------------------------

VDSM command SetVolumeDescriptionVDS failed: Could not acquire resource. Probably resource factory threw an exception.: ()

------------------------------------------------------------------------------------------------------------------------------
4.- I am also seeing the following error pertaining to the engine volume:
------------------------------------------------------------------------------------------------------------------------------

Failed to update OVF disks 86196e10-8103-4b00-bd3e-0f577a8bb5b2, OVF data isn't updated on those OVF stores (Data Center Default, Storage Domain hosted_storage).

The image ID in this message (86196e10-8103-4b00-bd3e-0f577a8bb5b2) is the same one that appears in the unhealed paths from issue 2, so I suspect issues 2, 3 and 4 are related.
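For issue 2, would manually triggering a heal along these lines be a reasonable next step, or is that risky while the hosted engine is running on this volume? (These are just the standard gluster heal commands as I understand them; "info summary" may depend on the gluster version shipped with 4.3.7.)

gluster volume heal engine
gluster volume heal engine info summary
gluster volume heal engine full    # only if the index heal above doesn't clear the two entries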
------------------------------------------------------------------------------------------------------------------------------
5.- hosted-engine --vm-status
------------------------------------------------------------------------------------------------------------------------------

--== Host host1 (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host1
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : be592659
local_conf_timestamp               : 480218
Host timestamp                     : 480217
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=480217 (Tue Feb 11 19:22:20 2020)
        host-id=1
        score=3400
        vm_conf_refresh_time=480218 (Tue Feb 11 19:22:21 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

--== Host host3 (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host3
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1f4a8597
local_conf_timestamp               : 436681
Host timestamp                     : 436681
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=436681 (Tue Feb 11 19:22:18 2020)
        host-id=2
        score=3400
        vm_conf_refresh_time=436681 (Tue Feb 11 19:22:18 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False

--== Host host2 (id: 3) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host2
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down_missing", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : ca5c1918
local_conf_timestamp               : 479644
Host timestamp                     : 479644
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=479644 (Tue Feb 11 19:22:21 2020)
        host-id=3
        score=3400
        vm_conf_refresh_time=479644 (Tue Feb 11 19:22:22 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

------------------------------------------------------------------------------------------------------------------------------

Any ideas on what might be going on?
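If more detail is needed, I can attach the relevant logs from the hosts and the engine VM; I assume the useful ones would be:

/var/log/vdsm/vdsm.log
/var/log/ovirt-hosted-engine-ha/agent.log
/var/log/ovirt-hosted-engine-ha/broker.log
/var/log/ovirt-engine/engine.log   (on the hosted engine VM)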