Hi,
I am having several issues with a fresh oVirt 4.3.7 HCI setup with 3 nodes:
------------------------------------------------------------------------------------------------------------------------------------------------------------
1.-vdsm is showing the following errors for HOST1 and HOST2 (HOST3 seems to be ok):
------------------------------------------------------------------------------------------------------------------------------------------------------------
service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-02-11 18:50:28 PST; 28min ago
Process: 25457 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 25549 (vdsmd)
Tasks: 76
CGroup: /system.slice/vdsmd.service
├─25549 /usr/bin/python2 /usr/share/vdsm/vdsmd
├─25707 /usr/libexec/ioprocess --read-pipe-fd 52 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
├─26314 /usr/libexec/ioprocess --read-pipe-fd 92 --write-pipe-fd 86 --max-threads 10 --max-queued-requests 10
├─26325 /usr/libexec/ioprocess --read-pipe-fd 96 --write-pipe-fd 93 --max-threads 10 --max-queued-requests 10
└─26333 /usr/libexec/ioprocess --read-pipe-fd 102 --write-pipe-fd 101 --max-threads 10 --max-queued-requests 10
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_space
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_lo
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local systemd[1]: Started Virtual Desktop Server Manager.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available, KSM stats will be missing.
Feb 11 18:51:25 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:34 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:35 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:43 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:56:32 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN ping was deprecated in favor of ping2 and confirmConnectivity
------------------------------------------------------------------------------------------------------------------------------------------------------------
2.-"gluster vol heal engine info" is showing the following, and the heal never finishes
------------------------------------------------------------------------------------------------------------------------------------------------------------
[root@host2~]# gluster vol heal engine info
Brick host1:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2
Brick host2:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2
Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
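
For anyone wanting to compare the pending entries per brick, here is a minimal Python sketch that tallies them from the pasted text. It only parses output in the shape shown above; the function name summarize_heal_info and the SAMPLE_HEAL variable are made up for illustration and are not part of Gluster or oVirt.

```python
def summarize_heal_info(text):
    """Map each brick to its list of pending heal entries.

    Assumes the plain-text layout shown in this post: a 'Brick ...' line,
    zero or more entry paths, then 'Status:' and 'Number of entries:' lines.
    """
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = []
        elif current is None or not line:
            continue  # skip prompts, blank lines, anything before the first brick
        elif line.startswith(("Status:", "Number of entries:")):
            continue  # bookkeeping lines, not heal entries
        else:
            bricks[current].append(line)
    return bricks


# Sample copied from the output above.
SAMPLE_HEAL = """\
Brick host1:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2
Brick host2:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2
Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
"""

for brick, entries in summarize_heal_info(SAMPLE_HEAL).items():
    print(brick, len(entries))
```

In this output the same two .meta files are listed on the host1 and host2 bricks, and nothing is listed on host3.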
------------------------------------------------------------------------------------------------------------------------------------------------------------
3.-Every hour I see the following entries/errors
------------------------------------------------------------------------------------------------------------------------------------------------------------
VDSM command SetVolumeDescriptionVDS failed: Could not acquire resource. Probably resource
factory threw an exception.: ()
------------------------------------------------------------------------------------------------------------------------------------------------------------
4.- I am also seeing the following pertaining to the engine volume
------------------------------------------------------------------------------------------------------------------------------------------------------------
Failed to update OVF disks 86196e10-8103-4b00-bd3e-0f577a8bb5b2, OVF data isn't
updated on those OVF stores (Data Center Default, Storage Domain hosted_storage).
------------------------------------------------------------------------------------------------------------------------------------------------------------
5.-hosted-engine --vm-status
------------------------------------------------------------------------------------------------------------------------------------------------------------
--== Host host1 (id: 1) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : host1
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : be592659
local_conf_timestamp : 480218
Host timestamp : 480217
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=480217 (Tue Feb 11 19:22:20 2020)
host-id=1
score=3400
vm_conf_refresh_time=480218 (Tue Feb 11 19:22:21 2020)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host host3 (id: 2) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : host3
Host ID : 2
Engine status : {"health": "good", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 1f4a8597
local_conf_timestamp : 436681
Host timestamp : 436681
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=436681 (Tue Feb 11 19:22:18 2020)
host-id=2
score=3400
vm_conf_refresh_time=436681 (Tue Feb 11 19:22:18 2020)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
--== Host host2 (id: 3) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : host2
Host ID : 3
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_missing", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : ca5c1918
local_conf_timestamp : 479644
Host timestamp : 479644
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=479644 (Tue Feb 11 19:22:21 2020)
host-id=3
score=3400
vm_conf_refresh_time=479644 (Tue Feb 11 19:22:22 2020)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
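
To condense the status above into one line per host, a rough Python sketch like the following could be used. It relies only on the "--== Host ... status ==--" headers and the state= and score= lines visible in this paste; the function name summarize_vm_status and the SAMPLE_STATUS variable are hypothetical, and the sample is abbreviated from the output above.

```python
import re

# Header line as it appears in the pasted output, e.g.
# "--== Host host1 (id: 1) status ==--"
HOST_HEADER = re.compile(r"^--== Host (\S+) \(id: (\d+)\) status ==--$")


def summarize_vm_status(text):
    """Return one dict per host with its id, HA state and score."""
    hosts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HOST_HEADER.match(line)
        if match:
            current = {"host": match.group(1), "id": int(match.group(2))}
            hosts.append(current)
        elif current is not None and line.startswith(("state=", "score=")):
            # Only the key=value lines from the "Extra metadata" part are used.
            key, _, value = line.partition("=")
            current[key] = value
    return hosts


# Abbreviated sample taken from the output above.
SAMPLE_STATUS = """\
--== Host host1 (id: 1) status ==--
score=3400
state=EngineDown
--== Host host3 (id: 2) status ==--
score=3400
state=EngineUp
--== Host host2 (id: 3) status ==--
score=3400
state=EngineDown
"""

for h in summarize_vm_status(SAMPLE_STATUS):
    print(h["host"], h["id"], h.get("state"), h.get("score"))
```

For this paste it shows all three hosts at score 3400, with the engine VM up only on host3.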
------------------------------------------------------------------------------------------------------------------------------------------------------------
Any ideas on what might be going on?