oVirt 4.3.7 and Gluster 6.6 multiple issues

12 Feb 2020

Hi,
I am having several issues with a fresh oVirt 4.3.7 HCI setup with 3 nodes.

------------------------------------------------------------------------------------------------------------------------------------------------------------
1.-vdsm is showing the following errors for HOST1 and HOST2 (HOST3 seems to be ok):
------------------------------------------------------------------------------------------------------------------------------------------------------------
     service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-02-11 18:50:28 PST; 28min ago
  Process: 25457 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 25549 (vdsmd)
    Tasks: 76
   CGroup: /system.slice/vdsmd.service
           ├─25549 /usr/bin/python2 /usr/share/vdsm/vdsmd
           ├─25707 /usr/libexec/ioprocess --read-pipe-fd 52 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
           ├─26314 /usr/libexec/ioprocess --read-pipe-fd 92 --write-pipe-fd 86 --max-threads 10 --max-queued-requests 10
           ├─26325 /usr/libexec/ioprocess --read-pipe-fd 96 --write-pipe-fd 93 --max-threads 10 --max-queued-requests 10
           └─26333 /usr/libexec/ioprocess --read-pipe-fd 102 --write-pipe-fd 101 --max-threads 10 --max-queued-requests 10

Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_space
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_lo
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local systemd[1]: Started Virtual Desktop Server Manager.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available, KSM stats will be missing.
Feb 11 18:51:25 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score
                                                               Traceback (most recent call last):
                                                                 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:34 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score
                                                               Traceback (most recent call last):
                                                                 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:35 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score
                                                               Traceback (most recent call last):
                                                                 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:43 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score
                                                               Traceback (most recent call last):
                                                                 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:56:32 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN ping was deprecated in favor of ping2 and confirmConnectivity

------------------------------------------------------------------------------------------------------------------------------------------------------------
2.-"gluster vol heal engine info" is showing the following, and the healing never finishes
------------------------------------------------------------------------------------------------------------------------------------------------------------
[root@host2~]# gluster vol heal engine info
Brick host1:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta 
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta 
Status: Connected
Number of entries: 2

Brick host2:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta 
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta 
Status: Connected
Number of entries: 2

Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
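
To watch whether the pending entries ever change between runs, the output above can be reduced to a per-brick count. The following is only a minimal sketch: the function name `parse_heal_info` is made up for illustration, and it assumes the plain-text layout shown above (a "Brick ..." line, then the pending paths, then "Status:" and "Number of entries:" lines). The sample text is copied from the output in this post.

```python
def parse_heal_info(text):
    """Map each brick to the list of entries it reports as pending heal.

    Assumes the plain-text layout of `gluster vol heal <volume> info`
    as pasted in this thread; other Gluster versions may differ.
    """
    pending = {}
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            pending[brick] = []
        elif brick and line and not line.startswith(("Status:", "Number of entries:")):
            # Anything else under a brick header is a pending path or GFID.
            pending[brick].append(line)
    return pending


SAMPLE = """\
Brick host1:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2

Brick host2:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2

Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
"""

pending = parse_heal_info(SAMPLE)
counts = {brick: len(entries) for brick, entries in pending.items()}
for brick, count in counts.items():
    print(brick, count)
```

Comparing the result of two runs taken some minutes apart would show whether the same two .meta files stay stuck or the list is changing.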

------------------------------------------------------------------------------------------------------------------------------------------------------------
3.-Every hour I see the following entries/errors
------------------------------------------------------------------------------------------------------------------------------------------------------------
VDSM command SetVolumeDescriptionVDS failed: Could not acquire resource. Probably resource factory threw an exception.: ()

------------------------------------------------------------------------------------------------------------------------------------------------------------
4.- I am also seeing the following pertaining to the engine volume
------------------------------------------------------------------------------------------------------------------------------------------------------------
Failed to update OVF disks 86196e10-8103-4b00-bd3e-0f577a8bb5b2, OVF data isn't updated on those OVF stores (Data Center Default, Storage Domain hosted_storage).

------------------------------------------------------------------------------------------------------------------------------------------------------------
5.-hosted-engine --vm-status
------------------------------------------------------------------------------------------------------------------------------------------------------------
--== Host host1 (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host1
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : be592659
local_conf_timestamp               : 480218
Host timestamp                     : 480217
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=480217 (Tue Feb 11 19:22:20 2020)
	host-id=1
	score=3400
	vm_conf_refresh_time=480218 (Tue Feb 11 19:22:21 2020)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineDown
	stopped=False

--== Host host3 (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host3
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1f4a8597
local_conf_timestamp               : 436681
Host timestamp                     : 436681
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=436681 (Tue Feb 11 19:22:18 2020)
	host-id=2
	score=3400
	vm_conf_refresh_time=436681 (Tue Feb 11 19:22:18 2020)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineUp
	stopped=False

--== Host host2 (id: 3) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host2
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down_missing", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : ca5c1918
local_conf_timestamp               : 479644
Host timestamp                     : 479644
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=479644 (Tue Feb 11 19:22:21 2020)
	host-id=3
	score=3400
	vm_conf_refresh_time=479644 (Tue Feb 11 19:22:22 2020)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineDown
	stopped=False
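
For comparing hosts at a glance, the status text above can be turned into one dictionary per host. This is a rough sketch only: `parse_vm_status` is a hypothetical helper, it assumes the "--== Host NAME (id: N) status ==--" headers and "key : value" lines exactly as pasted here, and it skips the indented "Extra metadata" lines. The sample is an abridged copy of the output in this post.

```python
import json
import re

# Header format as it appears in the pasted output (an assumption for other versions).
HOST_HEADER = re.compile(r"^--== Host (\S+) \(id: (\d+)\) status ==--$")


def parse_vm_status(text):
    """Return {hostname: {field: value}} from `hosted-engine --vm-status` text."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        header = HOST_HEADER.match(raw.strip())
        if header:
            current = hosts.setdefault(header.group(1), {"id": int(header.group(2))})
            continue
        # Skip text before the first header, blank lines, and indented metadata lines.
        if current is None or not raw.strip() or raw[0].isspace():
            continue
        key, _, value = raw.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Engine status":
            value = json.loads(value)  # this field is printed as JSON
        current[key] = value
    return hosts


SAMPLE = """\
--== Host host1 (id: 1) status ==--

Hostname                           : host1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400

--== Host host3 (id: 2) status ==--

Hostname                           : host3
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400

--== Host host2 (id: 3) status ==--

Hostname                           : host2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down_missing", "detail": "unknown"}
Score                              : 3400
"""

hosts = parse_vm_status(SAMPLE)
up_hosts = [name for name, fields in hosts.items() if fields["Engine status"]["vm"] == "up"]
print(up_hosts)
```

On the sample this lists host3 as the only host running the engine VM, matching the state=EngineUp line above, while host2 reports "down_missing" rather than the "down" seen on host1.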

------------------------------------------------------------------------------------------------------------------------------------------------------------

Any ideas on what might be going on?

adrianquintero＠gmail.com
