
On February 12, 2020 5:30:42 AM GMT+02:00, adrianquintero@gmail.com wrote:
Hi, I am having a couple of issues with a fresh oVirt 4.3.7 HCI setup with 3 nodes:
------------------------------------------------------------------------------
1.- vdsm is showing the following errors for HOST1 and HOST2 (HOST3 seems to be ok):
------------------------------------------------------------------------------
service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-02-11 18:50:28 PST; 28min ago
  Process: 25457 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 25549 (vdsmd)
    Tasks: 76
   CGroup: /system.slice/vdsmd.service
           ├─25549 /usr/bin/python2 /usr/share/vdsm/vdsmd
           ├─25707 /usr/libexec/ioprocess --read-pipe-fd 52 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
           ├─26314 /usr/libexec/ioprocess --read-pipe-fd 92 --write-pipe-fd 86 --max-threads 10 --max-queued-requests 10
           ├─26325 /usr/libexec/ioprocess --read-pipe-fd 96 --write-pipe-fd 93 --max-threads 10 --max-queued-requests 10
           └─26333 /usr/libexec/ioprocess --read-pipe-fd 102 --write-pipe-fd 101 --max-threads 10 --max-queued-requests 10
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_space
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_lo
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local systemd[1]: Started Virtual Desktop Server Manager.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available, KSM stats will be missing.
Feb 11 18:51:25 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:34 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:35 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:43 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:56:32 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN ping was deprecated in favor of ping2 and confirmConnectivity
------------------------------------------------------------------------------
2.- "gluster vol heal engine info" shows the following, and the heal never finishes:
------------------------------------------------------------------------------
[root@host2 ~]# gluster vol heal engine info
Brick host1:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2

Brick host2:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2

Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
------------------------------------------------------------------------------
3.- Every hour I see the following entries/errors:
------------------------------------------------------------------------------
VDSM command SetVolumeDescriptionVDS failed: Could not acquire resource. Probably resource factory threw an exception.: ()
------------------------------------------------------------------------------
4.- I am also seeing the following pertaining to the engine volume:
------------------------------------------------------------------------------
Failed to update OVF disks 86196e10-8103-4b00-bd3e-0f577a8bb5b2, OVF data isn't updated on those OVF stores (Data Center Default, Storage Domain hosted_storage).
------------------------------------------------------------------------------
5.- hosted-engine --vm-status
------------------------------------------------------------------------------

--== Host host1 (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host1
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : be592659
local_conf_timestamp               : 480218
Host timestamp                     : 480217
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=480217 (Tue Feb 11 19:22:20 2020)
        host-id=1
        score=3400
        vm_conf_refresh_time=480218 (Tue Feb 11 19:22:21 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

--== Host host3 (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host3
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1f4a8597
local_conf_timestamp               : 436681
Host timestamp                     : 436681
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=436681 (Tue Feb 11 19:22:18 2020)
        host-id=2
        score=3400
        vm_conf_refresh_time=436681 (Tue Feb 11 19:22:18 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False

--== Host host2 (id: 3) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host2
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down_missing", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : ca5c1918
local_conf_timestamp               : 479644
Host timestamp                     : 479644
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=479644 (Tue Feb 11 19:22:21 2020)
        host-id=3
        score=3400
        vm_conf_refresh_time=479644 (Tue Feb 11 19:22:22 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False
------------------------------------------------------------------------------
Any ideas on what might be going on?
The .meta file issue is a known bug which will soon be fixed. The easiest way to recover is to compare the contents of the file on all bricks, rsync the newest copy (usually only the timestamp inside is increased) to the other bricks, and then issue a full heal; a rough sketch follows below.

For the "failed to retrieve Hosted Engine HA score" errors, try stopping and then starting the ovirt-ha-broker and ovirt-ha-agent services (these are what vdsm queries for the HA score).

For point 4, I guess you will need to find out whether the OVF store file is actually there and whether there are errors from sanlock.service.
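A rough sketch of the .meta recovery, assuming passwordless SSH between the hosts; the volume name, brick path and the first .meta file are taken from your heal output (repeat for the second .meta file), and using host3 as the source of the newest copy is only an example — pick whichever brick actually holds it:

# compare the copies of the affected file on all three bricks
B=/gluster_bricks/engine/engine
F=7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
for h in host1 host2 host3; do
    echo "== $h =="
    ssh "$h" "md5sum $B/$F; cat $B/$F"
done

# on the host holding the newest copy (say host3), push it over the stale ones
rsync -av "$B/$F" host1:"$B/$F"
rsync -av "$B/$F" host2:"$B/$F"

# then trigger a full heal and watch the entry count drop to zero
gluster vol heal engine full
gluster vol heal engine info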
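For the service restart, something along these lines on host1 and host2; the ordering stops the agent before the broker and starts the broker first, since the agent depends on the broker:

systemctl stop ovirt-ha-agent ovirt-ha-broker
systemctl start ovirt-ha-broker ovirt-ha-agent
# give the agent a minute to recompute the score, then verify
hosted-engine --vm-status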
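And for the sanlock/OVF-store check, a possible starting point; the storage domain UUID and the OVF disk UUID are taken from your outputs above, while the /rhev/data-center path is just the usual mount location for a GlusterFS storage domain and may differ on your setup:

systemctl status sanlock
journalctl -u sanlock --since "1 hour ago"
sanlock client status

# verify the OVF store disk from the "Failed to update OVF disks" event exists
ls -l /rhev/data-center/mnt/glusterSD/*/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/

Best Regards,
Strahil Nikolov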