oVirt 4.3.7 and Gluster 6.6 multiple issues

Hi, I am having a couple of issues with a fresh oVirt 4.3.7 HCI setup with 3 nodes.

------------------------------------------------------------
1. vdsm is showing the following errors for HOST1 and HOST2 (HOST3 seems to be ok):
------------------------------------------------------------

service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-02-11 18:50:28 PST; 28min ago
  Process: 25457 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 25549 (vdsmd)
    Tasks: 76
   CGroup: /system.slice/vdsmd.service
           ├─25549 /usr/bin/python2 /usr/share/vdsm/vdsmd
           ├─25707 /usr/libexec/ioprocess --read-pipe-fd 52 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
           ├─26314 /usr/libexec/ioprocess --read-pipe-fd 92 --write-pipe-fd 86 --max-threads 10 --max-queued-requests 10
           ├─26325 /usr/libexec/ioprocess --read-pipe-fd 96 --write-pipe-fd 93 --max-threads 10 --max-queued-requests 10
           └─26333 /usr/libexec/ioprocess --read-pipe-fd 102 --write-pipe-fd 101 --max-threads 10 --max-queued-requests 10

Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_space
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local vdsmd_init_common.sh[25457]: vdsm: Running test_lo
Feb 11 18:50:28 tij-059-ovirt1.grupolucerna.local systemd[1]: Started Virtual Desktop Server Manager.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available.
Feb 11 18:50:29 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN MOM not available, KSM stats will be missing.
Feb 11 18:51:25 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:34 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:35 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:51:43 tij-059-ovirt1.grupolucerna.local vdsm[25549]: ERROR failed to retrieve Hosted Engine HA score Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo...
Feb 11 18:56:32 tij-059-ovirt1.grupolucerna.local vdsm[25549]: WARN ping was deprecated in favor of ping2 and confirmConnectivity

------------------------------------------------------------
2. "gluster vol heal engine info" is showing the following and it never finishes healing:
------------------------------------------------------------

[root@host2 ~]# gluster vol heal engine info
Brick host1:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2

Brick host2:/gluster_bricks/engine/engine
/7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
/7a68956e-3736-46d1-8932-8576f8ee8882/images/b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd/ed569aed-005e-40fd-9297-dd54a1e4946c.meta
Status: Connected
Number of entries: 2

Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

------------------------------------------------------------
3. Every hour I see the following entries/errors:
------------------------------------------------------------

VDSM command SetVolumeDescriptionVDS failed: Could not acquire resource. Probably resource factory threw an exception.: ()

------------------------------------------------------------
4. I am also seeing the following pertaining to the engine volume:
------------------------------------------------------------

Failed to update OVF disks 86196e10-8103-4b00-bd3e-0f577a8bb5b2, OVF data isn't updated on those OVF stores (Data Center Default, Storage Domain hosted_storage).
------------------------------------------------------------
5. hosted-engine --vm-status
------------------------------------------------------------

--== Host host1 (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host1
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : be592659
local_conf_timestamp               : 480218
Host timestamp                     : 480217
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=480217 (Tue Feb 11 19:22:20 2020)
    host-id=1
    score=3400
    vm_conf_refresh_time=480218 (Tue Feb 11 19:22:21 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

--== Host host3 (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host3
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1f4a8597
local_conf_timestamp               : 436681
Host timestamp                     : 436681
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=436681 (Tue Feb 11 19:22:18 2020)
    host-id=2
    score=3400
    vm_conf_refresh_time=436681 (Tue Feb 11 19:22:18 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

--== Host host2 (id: 3) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : host2
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down_missing", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : ca5c1918
local_conf_timestamp               : 479644
Host timestamp                     : 479644
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=479644 (Tue Feb 11 19:22:21 2020)
    host-id=3
    score=3400
    vm_conf_refresh_time=479644 (Tue Feb 11 19:22:22 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

------------------------------------------------------------

Any ideas on what might be going on?

For number 2, I'd look at the actual gluster file directories, where I'd expect to see that host3 is missing the files. I'd rsync the files from one of the other hosts to the same location on host3 and then run "gluster volume heal engine". Since it's the engine volume, I wouldn't be surprised if that's the cause of the other errors. In the past I've had to manually restart glusterd and other gluster-related services to make everything happy. My latest changes were to update to 4.3.8 and add /etc/hosts entries for the gluster hosts on every node, and gluster now seems to be stable in cases where volumes previously appeared to go offline.
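A rough sketch of that manual check and copy, using the hostnames and the brick/image paths from the heal output above (illustrative only; which brick is missing files, and therefore the direction of the copy, has to come from the listing itself):

    # compare what each brick holds for one of the affected image directories
    IMGDIR=7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2
    for h in host1 host2 host3; do echo "== $h =="; ssh $h "ls -l /gluster_bricks/engine/engine/$IMGDIR"; done

    # on the host that is missing files, pull them from a healthy brick into the same path
    rsync -avP host1:/gluster_bricks/engine/engine/$IMGDIR/ /gluster_bricks/engine/engine/$IMGDIR/

    # then kick off the heal again (restarting glusterd first if it is misbehaving)
    gluster volume heal engine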

The .meta file issue is a bug which will soon be fixed. The easiest way to recover is to compare the contents of the file on all bricks, rsync the newest copy (usually only the timestamp inside is increased) to the other bricks, and issue a full heal. Also try stopping and then starting the ovirt-ha-broker and ovirt-ha-agent services. For point 4, I guess you will need to find out whether the file is there and whether there are errors from 'sanlock.service'. Best Regards, Strahil Nikolov
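For reference, a minimal sketch of that recovery, under the assumption that host3 holds the stale copy and taking the first .meta path from the heal output above (verify which brick actually has the newest file before copying, and repeat for the second .meta file):

    BRICK=/gluster_bricks/engine/engine
    META=7a68956e-3736-46d1-8932-8576f8ee8882/images/86196e10-8103-4b00-bd3e-0f577a8bb5b2/98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta

    # compare the file on all three bricks; usually only the internal timestamp differs
    for h in host1 host2 host3; do echo "== $h =="; ssh $h "cat $BRICK/$META"; done

    # from the host with the newest copy, push it over the stale one, then request a full heal
    rsync -avP $BRICK/$META host3:$BRICK/$META
    gluster volume heal engine full

    # stop/start (restart) the hosted-engine HA services as suggested
    systemctl restart ovirt-ha-broker ovirt-ha-agent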

OK, so I ran an rsync from host2 over to host3 for the .meta files only, and that seems to have worked:

98d64fb4-df01-4981-9e5e-62be6ca7e07c.meta
ed569aed-005e-40fd-9297-dd54a1e4946c.meta

[root@host1 ~]# gluster vol heal engine info
Brick host1.grupolucerna.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick host3:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

In this case I did not have to stop/start ovirt-ha-broker and ovirt-ha-agent.

I still see the OVF issue, and I am wondering if I should just rsync the whole /gluster_bricks/engine/engine directory from host3 over to host1 and host2 because of the following 1969 timestamps.

I see 1969 as the timestamp on some directories on host1, under /gluster_bricks/engine/engine/7a68956e-3736-46d1-8932-8576f8ee8882/images:
drwxr-xr-x. 2 vdsm kvm 8.0K Dec 31 1969 b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd
drwxr-xr-x. 2 vdsm kvm 8.0K Dec 31 1969 86196e10-8103-4b00-bd3e-0f577a8bb5b2

On host2 I see the same:
drwxr-xr-x. 2 vdsm kvm 8.0K Dec 31 1969 b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd
drwxr-xr-x. 2 vdsm kvm 8.0K Dec 31 1969 86196e10-8103-4b00-bd3e-0f577a8bb5b2

But on host3 I see valid timestamps:
drwxr-xr-x. 2 vdsm kvm 149 Feb 16 09:43 86196e10-8103-4b00-bd3e-0f577a8bb5b2
drwxr-xr-x. 2 vdsm kvm 149 Feb 16 09:45 b8ce22c5-8cbd-4d7f-b544-9ce930e04dcd

So host3 has valid timestamps, while host1 and host2 show a 1969 date.

Thoughts?

Thanks,
Adrian
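As an aside, the "Dec 31 1969" date is simply the Unix epoch (timestamp 0) rendered in a US timezone, so it points at a reset directory mtime rather than necessarily diverging data. One hypothetical way to confirm the .meta files really match across the bricks before rsyncing whole directories:

    IMAGES=/gluster_bricks/engine/engine/7a68956e-3736-46d1-8932-8576f8ee8882/images
    # checksum the small .meta files on every brick and compare the output by eye
    for h in host1 host2 host3; do echo "== $h =="; ssh $h "md5sum $IMAGES/*/*.meta"; done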

After a couple of hours everything is looking good; it seems the timestamps corrected themselves and the OVF errors are gone. Thank you all for the help. Regards, Adrian

Hi Adrian, Did you rsync the bricks? Best Regards, Strahil Nikolov

Hi Strahil, no, just the .meta files, and that solved everything.

Hi Adrian, If you are fully in sync, you can sync the brick without the gluster directories. For example:

rsync -avP /gluster_bricks/engine/engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/ node2:/gluster_bricks/engine/engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/

Best Regards, Strahil Nikolov
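If you do go that route, a cautious variant of the same idea (hypothetical, reusing Strahil's example path, with a dry run first and a full heal afterwards) could be:

    # preview the changes, then run the copy for real from the brick that is known good
    SRC=/gluster_bricks/engine/engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/
    rsync -avPn $SRC node2:$SRC
    rsync -avP  $SRC node2:$SRC

    # then let gluster reconcile its metadata
    gluster volume heal engine full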

Thanks Strahil, I did the rsync only for the .meta files and that seems to have done the trick. I just waited a couple of hours and the OVF error resolved itself, and since it was the engine OVF I think Edward was right that it was behind the rest of the issues, which are now resolved as well. Regards, Adrian
participants (4)
- Adrian Quintero
- adrianquintero@gmail.com
- Edward Berger
- Strahil Nikolov