Dear Nir,
The 'The method does not exist or is not available: {'method': u'GlusterHost.list'}, code = -32601' error is not related to sanlock. I don't know why the 'vdsm-gluster' package was not installed as a dependency.
Please file a bug about this.
> Can you share your sanlock log?
>
I'm attaching the contents of /var/log, but here is a short snippet:
About the sanlock issue - it reappeared with errors like:
2019-01-31 13:33:10 27551 [17279]: leader1 delta_acquire_begin error -223 lockspace hosted-engine host_id 1
As I said, the error is not -233 but -223, which makes sense - this error means sanlock did not
find the magic number for a delta lease area, which means the area was either never formatted or
is corrupted.
2019-01-31 13:33:10 27551 [17279]: leader2 path /var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b offset 0
2019-01-31 13:33:10 27551 [17279]: leader3 m 0 v 30003 ss 512 nh 0 mh 1 oi 0 og 0 lv 0
2019-01-31 13:33:10 27551 [17279]: leader4 sn hosted-engine rn ts 0 cs 60346c59
2019-01-31 13:33:11 27551 [21482]: s6 add_lockspace fail result -223
2019-01-31 13:33:16 27556 [21482]: s7 lockspace hosted-engine:1:/var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b:0
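For anyone scanning a long sanlock log for the same failure, here is a minimal sketch that pulls the error code, lockspace name and host id out of "delta_acquire_begin error" lines like the one above. The function name sanlock_acquire_errors is hypothetical, and the field layout is assumed from the snippet in this thread only; it may differ between sanlock versions.

```shell
# Sketch only: reads sanlock log lines on stdin and prints
# "<error code> <lockspace> <host id>" for each delta_acquire_begin error.
# The function name is hypothetical; the layout is assumed from the log above.
sanlock_acquire_errors() {
    awk '/delta_acquire_begin error/ {
        code = ""; name = ""; host = ""
        for (i = 1; i < NF; i++) {
            # Each keyword is followed by its value in the next field.
            if ($i == "error")     code = $(i + 1)
            if ($i == "lockspace") name = $(i + 1)
            if ($i == "host_id")   host = $(i + 1)
        }
        print code, name, host
    }'
}
```

It could be run as `sanlock_acquire_errors < /var/log/sanlock.log`, assuming the log is in that location on the host.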
I managed to fix it by running the following immediately after the HA services were started by Ansible:
cd /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/ha_agent/
This is not a path managed by vdsm, so I guess the issue is with the hosted-engine-specific
lockspace, which is managed by hosted engine, not by vdsm.
sanlock direct init -s hosted-engine:0:hosted-engine.lockspace:0
This formats the lockspace, and is expected to fix this issue.
systemctl stop ovirt-ha-agent ovirt-ha-broker
systemctl status vdsmd
systemctl start ovirt-ha-broker ovirt-ha-agent
Once the VM started, Ansible managed to finish the deployment without any issues.
I hope someone can check the sanlock init logic, as this is really frustrating.
I'd suggest not manipulating the managed lockspace directly in the middle of the deployment, to avoid further issues.
If I understand the flow correctly, you create a new environment from scratch, so this is
an issue with hosted engine deployment not initializing the lockspace.
I think filing a bug with the info in this thread is the first step.
Simone, can you take a look at this?
On our CI env everything is working and the lockspace volume got initialised as expected.
In the attached logs a lot of steps got skipped, since a lot of things were already up and running, so the logs are not really useful.
Strahil, can you please retry on a really clean environment and, if you are able to reproduce the issue, attach the relevant logs?