On Thu, Jan 31, 2019 at 2:48 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Jan 31, 2019 at 2:52 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Dear Nir,

the issue with the 'The method does not exist or is not available: {'method': u'GlusterHost.list'}, code = -32601' is not related to the sanlock. I don't know why the 'vdsm-gluster' package was not installed as a dependency.

Please file a bug about this.

> Can you share your sanlock log?
>
I'm attaching the contents of /var/log, but here is a short snippet:

About the sanlock issue - it reappeared with errors like:
2019-01-31 13:33:10 27551 [17279]: leader1 delta_acquire_begin error -223 lockspace hosted-engine host_id 1

As I said, the error is not -233 but -223, which makes sense: this error means sanlock did not
find the magic number for a delta lease area, which means the area was either never formatted
or is corrupted.
 
2019-01-31 13:33:10 27551 [17279]: leader2 path /var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b offset 0
2019-01-31 13:33:10 27551 [17279]: leader3 m 0 v 30003 ss 512 nh 0 mh 1 oi 0 og 0 lv 0
2019-01-31 13:33:10 27551 [17279]: leader4 sn hosted-engine rn  ts 0 cs 60346c59
2019-01-31 13:33:11 27551 [21482]: s6 add_lockspace fail result -223
2019-01-31 13:33:16 27556 [21482]: s7 lockspace hosted-engine:1:/var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b:0
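
The leader lines in the snippet above can be checked mechanically. The following is a minimal sketch, not part of sanlock or vdsm, that splits a "leader3" log line into its key/value pairs and flags a zero magic value. The helper names are hypothetical, and reading "m" as the magic number (printed in hex) is an assumption inferred from the log and from the explanation of error -223 above.

```python
def parse_leader3(line):
    """Return the key/value pairs of a sanlock 'leader3' log line as a dict.

    Values are kept as strings because the log does not say which
    fields are printed in hex and which in decimal.
    """
    _, sep, rest = line.partition("leader3 ")
    if not sep:
        return None  # not a leader3 line
    tokens = rest.split()
    # Tokens alternate: key, value, key, value, ...
    return dict(zip(tokens[0::2], tokens[1::2]))


def lease_area_looks_unformatted(line):
    """Return True if the 'm' field is zero, None if the line has no 'm' field.

    Assumption: 'm' is the magic number, printed in hex.
    """
    fields = parse_leader3(line)
    if fields is None or "m" not in fields:
        return None
    return int(fields["m"], 16) == 0


sample = ("2019-01-31 13:33:10 27551 [17279]: leader3 m 0 v 30003 "
          "ss 512 nh 0 mh 1 oi 0 og 0 lv 0")
print(parse_leader3(sample))
print(lease_area_looks_unformatted(sample))
```

For the line quoted in this thread the magic field is 0, which is consistent with a lease area that was never formatted.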


I have managed to fix it by running the following immediately after the ha services were started by ansible:

cd /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/ha_agent/

This is not a path managed by vdsm, so I guess the issue is with the hosted-engine-specific
lockspace, which is managed by hosted engine, not by vdsm.
 
sanlock direct init -s hosted-engine:0:hosted-engine.lockspace:0 

This formats the lockspace, and is expected to fix this issue.
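
Before formatting, it may be worth confirming that the lockspace file really has no magic number. The following is a rough sketch under a stated assumption: that the lease area begins with a 32-bit little-endian magic field, which matches the "m 0" field in the log but is not confirmed anywhere in this thread. The function names are hypothetical, and this is not a replacement for sanlock's own tools.

```python
import struct


def read_magic(path, offset=0):
    """Read 4 bytes at `offset` and return them as an unsigned integer.

    Assumption: the magic is stored as a 32-bit little-endian value
    at the start of the lease area.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(4)
    if len(data) < 4:
        raise ValueError("lease area is shorter than 4 bytes")
    return struct.unpack("<I", data)[0]


def looks_unformatted(path, offset=0):
    """Return True if the magic field at `offset` is all zeros."""
    return read_magic(path, offset) == 0


# Example use, run from the ha_agent directory mentioned above:
#   looks_unformatted("hosted-engine.lockspace")
```

A zero result only suggests that the area was never written. A non-zero value does not prove the lease is valid, so treat this as a quick sanity check only.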

 
systemctl stop ovirt-ha-agent ovirt-ha-broker
systemctl status vdsmd
systemctl start ovirt-ha-broker ovirt-ha-agent

Once the VM started - ansible managed to finish the deployment without any issues.
I hope someone can check the sanlock initialization, as this is really frustrating.

I'd suggest not manipulating the managed lockspace directly in the middle of the deployment, to avoid further issues.
 

If I understand the flow correctly, you create a new environment from scratch, so this is
an issue with hosted engine deployment not initializing the lockspace.

I think filing a bug with the info in this thread is the first step.

Simone, can you take a look at this?

On our CI env everything is working as expected and the lockspace volume got initialised as expected.
In the attached logs a lot of steps got skipped, since a lot of things were already up and running, so they are not really useful.
Strahil, can you please retry on a really clean environment and, if you are able to reproduce the issue, attach the relevant logs?