Sanlock volume corrupted on deployment

On Sat, Jan 26, 2019 at 5:13 PM Strahil <hunter86_bg@yahoo.com> wrote:
Hey guys,
I have noticed that with 4.2.8 the sanlock issue (during deployment) is still not fixed. Am I the only one with bad luck, or is something broken there?
Hi, I'm not aware of anything breaking hosted-engine deployment on 4.2.8. Which kind of storage are you using? Can you please share your logs?
The sanlock service reports code 's7 add_lockspace fail result -233' 'leader1 delta_acquire_begin error -233 lockspace hosted-engine host_id 1'.
Best Regards, Strahil Nikolov

Hi Simone,
I will reinstall the nodes and will provide an update.
Best Regards,
Strahil Nikolov

On Sat, Jan 26, 2019 at 6:13 PM Strahil <hunter86_bg@yahoo.com> wrote:
Hey guys,
I have noticed that with 4.2.8 the sanlock issue (during deployment) is still not fixed. Am I the only one with bad luck, or is something broken there?
The sanlock service reports code 's7 add_lockspace fail result -233' 'leader1 delta_acquire_begin error -233 lockspace hosted-engine host_id 1'.
Sanlock does not have such an error code - are you sure this is -233?
Here are the sanlock return values: https://pagure.io/sanlock/blob/master/f/src/sanlock_rv.h
Can you share your sanlock log?
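For reference, on a default install sanlock writes to /var/log/sanlock.log, and the daemon messages also end up in the journal, so either of these should capture the relevant part (adjust the date to your deployment attempt):

cat /var/log/sanlock.log
journalctl -u sanlock --since "2019-01-26"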
Best Regards, Strahil Nikolov

Dear All,
I have rebuilt the gluster cluster, but it seems that with the latest updates (I started over from scratch) I am not able to complete the "Prepare VM" phase and thus I cannot reach the last phase where the sanlock issue happens. I have checked the contents of "/var/log/ovirt-hosted-engine-setup/engine-logs-2019-01-31T06:54:22Z/ovirt-engine/engine.log" and the only errors I see are:

[root@ovirt1 ovirt-engine]# grep ERROR engine.log
2019-01-31 08:56:33,326+02 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [3806b629] Failed in 'GlusterServersListVDS' method
2019-01-31 08:56:33,343+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [3806b629] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirt1.localdomain command GlusterServersListVDS failed: The method does not exist or is not available: {'method': u'GlusterHost.list'}
2019-01-31 08:56:33,344+02 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [3806b629] Command 'GlusterServersListVDSCommand(HostName = ovirt1.localdomain, VdsIdVDSCommandParametersBase:{hostId='07c6b36a-6939-4059-8dd3-4e47ea094538'})' execution failed: VDSGenericException: VDSErrorException: Failed to GlusterServersListVDS, error = The method does not exist or is not available: {'method': u'GlusterHost.list'}, code = -32601
2019-01-31 08:56:33,591+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-55) [51bf8a11] EVENT_ID: GLUSTER_COMMAND_FAILED(4,035), Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2019-01-31 08:56:34,856+02 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [3ee4bd51] Failed in 'GlusterServersListVDS' method
2019-01-31 08:56:34,857+02 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [3ee4bd51] Command 'GlusterServersListVDSCommand(HostName = ovirt1.localdomain, VdsIdVDSCommandParametersBase:{hostId='07c6b36a-6939-4059-8dd3-4e47ea094538'})' execution failed: VDSGenericException: VDSErrorException: Failed to GlusterServersListVDS, error = The method does not exist or is not available: {'method': u'GlusterHost.list'}, code = -32601
2019-01-31 08:56:35,191+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-60) [3fd826e] EVENT_ID: GLUSTER_COMMAND_FAILED(4,035), Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.

Any hint how to proceed further?
Best Regards,
Strahil Nikolov

On Tuesday, January 29, 2019, 14:01:17 GMT+2, Strahil <hunter86_bg@yahoo.com> wrote:
Dear Nir,
According to redhat solution 1179163, 'add_lockspace fail result -233' indicates a corrupted ids lockspace.
During the install, the VM fails to get up. In order to fix it, I stop: ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd, sanlock. Then reinitialize the lockspace via 'sanlock direct init -s' (used bugreport 1116469 as guidance). Once the init is successful and all the services are up - the VM is started, but the deployment was long over and the setup needs additional cleaning up.
I will rebuild the gluster cluster and then will repeat the deployment.
Can you guide me what information will be needed, as I'm quite new in ovirt/RHV?
Best Regards,
Strahil Nikolov

On Jan 28, 2019 20:34, Nir Soffer <nsoffer@redhat.com> wrote:
On Sat, Jan 26, 2019 at 6:13 PM Strahil <hunter86_bg@yahoo.com> wrote:
Hey guys,
I have noticed that with 4.2.8 the sanlock issue (during deployment) is still not fixed. Am I the only one with bad luck, or is something broken there?
The sanlock service reports code 's7 add_lockspace fail result -233' 'leader1 delta_acquire_begin error -233 lockspace hosted-engine host_id 1'.
Sanlock does not have such an error code - are you sure this is -233?
Here are the sanlock return values: https://pagure.io/sanlock/blob/master/f/src/sanlock_rv.h
Can you share your sanlock log?
Best Regards, Strahil Nikolov

Dear Nir,
the issue with 'The method does not exist or is not available: {'method': u'GlusterHost.list'}, code = -32601' is not related to sanlock. I don't know why the 'vdsm-gluster' package was not installed as a dependency.
Can you share your sanlock log?
I'm attaching the contents of /var/log, but here is a short snippet.
About the sanlock issue - it reappeared with errors like:

2019-01-31 13:33:10 27551 [17279]: leader1 delta_acquire_begin error -223 lockspace hosted-engine host_id 1
2019-01-31 13:33:10 27551 [17279]: leader2 path /var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b offset 0
2019-01-31 13:33:10 27551 [17279]: leader3 m 0 v 30003 ss 512 nh 0 mh 1 oi 0 og 0 lv 0
2019-01-31 13:33:10 27551 [17279]: leader4 sn hosted-engine rn ts 0 cs 60346c59
2019-01-31 13:33:11 27551 [21482]: s6 add_lockspace fail result -223
2019-01-31 13:33:16 27556 [21482]: s7 lockspace hosted-engine:1:/var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b:0

I have managed to fix it by running the following immediately after the ha services were started by ansible:

cd /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/ha_agent/
sanlock direct init -s hosted-engine:0:hosted-engine.lockspace:0
systemctl stop ovirt-ha-agent ovirt-ha-broker
systemctl status vdsmd
systemctl start ovirt-ha-broker ovirt-ha-agent

Once the VM started - ansible managed to finish the deployment without any issues. I hope someone can check the sanlock init stuff, as it is really frustrating.
Best Regards,
Strahil Nikolov

On Thu, Jan 31, 2019 at 2:52 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Dear Nir,
the issue with 'The method does not exist or is not available: {'method': u'GlusterHost.list'}, code = -32601' is not related to sanlock. I don't know why the 'vdsm-gluster' package was not installed as a dependency.
Please file a bug about this.
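In the meantime, assuming it really is just a missing package (the package name below is the one shipped with oVirt 4.2, double check on your node), you can verify and pull it in by hand, e.g.:

rpm -q vdsm-gluster || yum install -y vdsm-gluster
systemctl restart vdsmd

After that the GlusterHost.list verb should become available and the GlusterServersListVDS errors should stop.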
Can you share your sanlock log?
I'm attaching the contents of /var/log , but here is a short snippet:
About the sanlock issue - it reappeared with errors like:
2019-01-31 13:33:10 27551 [17279]: leader1 delta_acquire_begin error -223 lockspace hosted-engine host_id 1
As I said, the error is not -233 but -223, which makes sense - this error means sanlock did not find the magic number for a delta lease area, which means the area was not formatted or was corrupted.
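If you want to see that on disk, you can peek at the first sector of the lease file using the path from the leader2 line in your log - a rough check only; a properly formatted delta lease should show the lockspace name ("hosted-engine") there instead of zeros or garbage:

dd if=/var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b bs=512 count=1 2>/dev/null | hexdump -C | head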
2019-01-31 13:33:10 27551 [17279]: leader2 path /var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b offset 0
2019-01-31 13:33:10 27551 [17279]: leader3 m 0 v 30003 ss 512 nh 0 mh 1 oi 0 og 0 lv 0
2019-01-31 13:33:10 27551 [17279]: leader4 sn hosted-engine rn ts 0 cs 60346c59
2019-01-31 13:33:11 27551 [21482]: s6 add_lockspace fail result -223
2019-01-31 13:33:16 27556 [21482]: s7 lockspace hosted-engine:1:/var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b:0
I have managed to fix it by running the following immediately after the ha services were started by ansible:
cd /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/ha_agent/
This is not a path managed by vdsm, so I guess the issue is with the hosted engine specific lockspace that is managed by hosted engine, not by vdsm.
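For context (from my own setups, worth double checking on yours): in that ha_agent directory hosted-engine keeps its own lease and metadata files, typically hosted-engine.lockspace and hosted-engine.metadata, which are symlinks into /var/run/vdsm/storage/ - separate from the storage domain "ids" lease that vdsm itself manages:

ls -l /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/ha_agent/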
sanlock direct init -s hosted-engine:0:hosted-engine.lockspace:0
This formats the lockspace, and is expected to fix this issue.
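If you want to double check the result after the init, sanlock can dump the on-disk state directly (run from the same ha_agent directory; this is just a sanity check, not required for the fix) - the hosted-engine lockspace should show up once the area is properly formatted:

sanlock direct dump hosted-engine.lockspace
sanlock client status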
systemctl stop ovirt-ha-agent ovirt-ha-broker
systemctl status vdsmd
systemctl start ovirt-ha-broker ovirt-ha-agent
Once the VM started - ansible managed to finish the deployment without any issues. I hope someone can check the sanlock init stuff, as it is really frustrating.
If I understand the flow correctly, you create a new environment from scratch, so this is an issue with hosted engine deployment, not with initializing the lockspace.
I think filing a bug with the info in this thread is the first step.
Simone, can you take a look at this?

On Thu, Jan 31, 2019 at 2:48 PM Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Jan 31, 2019 at 2:52 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Dear Nir,
the issue with 'The method does not exist or is not available: {'method': u'GlusterHost.list'}, code = -32601' is not related to sanlock. I don't know why the 'vdsm-gluster' package was not installed as a dependency.
Please file a bug about this.
Can you share your sanlock log?
I'm attaching the contents of /var/log , but here is a short snippet:
About the sanlock issue - it reappeared with errors like:
2019-01-31 13:33:10 27551 [17279]: leader1 delta_acquire_begin error -223 lockspace hosted-engine host_id 1
As I said, the error is not -233 but -223, which makes sense - this error means sanlock did not find the magic number for a delta lease area, which means the area was not formatted or was corrupted.
2019-01-31 13:33:10 27551 [17279]: leader2 path /var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b offset 0
2019-01-31 13:33:10 27551 [17279]: leader3 m 0 v 30003 ss 512 nh 0 mh 1 oi 0 og 0 lv 0
2019-01-31 13:33:10 27551 [17279]: leader4 sn hosted-engine rn ts 0 cs 60346c59
2019-01-31 13:33:11 27551 [21482]: s6 add_lockspace fail result -223
2019-01-31 13:33:16 27556 [21482]: s7 lockspace hosted-engine:1:/var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b:0
I have managed to fix it by running the following immediately after the ha services were started by ansible:
cd /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/ha_agent/
This is not a path managed by vdsm, so I guess the issue is with the hosted engine specific lockspace that is managed by hosted engine, not by vdsm.
sanlock direct init -s hosted-engine:0:hosted-engine.lockspace:0
This formats the lockspace, and is expected to fix this issue.
systemctl stop ovirt-ha-agent ovirt-ha-broker
systemctl status vdsmd
systemctl start ovirt-ha-broker ovirt-ha-agent
Once the VM started - ansible managed to finish the deployment without any issues. I hope someone can check the sanlock init stuff, as it is really frustrating.
I'd suggest avoiding directly playing with the managed lockspace in the middle of the deployment, to avoid further issues.
If I understand the flow correctly, you create a new environment from scratch, so this is an issue with hosted engine deployment, not with initializing the lockspace.
I think filing a bug with the info in this thread is the first step.
Simone, can you take a look at this?
On our CI env everything is working as expected and the lockspace volume got initialised as expected. In the attached logs a lot of steps got skipped, since a lot of things were already up and running, so they are not really useful. Strahil, can you please retry on a really clean environment and attach the relevant logs if you are able to reproduce the issue?
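For a clean reproduction the logs that are usually most relevant are the hosted-engine-setup logs plus the agent/broker, vdsm and sanlock logs (default paths on a node install, adjust if yours differ), e.g. collected with:

tar czf he-deploy-logs.tar.gz /var/log/ovirt-hosted-engine-setup /var/log/ovirt-hosted-engine-ha /var/log/vdsm/vdsm.log /var/log/sanlock.log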

On Tue, Jan 29, 2019 at 2:00 PM Strahil <hunter86_bg@yahoo.com> wrote:
Dear Nir,
According to redhat solution 1179163, 'add_lockspace fail result -233' indicates a corrupted ids lockspace.
Good work finding the solution! Note that the page mentions error -223, not -233:
2014-08-27 14:26:42+0000 2244 [14497]: s30 add_lockspace fail result -223 #<-- corrupted ids lockspace
During the install, the VM fails to get up. In order to fix it, I stop: ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd, sanlock. Then reinitialize the lockspace via 'sanlock direct init -s' (used bugreport 1116469 as guidance). Once the init is successful and all the services are up - the VM is started, but the deployment was long over and the setup needs additional cleaning up.
I will rebuild the gluster cluster and then will repeat the deployment.
Can you guide me what information will be needed, as I'm quite new in ovirt/RHV?
Best Regards, Strahil Nikolov
On Jan 28, 2019 20:34, Nir Soffer <nsoffer@redhat.com> wrote:
On Sat, Jan 26, 2019 at 6:13 PM Strahil <hunter86_bg@yahoo.com> wrote:
Hey guys,
I have noticed that with 4.2.8 the sanlock issue (during deployment) is still not fixed. Am I the only one with bad luck, or is something broken there?
The sanlock service reports code 's7 add_lockspace fail result -233' 'leader1 delta_acquire_begin error -233 lockspace hosted-engine host_id 1'.
Sanlock does not have such an error code - are you sure this is -233?
Here are the sanlock return values: https://pagure.io/sanlock/blob/master/f/src/sanlock_rv.h
Can you share your sanlock log?
Best Regards, Strahil Nikolov
participants (4)
- Nir Soffer
- Simone Tiraboschi
- Strahil
- Strahil Nikolov