----- On 12 Apr 16, at 17:29, Simone Tiraboschi stirabos(a)redhat.com wrote:
On Tue, Apr 12, 2016 at 5:12 PM, Martin Sivak
<msivak(a)redhat.com> wrote:
> Hi,
>
> thanks for the summary, this is what I was suspecting.
>
> Just a clarification about the hosted engine host-id and lockspace.
> Hosted engine has a separate lockspace from VDSM and uses
> hosted-engine's host-id there consistently to protect a metadata
> whiteboard. It has nothing to do with the VM and there is no conflict
> here.
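> For example, on a hosted-engine host you can see that separate lockspace
> (and the host-id it registers with) with something like:
>
>   grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>   sanlock client status | grep hosted-engine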
>
>
> The issue seems to be that the VDSM lockspace is being used when
> connect storage domain is called and both hosted engine and
> ovirt-engine can call the connect command. Unfortunately hosted engine
> does not know the vds_spm_id when mounting the volume for the first
> time (even before ovirt-engine VM is started) and uses the host-id for
> that.
>
> Now, there is probably no issue when all hosts accessing that storage
> domain are hosted engine enabled right from the start, as the storage
> domain is mounted on all hosts before the engine starts and the
> locking uses a consistent id (the hosted engine host-id).
>
> The problem surfaces on a host where the engine manages to call the
> "connect hosted engine storage domain" first, because the engine uses
> the vds_spm_id for the requested lease and a collision happens.
>
> I do not see any easy fix at this moment, except maybe telling the engine
> to use the hosted engine id when it tries to connect the hosted engine
> storage domain. That feels like a hack, but might work.
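> A quick way to check whether a given host is affected is to compare the two
> ids by hand, roughly like this (the host name 'virt1' is just an example,
> and the query assumes you can reach the engine database, e.g. as the
> postgres user on the engine VM):
>
>   # on the host
>   grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>   # on the engine VM
>   psql engine -c "SELECT vds.vds_name, vds_spm_id_map.vds_spm_id
>                     FROM vds_spm_id_map
>                     JOIN vds ON vds_spm_id_map.vds_id = vds.vds_id
>                    WHERE vds.vds_name = 'virt1';"
>
> If the two numbers differ, that host can end up registering with two
> different ids in the same domain lockspace, depending on which side
> connects the storage domain first.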
>
> There also seems to be a bug for this issue now:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1322849
IMHO, it seems that this bug is related.
>
> Simone/Nir can you please comment on the issue to confirm that our
> findings are correct?
I think so, but the solution you proposed is probably not enough, since
we also allow mixing hosted-engine enabled hosts and regular hosts
(which don't have any hosted-engine id) in the same cluster and, once
the hosted-engine storage domain has been imported by the engine, the
engine will connect it on all of them.
More than cluster-wide, I think the problem is data-center wide. As you said,
once the hosted-engine storage domain has been imported by the engine, any host
in the DC can connect to it (in my case virt4 and virt7 are hosts from
different clusters, but in the same DC).
> Thanks
>
> Regards
>
> --
> Martin Sivak
> SLA / oVirt
>
> On Tue, Apr 12, 2016 at 4:31 PM, Baptiste Agasse
> <baptiste.agasse(a)lyra-network.com> wrote:
>> Hi all,
>>
>> Last week we had a problem on our ovirt infrastructure. The hosted engine
>> didn't come up after the reboot of the host which hosted it. With the help of
>> some people on the #ovirt IRC channel (msivak, nsoffer and some others, thanks
>> to all of them) I managed to get my hosted engine up and running, but the
>> underlying problem is still there. I think there is an inconsistency between
>> the sanlock IDs of the hosts.
>>
>> Some background:
>>
>> We installed ovirt 3.5 on CentOS 7 about 9 months ago. We have one DC with two
>> clusters:
>>
>> cluster 1: 4 hosts (virt1, virt2, virt3, virt4) that were installed with
>> 'hosted-engine --deploy', so they are capable of running the engine VM.
>> cluster 2: 2 hosts (virt6 and virt7) that were installed via the webui, so they
>> are 'normal' ovirt hosts.
>>
>> Since then we have successfully upgraded ovirt to 3.6 and set our clusters to
>> the 3.6 compatibility mode.
>>
>> Some weeks later something broke and the virt4 host rebooted. After some help
>> on the IRC channel, I managed to get the engine VM up and running. After that I
>> dug into the problem, which seems to be around the sanlock part.
>>
>> After the explanations, what I understand is:
>>
>> sanlock manages locks at the DC level. There is a hosted_engine lock to manage
>> who runs the VM, and there is a VDSM-level lock on the hosted_engine disk (or
>> any other VM disk) to know who can write to the disk.
>>
>> The problem in my case is that on some hosts that were installed in 3.5, the
>> hosted_engine ID and the vds_spm_id are not the same, and some hosts have a
>> vds_spm_id identical to another host's hosted_engine ID. So in some cases, a
>> host can't acquire the lock on some disks and shows up with different IDs in
>> the sanlock lockspaces.
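>> (Besides 'sanlock client status' shown below, I think the ids registered on a
>> storage domain can also be inspected directly by dumping its ids volume, for
>> example for the hosted engine storage domain; untested sketch:
>>
>>   sanlock direct dump /dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids
>>
>> which should show one delta-lease record per registered host id.)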
>>
>> Example, in my case:
>>
>> #
>> # For the hosted_engine hosts:
>> #
>> [root@virt1 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>> host_id=1
>>
>> [root@virt2 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>> host_id=2
>>
>> [root@virt3 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>> host_id=3
>>
>> [root@virt4 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>> host_id=4
>>
>> #
>> # For all hosts, including hosted engine:
>> #
>> [root@virt1 ~]# sanlock client status
>> daemon 3a99892c-5d3a-4d3d-bac7-d35259363c98.virt1
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s hosted-engine:1:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:1:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:1:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:1:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:1:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:1:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
>>
>> [root@virt2 ~]# sanlock client status
>> daemon 48fe11a1-6c64-4a56-abf0-6f9690e6a8c2.virt2
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s hosted-engine:2:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:2:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:3:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:3:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:3:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:3:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> r 350e5736-41c0-4017-a8fd-9866edad3333:SDM:/dev/350e5736-41c0-4017-a8fd-9866edad3333/leases:1048576:26 p 9304
>> r 377ae8e8-0eeb-4591-b50f-3d21298b4146:d704cf05-e294-4ada-9627-920c9997cf22:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/leases:111149056:21 p 32747
>>
>> [root@virt3 ~]# sanlock client status
>> daemon 3388d8e5-922d-45ab-8ecb-6e321a7a8a4a.virt3
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:2:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:2:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:2:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:2:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:2:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0 ADD
>>
>> [root@virt4 ~]# sanlock client status
>> daemon 3ec5f49e-9920-48a5-97a7-2e900ae374ed.virt4
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:4:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:6:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:6:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:6:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:6:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> s hosted-engine:4:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0
>>
>> [root@virt6 bagasse]# sanlock client status
>> daemon 031a9126-52ac-497a-8403-cd8c3f2db1c1.virt6
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:5:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:5:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:5:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:5:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:5:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>>
>> [root@virt7 ~]# sanlock client status
>> daemon 3ef87845-975a-443c-af71-0df1981fb8d4.virt7
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s 350e5736-41c0-4017-a8fd-9866edad3333:4:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:4:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0 ADD
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:4:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0 ADD
>> s 295207d7-41ea-4cda-a028-f860c357d46b:4:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0 ADD
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:4:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0 ADD
>>
>> #
>> # The output that I found in the engine database:
>> #
>> engine=# SELECT vds_spm_id_map.storage_pool_id, vds_spm_id_map.vds_spm_id,
>>                 vds_spm_id_map.vds_id, vds.vds_name
>>          FROM vds_spm_id_map, vds WHERE vds_spm_id_map.vds_id = vds.vds_id;
>>            storage_pool_id            | vds_spm_id |                vds_id                | vds_name
>> --------------------------------------+------------+--------------------------------------+----------
>>  00000002-0002-0002-0002-000000000208 |          6 | c6aef4f9-e972-40a0-916e-4ed296de46db | virt4
>>  00000002-0002-0002-0002-000000000208 |          5 | 5922e88b-c6de-41ce-ab64-046f66c8d08e | virt6
>>  00000002-0002-0002-0002-000000000208 |          4 | fcd962ea-3158-468d-a0b9-d7bb864ba959 | virt7
>>  00000002-0002-0002-0002-000000000208 |          1 | b43933d7-7338-41f6-9a71-f7cd389b9167 | virt1
>>  00000002-0002-0002-0002-000000000208 |          2 | 031526e8-110e-4254-97ef-1a26cb67b835 | virt3
>>  00000002-0002-0002-0002-000000000208 |          3 | 09609537-0c33-437a-93fa-b246f0bb57e4 | virt2
>> (6 rows)
>>
>> So, in this case, for example, the host virt4 has host_id=4 and vds_spm_id=6,
>> so this host appears with these 2 ids in sanlock.
>>
>>
>> So my questions are:
>> * How can I get out of this situation?
>> * As I didn't find the vds_spm_id stored anywhere on the hosts, can I modify
>> this value in the database to make it identical to host_id? (A rough sketch of
>> what I mean is below.)
>> * Is this strange behaviour a possible side effect of the upgrade to ovirt 3.6
>> and of the import of the hosted engine storage domain into the engine?
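>> For the second question, this is the kind of change I have in mind (untested,
>> just to illustrate what I'm asking; the vds_id is virt4's from the query
>> above, and the row currently holding vds_spm_id=4 would obviously have to be
>> changed as well to avoid a duplicate):
>>
>>   UPDATE vds_spm_id_map SET vds_spm_id = 4
>>    WHERE vds_id = 'c6aef4f9-e972-40a0-916e-4ed296de46db';
>>
>> I would only try something like that with the hosts in maintenance, and only
>> if you confirm it is safe.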
>>
>> Any pointers are welcome.
>>
>>
>> Have a nice day.
>>
>> Regards.
>>
>> --
>> Baptiste
>> _______________________________________________
>> Users mailing list
>> Users(a)ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
--
Baptiste AGASSE
Lyra Network France, Senior GNU/Linux engineer
109 Rue de l'innovation, 31670 Labège - France
Phone: (+33)5.67.22.31.87
Fax: (+33)5.67.22.31.61
E-mail: baptiste.agasse(a)lyra-network.com
Website: http://www.lyra-network.com