[ovirt-devel] Libvirt secrets management - take 2

Sat Jun 13 13:52:19 UTC 2015

On 12/06/15 08:10 -0400, Nir Soffer wrote:
>Here are more details on the new approach.
>
>A Ceph key is required only when starting a vm or hot-plugging a disk.
>Once the operation is done, libvirt does not need the Ceph key any more.

>A vm operation requiring a secret will register a Ceph key using a new
>random UUID, and remove the libvirt secret as soon as the operation has
>finished or failed.
>
>This scheme does not require secret reference counting. If multiple vms
>need the same Ceph key, we register it multiple times with libvirt,
>using unique UUIDs.
>
>This also avoids possible races when removing a libvirt secret at the
>same time another vm is trying to add it, or when updating a secret
>usage id, which is currently racy (you must remove the existing secret
>and register a new one).

I really like this new design.  As long as you remove the secrets when
you're done with them then I am not concerned with having an
accumulation of untracked secrets.  I am happy to get rid of the
complexity of secret reference counting.

>Positive flow:
>
>- Engine adds required Ceph keys to vm description
>- Vdsm registers keys with libvirt, using a new random UUID
>- When the vm operation is done (vm started, disk hot-plugged), Vdsm
>  removes the temporary secret (e.g. using try-finally)
>
>Negative flows:
>
>- Vm operation fails - temporary secret is unregistered in the finally
>  block
>- Vdsm crashes during the operation - temporary secret is removed when
>  vdsm starts again.
>- Libvirt crashes during the operation - secret is removed since we use
>  ephemeral secrets
>- Host crashes during the operation - same
>- Libvirt fails to remove the secret - we cannot handle this :-)
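
A minimal sketch of the try-finally flow above, not the actual Vdsm
code. The libvirt-python calls it relies on (secretDefineXML, setValue,
undefine) exist, but FakeConnection and FakeSecret are stand-ins
invented here so the example runs without a libvirt daemon, and
temporary_secret and the "vdsm-" usage name are hypothetical. The usage
name embeds the UUID on the assumption that libvirt requires usage ids
to be unique.

```python
import base64
import uuid
from contextlib import contextmanager

# Ephemeral secrets live only in libvirt's memory, so a libvirt or host
# crash drops them without any help from us.
SECRET_XML = """<secret ephemeral='yes' private='yes'>
  <uuid>{uuid}</uuid>
  <usage type='ceph'>
    <name>{name}</name>
  </usage>
</secret>"""


class FakeSecret:
    """Stand-in for libvirt.virSecret, only for this sketch."""

    def __init__(self, conn, secret_uuid):
        self._conn = conn
        self._uuid = secret_uuid

    def setValue(self, value, flags=0):
        self._conn.secrets[self._uuid] = value

    def undefine(self):
        del self._conn.secrets[self._uuid]


class FakeConnection:
    """Stand-in for libvirt.virConnect, only for this sketch."""

    def __init__(self):
        self.secrets = {}

    def secretDefineXML(self, xml, flags=0):
        secret_uuid = xml.split("<uuid>")[1].split("</uuid>")[0]
        self.secrets[secret_uuid] = None
        return FakeSecret(self, secret_uuid)


@contextmanager
def temporary_secret(conn, key_base64):
    """Register a Ceph key under a fresh UUID; always remove it on exit."""
    secret_uuid = str(uuid.uuid4())
    # Hypothetical naming scheme: one unique usage name per operation.
    xml = SECRET_XML.format(uuid=secret_uuid, name="vdsm-" + secret_uuid)
    secret = conn.secretDefineXML(xml, 0)
    try:
        secret.setValue(base64.b64decode(key_base64), 0)
        yield secret_uuid
    finally:
        # Runs whether the vm operation succeeded or failed.
        secret.undefine()


conn = FakeConnection()
key = base64.b64encode(b"not-a-real-ceph-key").decode("ascii")
try:
    with temporary_secret(conn, key) as secret_uuid:
        # The vm start or disk hot-plug would go here, referencing
        # secret_uuid in the disk XML.
        raise RuntimeError("simulated vm start failure")
except RuntimeError:
    pass
```

After the simulated failure conn.secrets is empty again. Since every
operation gets its own UUID, two vms using the same Ceph key never share
a libvirt secret, so there is nothing to reference count.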

What about the case where a storage domain becomes unavailable
(causing the VM to pause with -EIO)?  When the domainMonitor
reestablishes a connection to the domain, would the secrets need to be
renewed?

>
>Flows
>=====
>
>Start vm
>--------
>- Engine adds required secrets to vm description
>- Vdsm registers temporary secrets with libvirt
>- When vm is up or if operation failed, Vdsm removes the temporary secret
>
>Migrate vm
>----------
>- Engine adds required secrets to vm description
>- Vdsm adds secrets to the vm description sent to the destination
>- On the destination, Vdsm registers temporary secrets with libvirt
>- On the destination, when vm is up or if operation failed, Vdsm removes
>  the temporary secret
>
>Hot-plug disk
>-------------
>- Engine adds secret to disk description
>- Vdsm registers temporary secret with libvirt
>- When disk is successfully plugged, or if operation failed, Vdsm removes
>  the temporary secret
>
>I think this is the correct direction, assuming that we can get it to
>work for migration - I have no idea on that part.

+1 - This is a much improved design in my opinion.
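
For the "Vdsm crash" flow above, cleanup at startup could be as simple
as the sketch below. This is an illustration, not the Vdsm code:
listAllSecrets, undefine and UUIDString are real libvirt-python calls,
but the Fake* classes are stand-ins invented here so it runs without
libvirt, and clear_secrets is a hypothetical name. It removes every
secret on the host, as described in the thread, and reports the ones
libvirt refused to remove.

```python
class FakeSecret:
    """Stand-in for libvirt.virSecret, only for this sketch."""

    def __init__(self, conn, secret_uuid, broken=False):
        self._conn = conn
        self._uuid = secret_uuid
        self._broken = broken

    def UUIDString(self):
        return self._uuid

    def undefine(self):
        if self._broken:
            raise RuntimeError("simulated libvirt failure")
        self._conn.secrets.remove(self)


class FakeConnection:
    """Stand-in for libvirt.virConnect, only for this sketch."""

    def __init__(self):
        self.secrets = []

    def add(self, secret_uuid, broken=False):
        self.secrets.append(FakeSecret(self, secret_uuid, broken))

    def listAllSecrets(self, flags=0):
        return list(self.secrets)


def clear_secrets(conn):
    """Remove all secrets; return the UUIDs that could not be removed."""
    failed = []
    for secret in conn.listAllSecrets(0):
        try:
            secret.undefine()
        except Exception:
            # With real libvirt this would be libvirt.libvirtError.
            # Keep going so one bad secret does not block the rest.
            failed.append(secret.UUIDString())
    return failed


conn = FakeConnection()
conn.add("stale-1")
conn.add("stale-2", broken=True)
leftover = clear_secrets(conn)
```

Here leftover holds only the secret whose removal failed, which is the
"we cannot handle this" case; the most we can do is log it.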

>
>----- Original Message -----
>> From: "Nir Soffer" <nsoffer at redhat.com>
>> To: "devel" <devel at ovirt.org>
>> Cc: "Francesco Romani" <fromani at redhat.com>, "Federico Simoncelli" <fsimonce at redhat.com>, "Dan Kenigsberg"
>> <danken at redhat.com>, "Adam Litke" <alitke at redhat.com>, "Allon Mureinik" <amureini at redhat.com>, "Daniel Erez"
>> <derez at redhat.com>, "Michal Skrivanek" <mskrivan at redhat.com>, "Eric Blake" <eblake at redhat.com>
>> Sent: Friday, June 12, 2015 2:21:46 PM
>> Subject: Libvirt secrets management - take 2
>>
>> Hi all,
>>
>> Recently support for Ceph network disk landed in master. It is now
>> possible to start a vm using a Ceph network disk or hot-plug/unplug such
>> a disk using Cephx authentication.
>>
>> However, to make it work, you must add the relevant Ceph secret to
>> libvirt manually, in the same way it is done in an OpenStack deployment.
>> Our goal is to manage secrets automatically and use ephemeral (safer)
>> secrets.
>>
>> The next patches in the Ceph topic [1] implement secret management in
>> the same way we manage storage domains or server connections:
>>
>> The concept is - all hosts can use all secrets, so you can migrate a vm
>> using a Ceph disk to any host in the cluster.
>>
>> 1. When a host becomes up, we register the secrets associated with all the
>>    current active domains with libvirt
>>
>> 2. When activating a domain, we register the secrets associated with the
>>    new domain with libvirt
>>
>> 3. When deactivating a domain, we unregister the secrets associated with
>>    the domain from libvirt
>>
>> 4. When moving a host to maintenance, we clear all secrets
>>
>> 5. When vdsm shuts down or starts, clear all secrets to ensure that we
>>    don't keep stale or unneeded secrets on a host
>>
>> This system seems to work, but Federico pointed out a few issues and
>> suggested a new (simpler?) approach.
>>
>> In a future libvirt version, libvirt will support the concept of transient
>> secrets, so you can start a transient vm using a secret without registering
>> the secret with libvirt before starting the vm. The secret will be
>> specified in the vm XML (for starting a vm) or disk XML (for hot-plug).
>> This will make our secret management system and APIs useless.
>>
>> Managing state on multiple hosts is hard; we will probably have to deal
>> with nasty edge cases (e.g. lost messages, network errors), which may
>> lead to a host with a missing secret, which cannot run some vms. We
>> probably do this right for storage domains (after 8 years?), and we
>> should not assume that we are smarter and that secret management will
>> work on the first try.
>>
>> The new approach is to *not* manage state on multiple hosts. Instead,
>> send the required secrets only to the host that is starting a vm or
>> hot-plugging a disk that needs a libvirt secret:
>>
>> 1. When starting a vm, add the required secrets to the vm description.
>>    On the host, vdsm will register these secrets with libvirt before
>>    starting the vm.
>>
>> 2. When migrating a vm, add the required secrets to the vm description.
>>    On the host, vdsm will send these secrets to the destination host,
>>    and on the destination host, vdsm will register the secrets with libvirt
>>    before starting the vm.
>>
>> 3. When hot-plugging a disk, send the secret if needed in the disk
>>    description.  On the host, vdsm will register the secrets with libvirt.
>>
>> 4. When vdsm shuts down or starts, clear all secrets to ensure that we
>>    don't keep stale or unneeded secrets on a host
>>
>> 5. We never unregister secrets, since they are ephemeral anyway.
>>
>> 6. Alternatively, we can implement secret reference counting so when a vm
>>    stops or a disk is hot-unplugged we decrease the reference count on the
>>    secrets associated with this vm/disk, and if no other vms need the
>>    secret, we can unregister the secret from libvirt.
>>
>> The new approach is simpler, if we avoid the fancy secret reference
>> counting. I believe we can get it merged in a couple of weeks with help
>> from the virt team.
>>
>> Please share your thoughts on these alternative solutions.
>>
>> Thanks,
>> Nir
>>
>> [1]
>> https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:ceph

-- 
Adam Litke