On Thu, Feb 24, 2022 at 6:28 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Thu, Feb 24, 2022 at 6:10 PM Muli Ben-Yehuda <muli(a)lightbitslabs.com> wrote:
>
> On Thu, Feb 24, 2022 at 3:58 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>> On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda <muli(a)lightbitslabs.com> wrote:
>> >
>> > Thanks for the detailed instructions, Nir. I'm going to scrounge up
>> > some hardware. By the way, if anyone else would like to work on
>> > NVMe/TCP support, for an NVMe/TCP target you can either use Lightbits
>> > (talk to me offline for details) or use the upstream Linux NVMe/TCP
>> > target. Lightbits is a clustered storage system while upstream is a
>> > single target, but the client side should be close enough for
>> > vdsm/ovirt purposes.
>>
>> I played with NVMe/TCP a little bit, using qemu to create a virtual
>> NVMe disk, exporting it using the kernel on one VM and consuming it on
>> another VM:
>>
>> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
>>
>> One question about device naming - do we always get the same device
>> name on all hosts?
>
>
> No, we do not, see below how we handle migration in os_brick.
>
>> To support VM migration, every device must have a unique name in the
>> cluster. With multipath we always have a unique name, since we disable
>> "friendly names", so we always have:
>>
>> /dev/mapper/{wwid}
>>
>> With rbd we also do not use /dev/rbdN but a unique path:
>>
>> /dev/rbd/poolname/volume-vol-id
>>
>> How do we ensure a cluster-unique device path? If os_brick does not
>> handle it, we can do it in ovirt, for example:
>>
>> /run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42
>>
>> but I think this should be handled in cinderlib, since openstack has
>> the same problem with migration.
>
>
> Indeed. Both the Lightbits LightOS connector and the nvmeof connector do
> this through the target-provided namespace (LUN) UUID. After connecting
> to the target, the connectors wait for the local friendly-named device
> file that has the right UUID to show up, and then return the friendly
> name. So different hosts will have different friendly names, but the VMs
> will be attached to the right namespace, since we return the friendly
> name on the current host that has the right UUID. Does this also work
> for you?
> It will not work for oVirt. Migration in oVirt works like this:
>
> 1. Attach disks to the destination host
> 2. Send the VM XML from the source host to the destination host, and
>    start the VM in paused mode
> 3. Start the migration on the source host
> 4. When migration is done, start the CPU on the destination host
> 5. Detach the disks from the source host
>
> This will break in step 2, since the source XML refers to an nvme device
> that does not exist or is already in use by another VM.
Indeed.
> To make this work, the VM XML must use the same path, existing on both
> hosts. The issue could be solved by a libvirt hook updating the paths
> before qemu is started on the destination, but I think the right way to
> handle this is to have the same path on both hosts.
You mentioned above that it can be handled in ovirt (cf.
/run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42), which seems like a
reasonable approach given the constraint imposed by the oVirt migration
flow you outlined above. What information does vdsm need to create and
use the /var/run/vdsm/managedvolumes/{uuid} link? Today the connector
does (trimmed for brevity):
    def connect_volume(self, connection_properties):
        device_info = {'type': 'block'}
        uuid = connection_properties['uuid']
        device_path = self._get_device_by_uuid(uuid)
        device_info['path'] = device_path
        return device_info
Cheers,
Muli