On 24/02, Nir Soffer wrote:
On Wed, Feb 23, 2022 at 6:24 PM Muli Ben-Yehuda
<muli(a)lightbitslabs.com> wrote:
>
> Thanks for the detailed instructions, Nir. I'm going to scrounge up some hardware.
> By the way, if anyone else would like to work on NVMe/TCP support, for NVMe/TCP
> target you can either use Lightbits (talk to me offline for details) or use the upstream
> Linux NVMe/TCP target. Lightbits is a clustered storage system while upstream is a single
> target, but the client side should be close enough for vdsm/ovirt purposes.
> I played with NVMe/TCP a little bit, using qemu to create a virtual NVMe disk, and
> export it using the kernel on one VM, and consume it on another VM.
> https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
Hi,
You can also use nvmetcli to create nvme-of devices using the kernel's
nvmet.
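If you want to do it by hand instead, a minimal sketch of exporting an existing block
device over NVMe/TCP through the kernel's nvmet configfs interface looks like this (run
as root on the target; the NQN, backing device and address are just example values, and
nvmetcli can save/restore the equivalent setup from a JSON file):

  modprobe nvmet-tcp
  SUBSYS=/sys/kernel/config/nvmet/subsystems/nqn.2022-02.io.example:testvol
  mkdir $SUBSYS
  echo 1 > $SUBSYS/attr_allow_any_host

  # back namespace 1 with an existing block device (here /dev/vdb)
  mkdir $SUBSYS/namespaces/1
  echo -n /dev/vdb > $SUBSYS/namespaces/1/device_path
  echo 1 > $SUBSYS/namespaces/1/enable

  # expose the subsystem on a TCP port
  PORT=/sys/kernel/config/nvmet/ports/1
  mkdir $PORT
  echo ipv4 > $PORT/addr_adrfam
  echo tcp > $PORT/addr_trtype
  echo 192.168.122.10 > $PORT/addr_traddr
  echo 4420 > $PORT/addr_trsvcid
  ln -s $SUBSYS $PORT/subsystems/

On the initiator side, connecting with nvme-cli is enough to make the namespace show up
as /dev/nvmeXnY:

  modprobe nvme-tcp
  nvme connect -t tcp -a 192.168.122.10 -s 4420 -n nqn.2022-02.io.example:testvol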
I haven't tested any cinder NVMe driver with cinderlib yet, but I'll
test it with the LVM driver and nvmet target, since I'm currently
working on improvements/fixes on both the nvmet target and the os-brick
connector.
I have played with both iSCSI and RDMA (using Soft-RoCE) as transport
protocols for NVMe-oF and they worked fine in OpenStack.
Something important to consider when thinking about making it enterprise
ready is that the NVMe-oF connector in os-brick doesn't currently
support any kind of multipathing: native (ANA) or using device mapper.
But it's something we'll be working on.
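For reference, a quick way to check on a host whether the kernel's native NVMe
multipathing (ANA) is even available is the nvme_core module parameter (the file only
exists when the kernel was built with NVMe multipath support), and nvme list-subsys
shows the controllers/paths behind each subsystem:

  $ cat /sys/module/nvme_core/parameters/multipath
  $ sudo nvme list-subsys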
I'll let you know how the cinderlib testing goes, though I already know
that the LVM driver with nvmet has problems on disconnection [1].
[1]: https://bugs.launchpad.net/os-brick/+bug/1961102
> One question about device naming - do we always get the same name of the
> device in all hosts?
Definitely not. Depending on the transport protocol used and the
features enabled (such as multipathing), os-brick will return a
different path to the device.
In the case of NVMe-oF it will return devices like /dev/nvme0n1, which
means controller 0 and namespace 1 on the NVMe host system.
And namespace 1 in the device name can actually have a different
namespace id on the target (for example 10). Example from a test system
using LVM and an nvmet target variant I'm working on:
$ sudo nvme list
Node          SN                Model  Namespace  Usage              Format       FW Rev
------------  ----------------  -----  ---------  -----------------  -----------  --------
/dev/nvme0n1  9a9bd17b53e6725f  Linux  11         1.07 GB / 1.07 GB  512 B + 0 B  4.18.0-2
/dev/nvme0n2  9a9bd17b53e6725f  Linux  10         1.07 GB / 1.07 GB  512 B + 0 B  4.18.0-2
> To support VM migration, every device must have a unique name in the cluster.
> With multipath we always have a unique name, since we disable "friendly names",
> so we always have:
> /dev/mapper/{wwid}
> With rbd we also do not use /dev/rbdN but a unique path:
> /dev/rbd/poolname/volume-vol-id
> How do we ensure a cluster-unique device path? If os_brick does not handle it, we
> can do it in ovirt, for example:
> /run/vdsm/managedvolumes/{uuid} -> /dev/nvme7n42
os-brick will not handle this, but assuming udev rules are working
consistently in both migration systems (source and target) there will be
a symlink in /dev/disk/by-id that is formed using the NVMe UUID of the
volume.
In the example above we have:
$ ls -l /dev/disk/by-id/nvme*
lrwxrwxrwx. 1 root root 13 Feb 24 16:30 /dev/disk/by-id/nvme-Linux_9a9bd17b53e6725f -> ../../nvme0n2
lrwxrwxrwx. 1 root root 13 Feb 24 16:30 /dev/disk/by-id/nvme-uuid.5310ef24-8301-4e38-a8b8-b61cd61d8b36 -> ../../nvme0n2
lrwxrwxrwx. 1 root root 13 Feb 24 16:30 /dev/disk/by-id/nvme-uuid.e31b8c9c-b943-430e-afa4-55a110341dcb -> ../../nvme0n1
The uuid may not be the volume uuid; that will depend on the cinder
driver, but we can find the uuid for a specific nvme device easily
enough:
$ cat /sys/class/nvme/nvme0/nvme0n2/wwid
uuid.5310ef24-8301-4e38-a8b8-b61cd61d8b36
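Putting those pieces together, a rough sketch of how the host side could build a
cluster-unique path out of that (the /run/vdsm/managedvolumes/{uuid} name below just
follows Nir's example above, it is not something os-brick creates for us):

  # the by-id link created by udev resolves to the local nvme device
  $ readlink -f /dev/disk/by-id/nvme-uuid.5310ef24-8301-4e38-a8b8-b61cd61d8b36
  /dev/nvme0n2

  # so a per-volume name that is stable across hosts could simply point at that
  # by-id link, regardless of which /dev/nvmeXnY the namespace lands on locally
  $ sudo ln -sf /dev/disk/by-id/nvme-uuid.5310ef24-8301-4e38-a8b8-b61cd61d8b36 \
        /run/vdsm/managedvolumes/{uuid}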
> but I think this should be handled in cinderlib, since openstack has
> the same problem with migration.
OpenStack doesn't have that problem with migrations.
In OpenStack we don't care where the device appears, because nova knows
the volume id of the volume before calling os-brick to connect to it,
and then when os-brick returns the path it knows it belongs to that
specific volume.
Cheers,
Gorka.
> Nir
>
> Cheers,
> Muli
> --
> Muli Ben-Yehuda
> Co-Founder and Chief Scientist @
http://www.lightbitslabs.com
> LightOS: The Special Storage Sauce For Your Cloud
>
>
> On Wed, Feb 23, 2022 at 4:55 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>> On Wed, Feb 23, 2022 at 4:20 PM Muli Ben-Yehuda <muli(a)lightbitslabs.com>
wrote:
>> >
>> > Thanks, Nir and Benny (nice to run into you again, Nir!). I'm a neophyte in
>> > ovirt and vdsm... What's the simplest way to set up a development environment?
>> > Is it possible to set up a "standalone" vdsm environment to hack support for
>> > nvme/tcp or do I need "full ovirt" to make it work?
>>
>> It should be possible to install vdsm on a single host or vm, and use vdsm
>> API to bring the host to the right state, and then attach devices and run
>> vms. But I don't know anyone that can pull this off, since simulating what
>> engine is doing is hard.
>>
>> So the best way is to set up at least one host and engine host using the
>> latest 4.5 rpms, and continue from there. Once you have a host, building
>> vdsm on the host and upgrading the rpms is pretty easy.
>>
>> My preferred setup is to create vms using virt-manager for hosts, engine
>> and storage and run all the vms on my laptop.
>>
>> Note that you must have some traditional storage (NFS/iSCSI) to bring up
>> the system even if you plan to use only managed block storage (MBS).
>> Unfortunately when we added MBS support we did not have time to fix the huge
>> technical debt so you still need a master storage domain using one of the
>> traditional legacy options.
>>
>> To build a setup, you can use:
>>
>> - engine vm: 6g ram, 2 cpus, centos stream 8
>> - hosts vm: 4g ram, 2 cpus, centos stream 8
>>   you can start with one host and add more hosts later if you want to test migration.
>> - storage vm: 2g ram, 2 cpus, any os you like, I use alpine since it takes
>>   very little memory and its NFS server is fast.
>>
>> See the vdsm README for instructions on how to set up a host:
>>
>> https://github.com/oVirt/vdsm#manual-installation
>>
>> For engine host you can follow:
>>
>> https://ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_...
>>
>> And after that this should work:
>>
>> dnf install ovirt-engine
>> engine-setup
>>
>> Accepting all the defaults should work.
>>
>> When you have engine running, you can add a new host with
>> the ip address or dns name of your host(s) vm, and engine will
>> do everything for you. Note that you must install the ovirt-release-master
>> rpm on the host before you add it to engine.
>>
>> Nir
>>
>> >
>> > Cheers,
>> > Muli
>> > --
>> > Muli Ben-Yehuda
>> > Co-Founder and Chief Scientist @
http://www.lightbitslabs.com
>> > LightOS: The Special Storage Sauce For Your Cloud
>> >
>> >
>> > On Wed, Feb 23, 2022 at 4:16 PM Nir Soffer <nsoffer(a)redhat.com>
wrote:
>> >>
>> >> On Wed, Feb 23, 2022 at 2:48 PM Benny Zlotnik
<bzlotnik(a)redhat.com> wrote:
>> >> >
>> >> > So I started looking in the logs and tried to follow along with the
>> >> > code, but things didn't make sense, and then I saw it's ovirt 4.3, which
>> >> > makes things more complicated :)
>> >> > Unfortunately, because the GUID is sent in the metadata, the volume is
>> >> > treated as a vdsm managed volume[2] for the udev rule generation, and it
>> >> > prepends the /dev/mapper prefix to an empty string as a result.
>> >> > I don't have the vdsm logs, so I am not sure where exactly this fails,
>> >> > but if it's after [4] it may be possible to work around it with a vdsm
>> >> > hook.
>> >> >
>> >> > In 4.4.6 we moved the udev rule triggering to the volume mapping phase,
>> >> > before starting the VM. But it could still not work, because we check
>> >> > the driver_volume_type in [1], and I saw it's "driver_volume_type":
>> >> > "lightos" for lightbits.
>> >> > In theory it looks like it wouldn't take much to add support for your
>> >> > driver in a future release (as it's pretty late for 4.5).
>> >>
>> >> Adding support for nvme/tcp in 4.3 is probably not feasible, but we will
>> >> be happy to accept patches for 4.5.
>> >>
>> >> To debug such issues the vdsm log is the best place to check. We should see
>> >> the connection info passed to vdsm, and we have pretty simple code using
>> >> it with os_brick to attach the device to the system and set up the udev
>> >> rule (which may need some tweaks).
>> >>
>> >> Nir
>> >>
>> >> > [1] https://github.com/oVirt/vdsm/blob/500c035903dd35180d71c97791e0ce4356fb77...
>> >> >
>> >> > (4.3)
>> >> > [2] https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be7...
>> >> > [3] https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be7...
>> >> > [4] https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be7...
>> >> >
>> >> > On Wed, Feb 23, 2022 at 12:44 PM Muli Ben-Yehuda
<muli(a)lightbitslabs.com> wrote:
>> >> > >
>> >> > > Certainly, thanks for your help!
>> >> > > I put cinderlib and engine.log here:
http://www.mulix.org/misc/ovirt-logs-20220223123641.tar.gz
>> >> > > If you grep for 'mulivm1' you will see for example:
>> >> > >
>> >> > > 2022-02-22 04:31:04,473-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand]
>> >> > > (default task-10) [36d8a122] Command 'HotPlugDiskVDSCommand(HostName = client1,
>> >> > > HotPlugDiskVDSParameters:{hostId='fc5c2860-36b1-4213-843f-10ca7b35556c',
>> >> > > vmId='e13f73a0-8e20-4ec3-837f-aeacc082c7aa', diskId='d1e1286b-38cc-4d56-9d4e-f331ffbe830f',
>> >> > > addressMap='[bus=0, controller=0, unit=2, type=drive, target=0]'})' execution failed:
>> >> > > VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = Failed to bind
>> >> > > /dev/mapper/ on to /var/run/libvirt/qemu/21-mulivm1.mapper.: Not a directory, code = 45
>> >> > >
>> >> > > Please let me know what other information will be useful and I will provide it.
>> >> > >
>> >> > > Cheers,
>> >> > > Muli
>> >> > >
>> >> > > On Wed, Feb 23, 2022 at 11:14 AM Benny Zlotnik
<bzlotnik(a)redhat.com> wrote:
>> >> > >>
>> >> > >> Hi,
>> >> > >>
>> >> > >> We haven't tested this, and we do not have any code to handle nvme/tcp
>> >> > >> drivers, only iscsi and rbd. Given the path seen in the logs
>> >> > >> '/dev/mapper', it looks like it might require code changes to support
>> >> > >> this.
>> >> > >> Can you share cinderlib[1] and engine logs to see what is returned by
>> >> > >> the driver? I may be able to estimate what would be required (it's
>> >> > >> possible that it would be enough to just change the handling of the
>> >> > >> path in the engine)
>> >> > >>
>> >> > >> [1] /var/log/ovirt-engine/cinderlib/cinderlib.log
>> >> > >>
>> >> > >> On Wed, Feb 23, 2022 at 10:54 AM
<muli(a)lightbitslabs.com> wrote:
>> >> > >> >
>> >> > >> > Hi everyone,
>> >> > >> >
>> >> > >> > We are trying to set up ovirt (4.3.10 at the moment, customer preference)
>> >> > >> > to use Lightbits (https://www.lightbitslabs.com) storage via our openstack
>> >> > >> > cinder driver with cinderlib. The cinderlib and cinder driver bits are working
>> >> > >> > fine but when ovirt tries to attach the device to a VM we get the following error:
>> >> > >> >
>> >> > >> > libvirt: error : cannot create file '/var/run/libvirt/qemu/18-mulivm1.dev/mapper/': Is a directory
>> >> > >> >
>> >> > >> > We get the same error regardless of whether I try to run the VM or try to
>> >> > >> > attach the device while it is running. The error appears to come from vdsm,
>> >> > >> > which passes /dev/mapper as the preferred device?
>> >> > >> >
>> >> > >> > 2022-02-22 09:50:11,848-0500 INFO (vm/3ae7dcf4) [vdsm.api] FINISH appropriateDevice
>> >> > >> > return={'path': '/dev/mapper/', 'truesize': '53687091200', 'apparentsize': '53687091200'}
>> >> > >> > from=internal, task_id=77f40c4e-733d-4d82-b418-aaeb6b912d39 (api:54)
>> >> > >> > 2022-02-22 09:50:11,849-0500 INFO (vm/3ae7dcf4) [vds] prepared volume path: /dev/mapper/ (clientIF:510)
>> >> > >> >
>> >> > >> > Suggestions for how to debug this further? Is this a known issue? Did
>> >> > >> > anyone get nvme/tcp storage working with ovirt and/or vdsm?
>> >> > >> >
>> >> > >> > Thanks,
>> >> > >> > Muli
>> >> > >> >
>> >> > >>
>> >> > >
>> >>
>> >
>>
>