From: Jason Wang <jasowang(a)redhat.com>
Sent: Wednesday, August 19, 2020 12:19 PM
On 2020/8/19 1:26 PM, Parav Pandit wrote:
>
>> From: Jason Wang <jasowang(a)redhat.com>
>> Sent: Wednesday, August 19, 2020 8:16 AM
>
>> On 2020/8/18 5:32 PM, Parav Pandit wrote:
>>> Hi Jason,
>>>
>>> From: Jason Wang <jasowang(a)redhat.com>
>>> Sent: Tuesday, August 18, 2020 2:32 PM
>>>
>>>
>>> On 2020/8/18 4:55 PM, Daniel P. Berrangé wrote:
>>> On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote:
>>> On 2020/8/14 1:16 PM, Yan Zhao wrote:
>>> On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote:
>>> On 2020/8/10 3:46 PM, Yan Zhao wrote:
>>> driver is it handled by?
>>> It looks like devlink is specific to network devices, and
>>> devlink.h says "include/uapi/linux/devlink.h - Network physical
>>> device Netlink interface".
>>>
>>> Actually not. I think there was some discussion about this last
>>> year, and the conclusion was to remove this comment.
>>>
>>> [...]
>>>
>>>> Yes, but it could be hard. E.g. vDPA will choose to use devlink
>>>> (there's a long debate on sysfs vs devlink). So if we go with
>>>> sysfs, at least two APIs need to be supported ...
>>> We had an internal discussion and a proposal on this topic.
>>> I wanted to wait for Eli Cohen to be back from vacation on Wed 8/19,
>>> but since this is an active discussion right now, I will share the
>>> thoughts anyway.
>>> Here is the initial round of thoughts and the proposal.
>>>
>>> User requirements:
>>> ---------------------------
>>> 1. User might want to create one or more vdpa devices per PCI PF/VF/SF.
>>> 2. User might want to create one or more vdpa devices of type net/blk
>>>    or other type.
>>> 3. User needs to inspect and dump the health of the queues for debug
>>>    purposes.
>>> 4. During vdpa net device creation time, user may have to provide a MAC
>>>    address and/or VLAN.
>>> 5. User should be able to set/query some of the attributes for
>>>    debug/compatibility checks.
>>> 6. When user wants to create a vdpa device, they need to know which
>>>    device supports creation.
>>> 7. User should be able to see the queue statistics of doorbells, wqes
>>>    etc. regardless of class type.
>>
>> Note that wqes is probably not something common in all of the vendors.
> Yes. Virtq descriptor stats are better for monitoring the virtqueues.
>
>>
>>> To address the above requirements, there is a need for a vendor-agnostic
>>> tool, so that the user can create/config/delete vdpa device(s)
>>> regardless of the vendor.
>>> Hence, we should have a tool that lets the user do it.
>>>
>>> Examples:
>>> -------------
>>> (a) List parent devices that support creating vdpa devices.
>>> It also shows which class types are supported by each parent device.
>>> In the command below, two parent devices support vdpa device creation.
>>> The first is a PCI VF whose bdf is 03.00:5.
>>> The second is a PCI SF whose name is mlx5_sf.1.
>>>
>>> $ vdpa list pd
>>
>> What does "pd" mean?
>>
> Parent device, which supports creation of one or more vdpa devices.
> In a system there can be multiple parent devices that may support vdpa
> creation.
> The user should be able to know which devices support it, and when the user
> creates a vdpa device, they tell it which parent device to use for creation,
> as done in the vdpa dev add example below.
>>> pci/0000:03.00:5
>>> class_supports
>>> net vdpa
>>> virtbus/mlx5_sf.1
>>
>> So creating mlx5_sf.1 is in the charge of devlink?
>>
> Yes.
> But here the vdpa tool works with the parent device identifier {bus+name}
> instead of the devlink identifier.
>
>
>>> class_supports
>>> net
>>>
>>> (b) Now add a vdpa device and show the device.
>>> $ vdpa dev add pci/0000:03.00:5 type net
>>
>> So if you want to create device types other than vdpa on
>> pci/0000:03.00:5, it needs some synchronization with devlink?
> Please refer to FAQ-1: the new tool is not linked to devlink because vdpa will
> evolve with time and devlink will fall short.
> So no, it doesn't need any synchronization with devlink.
> As long as the parent device exists, the user can create it.
> All synchronization will be within drivers/vdpa/vdpa.c.
> This user interface is exposed via a new netlink family by calling
> genl_register_family() with the new name "vdpa" in drivers/vdpa/vdpa.c.
Just to make sure I understand here.
Consider we had virtbus/mlx5_sf.1. Process A wants to create a vDPA instance on
top of it but Process B wants to create an IB instance. Then I think some
synchronization is needed at least at the parent device level?

Likely, but the rdma device will be created either through the
$ rdma link add command,
or auto-created by the driver because there is only one, without much configuration.
Meanwhile, vdpa device(s) for virtbus/mlx5_sf.1 will be created through the vdpa subsystem.
And vdpa's synchronization will be contained within drivers/vdpa/vdpa.c.
>
>>
>>> $ vdpa dev show
>>> vdpa0@pci/0000:03.00:5 type net state inactive maxqueues 8 curqueues
>>> 4
>>>
>>> (c) vdpa dev show features vdpa0
>>> iommu platform
>>> version 1
>>>
>>> (d) dump vdpa statistics
>>> $ vdpa dev stats show vdpa0
>>> kickdoorbells 10
>>> wqes 100
>>>
>>> (e) Now delete a vdpa device previously created.
>>> $ vdpa dev del vdpa0
>>>
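To make the proposed output format in example (b) concrete, here is a minimal
Python sketch that parses one line of `vdpa dev show` output into a record.
It assumes the layout shown above (a `<dev>@<bus>/<name>` token followed by
key/value pairs); the tool is only a proposal at this point, and the function
name parse_dev_show is hypothetical.

```python
def parse_dev_show(line: str) -> dict:
    """Parse one line of the *proposed* `vdpa dev show` output.

    Assumed layout (taken from the example in this thread):
        <dev>@<bus>/<name> key value [key value ...]
    """
    tokens = line.split()
    if not tokens or "@" not in tokens[0]:
        raise ValueError(f"expected '<dev>@<bus>/<name>' first, got: {line!r}")

    # The first token carries the vdpa device name and its parent device.
    dev, parent = tokens[0].split("@", 1)

    attrs = tokens[1:]
    if len(attrs) % 2:
        raise ValueError(f"attribute without a value: {attrs[-1]!r}")

    record = {"dev": dev, "parent": parent}
    # Remaining tokens are key/value pairs; numeric values become ints.
    for key, value in zip(attrs[0::2], attrs[1::2]):
        record[key] = int(value) if value.isdigit() else value
    return record


sample = "vdpa0@pci/0000:03.00:5 type net state inactive maxqueues 8 curqueues 4"
info = parse_dev_show(sample)
print(info)
```

The parent identifier stays a single {bus+name} string here, matching how the
tool addresses parent devices.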
>>> Design overview:
>>> -----------------------
>>> 1. The above example tool runs over a netlink socket interface.
>>> 2. This enables returning meaningful error strings in addition to the
>>>    error code, so that the user can be more informed.
>>>    Often this is missing in ioctl()/configfs/sysfs interfaces.
>>> 3. This tool over netlink enables syscaller tests to be more usable,
>>>    like other subsystems, to keep the kernel robust.
>>> 4. This provides a vendor-agnostic view of all vdpa-capable parent and
>>>    vdpa devices.
>>> 5. Each driver that supports vdpa device creation registers the parent
>>>    device along with its supported classes.
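The design points above (drivers register parent devices with their supported
classes, errors come back as a code plus a string, and synchronization stays
in one place) can be modelled in user space roughly as follows. This is only
an illustrative Python sketch, not the kernel implementation: ParentRegistry,
VdpaError and all method names are hypothetical, and the single lock merely
stands in for whatever locking drivers/vdpa/vdpa.c would use.

```python
import errno
import threading


class VdpaError(Exception):
    """Error carrying an errno-style code plus a human-readable string."""

    def __init__(self, code: int, msg: str):
        super().__init__(msg)
        self.code = code
        self.msg = msg


class ParentRegistry:
    """Toy model of a vendor-agnostic registry of vdpa parent devices."""

    def __init__(self):
        self._lock = threading.Lock()  # all synchronization lives here
        self._parents = {}             # "bus/name" -> set of class names
        self._devices = {}             # "vdpaN" -> (parent, class)
        self._next_id = 0

    def register_parent(self, parent, classes):
        # Called by a (hypothetical) driver that supports vdpa creation.
        with self._lock:
            self._parents[parent] = set(classes)

    def list_parents(self):
        with self._lock:
            return {p: sorted(c) for p, c in self._parents.items()}

    def dev_add(self, parent, dev_type):
        with self._lock:
            if parent not in self._parents:
                raise VdpaError(errno.ENODEV,
                                f"parent device {parent} is not registered")
            if dev_type not in self._parents[parent]:
                raise VdpaError(errno.EINVAL,
                                f"{parent} does not support class {dev_type}")
            name = f"vdpa{self._next_id}"
            self._next_id += 1
            self._devices[name] = (parent, dev_type)
            return name

    def dev_del(self, name):
        with self._lock:
            if name not in self._devices:
                raise VdpaError(errno.ENODEV, f"no such vdpa device {name}")
            del self._devices[name]


reg = ParentRegistry()
# Parent devices and classes as listed in example (a).
reg.register_parent("pci/0000:03.00:5", ["net", "vdpa"])
reg.register_parent("virtbus/mlx5_sf.1", ["net"])

dev = reg.dev_add("pci/0000:03.00:5", "net")
print(dev)
try:
    reg.dev_add("virtbus/mlx5_sf.1", "blk")
except VdpaError as err:
    print(err.code, err.msg)
reg.dev_del(dev)
```

Because every operation takes the same lock, two processes racing to add
devices on one parent are serialized inside the registry, without involving
devlink.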
>>> FAQs:
>>> --------
>>> 1. Why not use devlink?
>>> Ans: Because as the vdpa ecosystem grows, devlink will fall short of
>>> extending vdpa-specific params, attributes, and stats.
>>
>>
>> This should be fine but it's still not clear to me the difference
>> between a vdpa netlink and a vdpa object in devlink.
>>
> The difference is that a vdpa-specific tool works at the parent device level.
> It is likely more appropriate because it can self-contain everything needed
> to create/delete devices and view/set features and stats.
> Trying to put that in devlink will fall short, as devlink doesn't have vdpa
> definitions.
> Typically, when a class/device subsystem grows, its own tool is wiser, like
> iproute2/ip, iproute2/tc, iproute2/rdma.
Ok, I see.
Thanks