Re: [ovirt-users] Re: Removal of dpdk
by Dominik Holler
Hi Florian,
thanks for your thoughts!
On Tue, Nov 3, 2020 at 3:21 PM Florian Schmid via Users <users(a)ovirt.org>
wrote:
> Hi Ales,
>
> what do you mean with "not maintained for a long time"?
>
The oVirt integration of dpdk was not maintained.
> DPDK is heavily developed and make the linux network extremely fast.
>
> I don't think, that SR-IOV can replace it,
>
The removal of dpdk is about removing the dpdk support from oVirt hosts
only.
We wonder if there is anyone using dpdk to attach oVirt VMs to physical
NICs.
We are aware that many users use SR-IOV, especially for scenarios that
require a high rate of Ethernet frames for VMs or low latency,
but we are not aware of users using dpdk to connect oVirt VMs to the
physical NICs of the host.
> because packets must be still processed by the kernel, which is really
> slow and CPU demanding.
>
In SR-IOV the packets might be processed by the guest kernel, but not by the
host kernel.
oVirt is focused on the host kernel, while the guest OS is managed by the
user of oVirt.
Did this explanation address your concerns?
> BR Florian
>
> ------------------------------
> *Von: *"Ales Musil" <amusil(a)redhat.com>
> *An: *"Nir Soffer" <nsoffer(a)redhat.com>
> *CC: *"users" <users(a)ovirt.org>, "devel" <devel(a)ovirt.org>
> *Gesendet: *Dienstag, 3. November 2020 13:56:12
> *Betreff: *[ovirt-users] Re: Removal of dpdk
>
>
>
> On Tue, Nov 3, 2020 at 1:52 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
>> On Tue, Nov 3, 2020 at 1:07 PM Ales Musil <amusil(a)redhat.com> wrote:
>>
>>> Hello,
>>> we have decided to remove dpdk in the upcoming version of oVirt namely
>>> 4.4.4. Let us know if there are any concerns about this.
>>>
>>
>> Can you give more info why we want to remove this feature, and what is
>> the replacement for existing users?
>>
>> Nir
>>
>
> Sure,
> the feature was only experimental and not maintained for a long time. The
> replacement is to use SR-IOV
> which is supported by oVirt.
>
> Thanks,
> Ales
>
>
> --
>
> Ales Musil
>
> Software Engineer - RHV Network
>
> Red Hat EMEA <https://www.redhat.com>
>
> amusil(a)redhat.com IM: amusil
> <https://red.ht/sig>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3FHIRQKEEKL...
4 years, 2 months
Libvirt driver iothread property for virtio-scsi disks
by Nir Soffer
The docs[1] say:
- The optional iothread attribute assigns the disk to an IOThread as defined by
the range for the domain iothreads value. Multiple disks may be assigned to
the same IOThread and are numbered from 1 to the domain iothreads value.
Available for a disk device target configured to use "virtio" bus and "pci"
or "ccw" address types. Since 1.2.8 (QEMU 2.1)
Does it mean that virtio-scsi disks do not use iothreads?
I'm experiencing horrible performance using nested VMs (up to 2 levels of
nesting) when accessing NFS storage running on one of the VMs. The NFS
server is using a SCSI disk.
My theory is:
- Writing to NFS server is very slow (too much nesting, slow disk)
- Not using iothreads (because we don't use virtio?)
- Guest CPU is blocked by slow I/O
Does this make sense?
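
One way to check the second point is to look at the domain XML directly. As
far as I know, libvirt also accepts an iothread attribute on the <driver>
element of a virtio-scsi controller, not only on virtio disks, so a SCSI disk
can still be served by an IOThread through its controller. The sketch below
is only an illustration under that assumption: the function name
iothread_report and the sample XML are made up for this example and are not
taken from vdsm or from the affected VM.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample domain XML, for illustration only.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>nfs-server</name>
  <iothreads>1</iothreads>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' iothread='1'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <target dev='sda' bus='scsi'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver iothread='1'/>
    </controller>
  </devices>
</domain>
"""


def iothread_report(domain_xml):
    """Return (kind, name, bus_or_model, iothread) tuples for disks and
    SCSI controllers found in a libvirt domain XML string."""
    root = ET.fromstring(domain_xml.strip())
    report = []

    # Disks: the iothread attribute, if any, sits on <driver>.
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        report.append((
            "disk",
            target.get("dev") if target is not None else None,
            target.get("bus") if target is not None else None,
            driver.get("iothread") if driver is not None else None,
        ))

    # SCSI controllers: assumed to carry the iothread on their own <driver>.
    for ctrl in root.findall("./devices/controller[@type='scsi']"):
        driver = ctrl.find("driver")
        report.append((
            "controller",
            ctrl.get("index"),
            ctrl.get("model"),
            driver.get("iothread") if driver is not None else None,
        ))
    return report


if __name__ == "__main__":
    for entry in iothread_report(SAMPLE_DOMAIN_XML):
        print(entry)
```

Feeding it the output of "virsh dumpxml" for the NFS server VM would show
whether the SCSI controller has an IOThread assigned at all.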
[1] https://libvirt.org/formatdomain.html#hard-drives-floppy-disks-cdroms
Nir
4 years, 2 months
Re: [ovirt-users] Re: Removal of dpdk
by Ales Musil
On Tue, Nov 3, 2020 at 3:17 PM Florian Schmid <fschmid(a)ubimet.com> wrote:
> Hi Ales,
>
> what do you mean with "not maintained for a long time"?
> DPDK is heavily developed and make the linux network extremely fast.
>
> I don't think, that SR-IOV can replace it, because packets must be still
> processed by the kernel, which is really slow and CPU demanding.
>
> BR Florian
>
> ------------------------------
> *Von: *"Ales Musil" <amusil(a)redhat.com>
> *An: *"Nir Soffer" <nsoffer(a)redhat.com>
> *CC: *"users" <users(a)ovirt.org>, "devel" <devel(a)ovirt.org>
> *Gesendet: *Dienstag, 3. November 2020 13:56:12
> *Betreff: *[ovirt-users] Re: Removal of dpdk
>
>
>
The dpdk integration inside oVirt is not maintained; dpdk itself, as a
technology/project, is of course still actively developed.
>
> On Tue, Nov 3, 2020 at 1:52 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
>> On Tue, Nov 3, 2020 at 1:07 PM Ales Musil <amusil(a)redhat.com> wrote:
>>
>>> Hello,
>>> we have decided to remove dpdk in the upcoming version of oVirt namely
>>> 4.4.4. Let us know if there are any concerns about this.
>>>
>>
>> Can you give more info why we want to remove this feature, and what is
>> the replacement for existing users?
>>
>> Nir
>>
>
> Sure,
> the feature was only experimental and not maintained for a long time. The
> replacement is to use SR-IOV
> which is supported by oVirt.
>
> Thanks,
> Ales
>
>
> --
>
> Ales Musil
>
> Software Engineer - RHV Network
>
> Red Hat EMEA <https://www.redhat.com>
>
> amusil(a)redhat.com IM: amusil
> <https://red.ht/sig>
>
--
Ales Musil
Software Engineer - RHV Network
Red Hat EMEA <https://www.redhat.com>
amusil(a)redhat.com IM: amusil
<https://red.ht/sig>
4 years, 2 months
Removal of dpdk
by Ales Musil
Hello,
we have decided to remove dpdk in the upcoming version of oVirt, namely
4.4.4. Let us know if there are any concerns about this.
Thank you.
Regards,
Ales Musil
--
Ales Musil
Software Engineer - RHV Network
Red Hat EMEA <https://www.redhat.com>
amusil(a)redhat.com IM: amusil
<https://red.ht/sig>
4 years, 2 months
Re: [virt-devel] Timeout failures in the OST
by Nir Soffer
On Mon, Nov 2, 2020 at 2:27 PM Benny Zlotnik <bzlotnik(a)redhat.com> wrote:
Issues like this belong on the oVirt devel mailing list.
> looks like live merge failed[1]:
> 2020-11-01 10:31:49,903+0100 ERROR (periodic/0) [virt.vm] (vmId='fcabfd2e-2937-4419-9b25-78fdd2b9c7c2') Unable to get watermarks for drive vdb: invalid argument: invalid path /rhev/data-center/mnt/blockSD/97b6175e-b6a9-419b-bd54-7c1e38c1bf71/images/fbb11a06-b8ef-4078-9530-978e7ca8ea0b/a911ad89-e461-4db4-88bf-a5d6590608b5 not assigned to domain (vm:1213)
This may be a bug in the vdsm live merge flow, trying to monitor a volume
after the volume was already removed, or it may be a libvirt/qemu bug.
> ...
> 2020-11-01 10:31:53,138+0100 ERROR (jsonrpc/1) [virt.vm] (vmId='fcabfd2e-2937-4419-9b25-78fdd2b9c7c2') merge: libvirt does not support volume chain monitoring. Unable to perform live merge. drive: vdb, alias: ua-fbb11a06-b8ef-4078-9530-978e7ca8ea0b, chains: {} (vm:5411)
>
> libvirt logs report:
> ...
> 2020-11-01 09:30:50.021+0000: 40137: error : virProcessRunInFork:1161 : internal error: child reported (status=125):
> 2020-11-01 09:30:50.025+0000: 40137: error : virProcessRunInFork:1161 : internal error: child reported (status=125): internal error: child reported (status=125):
> 2020-11-01 09:30:50.025+0000: 40137: warning : qemuDomainSnapshotDiskUpdateSource:15582 : Unable to move disk metadata on vm vm0
> 2020-11-01 09:31:45.539+0000: 40134: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'blockdev-del': Node libvirt-6-format is in use
> 2020-11-01 09:31:45.539+0000: 40134: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'blockdev-del': Block device libvirt-6-storage is in use
> 2020-11-01 09:31:45.540+0000: 40134: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'blockdev-del': Node 'libvirt-7-format' is busy: node is used as backing hd of 'libvirt-6-format'
> 2020-11-01 09:31:45.541+0000: 40134: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'blockdev-del': Block device libvirt-7-storage is in use
> 2020-11-01 09:31:45.900+0000: 40133: error : qemuDomainGetBlockInfo:12272 : invalid argument: invalid path /rhev/data-center/mnt/blockSD/97b6175e-b6a9-419b-bd54-7c1e38c1bf71/images/fbb11a06-b8ef-4078-9530-978e7ca8ea0b/a911ad89-e461-4db4-88bf-a5d6590608b5 not assigned to domain
These smell like a libvirt/qemu bug.
Is this reproducible with RHEL 8.3?
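
To make it easier to compare runs, the libvirt errors can be grouped by the
function that reported them. This is a minimal sketch, assuming the log line
layout seen in the excerpt above (timestamp, thread id, level, function:line,
message); the helper name summarize_errors and the regular expression are
mine, not part of libvirt or OST.

```python
import re
from collections import Counter

# Assumed layout, based on the quoted excerpt:
# <date> <time+zone>: <thread>: <level> : <function>:<line> : <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+[+-]\d{4}): "
    r"(?P<thread>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<lineno>\d+) : (?P<message>.*)$"
)

# A few lines copied from the excerpt (one of them still mail-quoted).
SAMPLE_LINES = [
    "2020-11-01 09:30:50.021+0000: 40137: error : virProcessRunInFork:1161 : "
    "internal error: child reported (status=125):",
    "2020-11-01 09:30:50.025+0000: 40137: warning : "
    "qemuDomainSnapshotDiskUpdateSource:15582 : "
    "Unable to move disk metadata on vm vm0",
    "> 2020-11-01 09:31:45.539+0000: 40134: error : "
    "qemuMonitorJSONCheckError:412 : internal error: unable to execute "
    "QEMU command 'blockdev-del': Node libvirt-6-format is in use",
    "2020-11-01 09:31:45.541+0000: 40134: error : "
    "qemuMonitorJSONCheckError:412 : internal error: unable to execute "
    "QEMU command 'blockdev-del': Block device libvirt-7-storage is in use",
]


def summarize_errors(lines):
    """Count error-level log lines per reporting function."""
    counts = Counter()
    for line in lines:
        # Drop mail quoting ("> ") and surrounding whitespace before matching.
        match = LOG_LINE.match(line.strip().lstrip("> "))
        if match and match.group("level") == "error":
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    for func, count in summarize_errors(SAMPLE_LINES).most_common():
        print(func, count)
```

Running this over the libvirtd log of a passing and a failing run should show
quickly whether the blockdev-del errors appear only in the failing one.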
> Looks like the issue previously discussed in "[rhev-devel] Live storage migration instability in OST" two months ago has resurfaced
>
>
>
> Another issue seems to be the removal of the source disk:
> 2020-11-01 10:31:48,056+0100 ERROR (tasks/3) [storage.StorageDomainManifest] removed image dir: /rhev/data-center/mnt/192.168.202.2:_exports_nfs_share1/3ca0e492-45f2-4383-b149-439043408bce/images/_remove_me_fbb11a06-b8ef-4078-9530-978e7ca8ea0b can't be removed (fileSD:258)
> 2020-11-01 10:31:48,056+0100 ERROR (tasks/3) [storage.TaskManager.Task] (Task='70db80c2-076a-4ba1-a65d-821e6b5fe52c') Unexpected error (task:880)
> Traceback (most recent call last):
> File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 251, in purgeImage
> self.oop.os.rmdir(toDelDir)
> File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 238, in rmdir
> self._iop.rmdir(path)
> File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 550, in rmdir
> return self._sendCommand("rmdir", {"path": path}, self.timeout)
> File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in _sendCommand
> raise OSError(errcode, errstr)
> OSError: [Errno 39] Directory not empty
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
> return fn(*args, **kargs)
> File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
> return self.cmd(*self.argslist, **self.argsdict)
> File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
> return method(self, *args, **kwargs)
> File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1947, in purgeImage
> domain.purgeImage(sdUUID, imgUUID, volsByImg, discard)
> File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 855, in purgeImage
> self._manifest.purgeImage(sdUUID, imgUUID, volsImgs, discard)
> File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 259, in purgeImage
> raise se.ImageDeleteError("%s %s" % (imgUUID, str(e)))
> vdsm.storage.exception.ImageDeleteError: Could not remove all image's volumes: ('fbb11a06-b8ef-4078-9530-978e7ca8ea0b [Errno 39] Directory not empty',)
>
> But it's unclear what the leftover is
It may be a leftover from a previous failed LSM. I think we need a better
error message here; it should list the files in the non-empty directory.
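
Roughly what I have in mind, as a standalone sketch. It uses plain os calls
instead of the out-of-process helpers vdsm uses, and both the function name
remove_image_dir and the ImageDeleteError class below are stand-ins written
for this example, not the real vdsm code.

```python
import errno
import os


class ImageDeleteError(Exception):
    """Stand-in for the vdsm exception; hypothetical, for illustration."""


def remove_image_dir(path):
    """Remove an image directory, reporting leftovers if it is not empty."""
    try:
        os.rmdir(path)
    except OSError as e:
        if e.errno == errno.ENOTEMPTY:
            # Include the leftover entries so the log shows what blocked
            # the removal instead of only "Directory not empty".
            leftovers = sorted(os.listdir(path))
            raise ImageDeleteError(
                f"{path}: directory not empty, leftover entries: {leftovers}"
            ) from e
        raise ImageDeleteError(f"{path}: {e}") from e
```

With this, the failure above would name the leftover files, which should tell
us whether they come from an earlier failed LSM.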
> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/127...
>
>
> On Mon, Nov 2, 2020 at 1:44 PM Steven Rosenberg <srosenbe(a)redhat.com> wrote:
>>
>> Dear Benny,
>>
>> Thank you for your response.
>>
>> Here is the timeout engine log from one of the ps 45 failures [1].
>>
>> It seems like this timeout is related to the engine failing and the ost scripts are not designed to detect the fail, thus timing out:
>>
>> 2020-11-01 10:31:55,175+01 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-67) [live_storage_migration] Command id: '60b3f7fc-93db-48b2-82a1-8a93c47e18e1 failed child command status for step 'MERGE_STATUS'
>>
>>
>>
>>
>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/127...
>>
>> With Best Regards.
>>
>> Steven.
>>
>>
>>
>> On Mon, Nov 2, 2020 at 12:51 PM Benny Zlotnik <bzlotnik(a)redhat.com> wrote:
>>>
>>> Can you link to the relevant engine/vdsm logs?
>>> The timeout in the tests indicates that the desired state wasn't
>>> reached so test logs don't provide the information about what exactly
>>> happened
>>>
>>> On Sun, Nov 1, 2020 at 5:10 PM Steven Rosenberg <srosenbe(a)redhat.com> wrote:
>>> >
>>> > Dear virt-devel,
>>> >
>>> > We are currently experiencing many timeout failures in various patch sets for Gerrit change 111395 [1].
>>> >
>>> > The timeouts occur intermittently and seem to be unrelated to the changes which are only in the 004 module [2] and should have only affected VM1 / Disk1.
>>> >
>>> > We could use some advice on addressing these issues as well as a review of the patch to ensure we can move this patch forward. The patch sets and relevant timeouts are as follows:
>>> >
>>> > PS 40:
>>> >
>>> > test_live_storage_migration – test 004 [3]
>>> >
>>> > PS 41:
>>> >
>>> > on test_verify_engine_backup – test 002 [4]
>>> >
>>> > PS 43:
>>> >
>>> > on test_virtual_machines - test 100 [5]
>>> >
>>> > PS 45:
>>> >
>>> > on test_live_storage_migration – 004 [6]
>>> >
>>> >
>>> >
>>> >
>>> > [1] https://gerrit.ovirt.org/#/c/111395/
>>> > [2] basic-suite-master/test-scenarios/004_basic_sanity.py
>>> > [3] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/126...
>>> > [4] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/126...
>>> > [5] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/127...
>>> > [6] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/127...
>>> >
>>> > With Best Regards.
>>> >
>>> > Steven
>>> >
>>> >
>>>
4 years, 2 months