Gluster volume slower than raid1 zpool speed
by Harry O
Hi,
Can anyone help me with the performance of my 3-node Gluster on ZFS? (It is set up with one arbiter.)
Write performance for the single VM I have on it (with the engine) is 50% worse than a single bare-metal disk.
I have enabled "Optimize for virt store"
I run a 1 Gbps network with 1500 MTU; could this be the write-performance killer?
Is this to be expected from a 2x HDD ZFS raid1 on each node, with a 3-node arbiter setup?
Maybe I should move to raid 5 or 6?
Maybe I should add an SSD cache to the raid1 ZFS zpools?
What are your thoughts? What should I do to optimize this setup?
I would like to run ZFS with Gluster, and I can deal with a little performance loss, but not that much.
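In case it helps to narrow this down: would comparing a brick-local write with a write over the Gluster mount be a sensible way to separate the network from ZFS? Something like the following is what I had in mind (paths, volume name and fio parameters are only placeholders, not my actual layout):
# write directly on one brick's ZFS dataset, bypassing Gluster
fio --name=bricktest --directory=/gluster_bricks/data --rw=write --bs=1M --size=1G --end_fsync=1
# same write over the mounted Gluster volume (or from inside the VM)
fio --name=voltest --directory=/rhev/data-center/mnt/glusterSD/<host>:_data --rw=write --bs=1M --size=1G --end_fsync=1
# per-brick FOP latency while the test runs
gluster volume profile data start
gluster volume profile data info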
New failure Gluster deploy: Set granual-entry-heal on --> Bricks down
by Charles Lam
Dear friends,
Thanks to Donald and Strahil, my earlier Gluster deploy issue was resolved by disabling multipath on the NVMe drives. The Gluster deployment is now failing on the three-node hyperconverged oVirt v4.3.3 deployment at:
TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
with:
"stdout": "One or more bricks could be down. Please execute the command
again after bringing all bricks online and finishing any pending heals\nVolume heal
failed."
Specifically:
TASK [gluster.features/roles/gluster_hci : Set granual-entry-heal on] **********
task path: /etc/ansible/roles/gluster.features/roles/gluster_hci/tasks/hci_volumes.yml:67
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'engine',
'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"engine", "granular-entry-heal", "enable"],
"delta": "0:00:10.112451", "end": "2020-12-18
19:50:22.818741", "item": {"arbiter": 0, "brick":
"/gluster_bricks/engine/engine", "volname": "engine"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:12.706290", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'data', 'brick':
'/gluster_bricks/data/data', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"data", "granular-entry-heal", "enable"], "delta":
"0:00:10.110165", "end": "2020-12-18 19:50:38.260277",
"item": {"arbiter": 0, "brick":
"/gluster_bricks/data/data", "volname": "data"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:28.150112", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
failed: [fmov1n1.sn.dtcorp.com] (item={'volname': 'vmstore',
'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) =>
{"ansible_loop_var": "item", "changed": true,
"cmd": ["gluster", "volume", "heal",
"vmstore", "granular-entry-heal", "enable"],
"delta": "0:00:10.113203", "end": "2020-12-18
19:50:53.767864", "item": {"arbiter": 0, "brick":
"/gluster_bricks/vmstore/vmstore", "volname": "vmstore"},
"msg": "non-zero return code", "rc": 107, "start":
"2020-12-18 19:50:43.654661", "stderr": "",
"stderr_lines": [], "stdout": "One or more bricks could be down.
Please execute the command again after bringing all bricks online and finishing any
pending heals\nVolume heal failed.", "stdout_lines": ["One or more
bricks could be down. Please execute the command again after bringing all bricks online
and finishing any pending heals", "Volume heal failed."]}
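Before re-running the deployment, I am assuming the brick and daemon state can be checked by hand on each node; something like this is what I have in mind (volume names taken from the failed tasks above):
# on each of the three nodes
systemctl status glusterd
gluster peer status
gluster volume status engine
gluster volume status data
gluster volume status vmstore
# once every brick shows as online, the failing step itself should be retriable manually:
gluster volume heal engine granular-entry-heal enable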
Any suggestions regarding troubleshooting, insight, or recommendations for reading are greatly appreciated. I apologize for all the email; I am only creating this as a separate thread because it is a new, presumably unrelated issue. I welcome any recommendations on how I can improve my forum etiquette.
Respectfully,
Charles
OVN and change of mgmt network
by Gianluca Cecchi
Hello,
I previously had OVN running on engine (as OVN provider with northd and
northbound and southbound DBs) and hosts (with OVN controller).
After changing the mgmt IP of the hosts (the engine has instead kept the same IP), I executed the command again on them:
vdsm-tool ovn-config <ip_of_engine> <new_local_ip_of_host>
Now I think I have to clean up some things, e.g.:
1) On the engine, where I get the lines below:
systemctl status ovn-northd.service -l
. . .
Sep 29 14:41:42 ovmgr1 ovsdb-server[940]: ovs|00005|reconnect|ERR|tcp:
10.4.167.40:37272: no response to inactivity probe after 5 seconds,
disconnecting
Oct 03 11:52:00 ovmgr1 ovsdb-server[940]: ovs|00006|reconnect|ERR|tcp:
10.4.167.41:52078: no response to inactivity probe after 5 seconds,
disconnecting
The two IPs are the old ones of two of the hosts.
It seems that a restart of the services has fixed it...
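(For reference, all I restarted was the unit shown above:
systemctl restart ovn-northd.service
I am not sure whether the northbound/southbound ovsdb servers need a separate restart.)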
Can anyone confirm if I have to do anything else?
2) On the hosts (there are 3 hosts with OVN, on IPs 10.4.192.32/33/34), where I currently have this output:
[root@ov301 ~]# ovs-vsctl show
3a38c5bb-0abf-493d-a2e6-345af8aedfe3
Bridge br-int
fail_mode: secure
Port "ovn-1dce5b-0"
Interface "ovn-1dce5b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.32"}
Port "ovn-ddecf0-0"
Interface "ovn-ddecf0-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.192.33"}
Port "ovn-fd413b-0"
Interface "ovn-fd413b-0"
type: geneve
options: {csum="true", key=flow, remote_ip="10.4.168.74"}
Port br-int
Interface br-int
type: internal
ovs_version: "2.7.2"
[root@ov301 ~]#
The 10.4.192.x IPs are OK.
But there is a leftover entry from an old host I initially used for tests, corresponding to 10.4.168.74, which doesn't exist anymore.
How can I clean up the records for 1) and 2)?
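For 2), what I had in mind, though I have not tried it yet, is to remove the stale chassis on the engine side so that ovn-controller drops the tunnel port on the hosts by itself; the chassis name below is only a guess and would need to be looked up first:
# on the engine
ovn-sbctl show                      # find the chassis whose encap IP is 10.4.168.74
ovn-sbctl chassis-del <stale_chassis_name>
# only if the geneve port then still survives on the hosts
ovs-vsctl del-port br-int ovn-fd413b-0
Does that sound right, or is there an oVirt-blessed way to do this?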
Thanks,
Gianluca
CentOS Stream support
by Michal Skrivanek
Hi all,
we would like to ask about the community's interest in oVirt moving to CentOS Stream.
There were some requests before, but it's hard to tell how many people would really like to see that.
With CentOS releases lagging months behind RHEL, it's interesting to consider moving to CentOS Stream, as it is much more up to date and allows us to fix bugs faster, with fewer workarounds and less overhead for maintaining old code. E.g. our current integration tests do not really pass on CentOS 8.1, and we can't really do much about that other than wait for more up-to-date packages. It would also bring us closer to making oVirt run smoothly on RHEL, as RHEL is much closer to Stream than it is to the outdated CentOS.
So... would you like us to support CentOS Stream?
We don't really have the capacity to run 3 different platforms; would you still want oVirt to support CentOS Stream if it means "less support" for regular CentOS?
There are some concerns about Stream being a bit less stable; do you share those concerns?
Thank you for your comments,
michal
[ANN] oVirt 4.4.4 is now generally available
by Sandro Bonazzola
oVirt 4.4.4 is now generally available
The oVirt project is excited to announce the general availability of oVirt
4.4.4, as of December 21st, 2020.
This release unleashes an altogether more powerful and flexible open source
virtualization solution that encompasses hundreds of individual changes and
a wide range of enhancements across the engine, storage, network, user
interface, and analytics, as compared to oVirt 4.3.
Important notes before you install / upgrade
Please note that oVirt 4.4 only supports clusters and data centers with
compatibility version 4.2 and above. If clusters or data centers are
running with an older compatibility version, you need to upgrade them to at
least 4.2 (4.3 is recommended).
Please note that in RHEL 8 / CentOS 8 several devices that worked on EL7
are no longer supported.
For example, the megaraid_sas driver is removed. If you use Enterprise
Linux 8 hosts you can try to provide the necessary drivers for the
deprecated hardware using the DUD method (See the users’ mailing list
thread on this at
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/NDSVUZSESOXE...
)
Documentation
- If you want to try oVirt as quickly as possible, follow the instructions on the Download <https://ovirt.org/download/> page.
- For complete installation, administration, and usage instructions, see the oVirt Documentation <https://ovirt.org/documentation/>.
- For upgrading from a previous version, see the oVirt Upgrade Guide <https://ovirt.org/documentation/upgrade_guide/>.
- For a general overview of oVirt, see About oVirt <https://ovirt.org/community/about.html>.
What’s new in oVirt 4.4.4 Release?
This update is the fourth in a series of stabilization updates to the 4.4
series.
This release is available now on x86_64 architecture for:
- Red Hat Enterprise Linux 8.3
- CentOS Linux (or similar) 8.3
- CentOS Stream (tech preview)
This release supports Hypervisor Hosts on x86_64 and ppc64le architectures for:
- Red Hat Enterprise Linux 8.3
- CentOS Linux (or similar) 8.3
- oVirt Node (based on CentOS Linux 8.3)
- CentOS Stream (tech preview)
oVirt Node and Appliance have been updated, including:
- oVirt 4.4.4: https://www.ovirt.org/release/4.4.4/
- Ansible 2.9.16: https://github.com/ansible/ansible/blob/stable-2.9/changelogs/CHANGELOG-v...
- CentOS Linux 8 (2011): https://lists.centos.org/pipermail/centos-announce/2020-December/048207.html
- Advanced Virtualization 8.3
See the release notes [1] for installation instructions and a list of new
features and bugs fixed.
Notes:
- oVirt Appliance is already available for CentOS Linux 8
- oVirt Node NG is already available for CentOS Linux 8
Additional resources:
- Read more about the oVirt 4.4.4 release highlights: https://www.ovirt.org/release/4.4.4/
- Get more oVirt project updates on Twitter: https://twitter.com/ovirt
- Check out the latest project news on the oVirt blog: https://blogs.ovirt.org/
[1] https://www.ovirt.org/release/4.4.4/
[2] https://resources.ovirt.org/pub/ovirt-4.4/iso/
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
encrypted GENEVE traffic
by Pavel Nakonechnyi
Dear oVirt Community,
From my understanding, oVirt does not support Open vSwitch IPsec tunneling for GENEVE traffic (as described at http://docs.openvswitch.org/en/latest/howto/ipsec/ and http://docs.openvswitch.org/en/latest/tutorials/ipsec/).
Are there plans to introduce such support? (Or explicitly not to?)
Is it possible to somehow manually configure such tunneling for existing virtual networks, even in a limited way?
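From the linked tutorial, a manual pre-shared-key tunnel on plain Open vSwitch appears to look roughly like this (bridge name, remote IP and key are placeholders, and I assume ovn-controller would still create its own unencrypted GENEVE ports next to it, so this is probably not oVirt-aware at all):
# openvswitch-ipsec must be running on both sides
ovs-vsctl add-br br-ipsec
ovs-vsctl add-port br-ipsec tun -- set interface tun type=geneve options:remote_ip=192.0.2.2 options:psk=swordfish
I have also seen "ovn-nbctl set nb_global . ipsec=true" mentioned in the OVN IPsec documentation, but I do not know whether ovirt-provider-ovn exposes or tolerates that.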
Alternatively, is it possible to deploy oVirt on top of tunneled interfaces (i.e. VXLAN, IPsec)? This would allow encrypting all of the management traffic.
Such a requirement arises when deploying oVirt on third-party premises with an untrusted network.
Thanks in advance for any clarifications. :)
--
WBR, Pavel
+32478910884
Re: Constantly XFS in memory corruption inside VMs
by Strahil Nikolov
Damn...
You are using EFI boot. Does this happen only to EFI machines?
Did you notice if only EL 8 is affected?
Best Regards,
Strahil Nikolov
On Sunday, 29 November 2020 at 19:36:09 GMT+2, Vinícius Ferrão <ferrao(a)versatushpc.com.br> wrote:
Yes!
I have a live VM right now that will be dead on a reboot:
[root@kontainerscomk ~]# cat /etc/*release
NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3"
Red Hat Enterprise Linux release 8.3 (Ootpa)
Red Hat Enterprise Linux release 8.3 (Ootpa)
[root@kontainerscomk ~]# sysctl -a | grep dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 30
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
[root@kontainerscomk ~]# xfs_db -r /dev/dm-0
xfs_db: /dev/dm-0 is not a valid XFS filesystem (unexpected SB magic number 0xa82a0000)
Use -F to force a read attempt.
[root@kontainerscomk ~]# xfs_db -r /dev/dm-0 -F
xfs_db: /dev/dm-0 is not a valid XFS filesystem (unexpected SB magic number 0xa82a0000)
xfs_db: size check failed
xfs_db: V1 inodes unsupported. Please try an older xfsprogs.
[root@kontainerscomk ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Thu Nov 19 22:40:39 2020
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/mapper/rhel-root / xfs defaults 0 0
UUID=ad84d1ea-c9cc-4b22-8338-d1a6b2c7d27e /boot xfs defaults 0 0
UUID=4642-2FF6 /boot/efi vfat umask=0077,shortname=winnt 0 2
/dev/mapper/rhel-swap none swap defaults 0 0
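For the VMs that no longer boot, I assume the damage can at least be enumerated without writing anything, e.g. booting a rescue ISO (xfs_repair refuses to run on a mounted filesystem) and running:
# xfs_repair -n /dev/mapper/rhel-root
The -n is no-modify mode, so it should only report what it would fix.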
Thanks,
-----Original Message-----
From: Strahil Nikolov <hunter86_bg(a)yahoo.com>
Sent: Sunday, November 29, 2020 2:33 PM
To: Vinícius Ferrão <ferrao(a)versatushpc.com.br>
Cc: users <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Constantly XFS in memory corruption inside VMs
Can you check the output on the VM that was affected:
# cat /etc/*release
# sysctl -a | grep dirty
Best Regards,
Strahil Nikolov
On Sunday, 29 November 2020 at 19:07:48 GMT+2, Vinícius Ferrão via Users <users(a)ovirt.org> wrote:
Hi Strahil.
I'm not using barrier options on mount; these are the default settings from the CentOS install.
I have some additional findings: there's a large number of discarded packets on the switch, on the hypervisor interfaces.
Discards are OK as far as I know; I hope TCP handles this and does the proper retransmissions, but I wonder whether this may be related or not. Our storage is over NFS. My general expertise is with iSCSI, and I've never seen this kind of issue with iSCSI, not that I'm aware of.
In other clusters I've seen a high number of discards with iSCSI on XenServer 7.2, but there's no corruption on the VMs there...
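If it matters, I guess the NFS client-side retransmission counters on the hypervisors would show whether those discards actually translate into lost RPCs; I plan to check with:
# nfsstat -rc
(client RPC calls vs. retransmissions)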
Thanks,
Sent from my iPhone
> On 29 Nov 2020, at 04:00, Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>
> Are you using the "nobarrier" mount option in the VM?
>
> If yes, can you try to remove the "nobarrier" option.
>
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Saturday, 28 November 2020 at 19:25:48 GMT+2, Vinícius Ferrão <ferrao(a)versatushpc.com.br> wrote:
>
>
>
>
>
> Hi Strahil,
>
> I moved a running VM to another host, rebooted, and no corruption was found. If there's any corruption it may be silent corruption... I've had cases where the VM was new, just installed; I ran dnf -y update to get the updated packages, rebooted, and boom, XFS corruption. So perhaps the migration process isn't the one to blame.
>
> But, in fact, I remember a case when moving a VM where it went down during the process, and when I rebooted it, it was corrupted. But this may not be related; it was perhaps already in an inconsistent state.
>
> Anyway, here's the mount options:
>
> Host1:
> 192.168.10.14:/mnt/pool0/ovirt/vm on
> /rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4
> (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,noshar
> ecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,l
> ocal_lock=none,addr=192.168.10.14)
>
> Host2:
> 192.168.10.14:/mnt/pool0/ovirt/vm on
> /rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4
> (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,noshar
> ecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,l
> ocal_lock=none,addr=192.168.10.14)
>
> The options are the default ones. I didn't change anything when configuring this cluster.
>
> Thanks.
>
>
>
> -----Original Message-----
> From: Strahil Nikolov <hunter86_bg(a)yahoo.com>
> Sent: Saturday, November 28, 2020 1:54 PM
> To: users <users(a)ovirt.org>; Vinícius Ferrão
> <ferrao(a)versatushpc.com.br>
> Subject: Re: [ovirt-users] Constantly XFS in memory corruption inside
> VMs
>
> Can you check with a test VM whether this happens after a Virtual Machine migration?
>
> What are your mount options for the storage domain?
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Saturday, 28 November 2020 at 18:25:15 GMT+2, Vinícius Ferrão via Users <users(a)ovirt.org> wrote:
>
>
>
>
>
>
>
>
> Hello,
>
>
>
> I’m trying to discover why an oVirt 4.4.3 Cluster with two hosts and NFS shared storage on TrueNAS 12.0 is constantly getting XFS corruption inside the VMs.
>
>
>
> For random reasons VMs get corrupted, sometimes halting or just being silently corrupted, and after a reboot the system is unable to boot due to "corruption of in-memory data detected". Sometimes the corrupted data is "all zeroes", sometimes there's data there. In extreme cases XFS superblock 0 gets corrupted and the system cannot even detect an XFS partition anymore, since the XFS magic key is corrupted in the first blocks of the virtual disk.
>
>
>
> This has been happening for a month now. We had to roll back some backups, and I no longer trust the state of the VMs.
>
>
>
> Using xfs_db I can see that some VMs have corrupted superblocks even while the VM is up. One in particular had sb0 corrupted, so I knew that when a reboot kicked in the machine would be gone, and that's exactly what happened.
>
>
>
> Another day I was just installing a new CentOS 8 VM for no particular reason, and after running dnf -y update and a reboot, the VM was corrupted and needed an XFS repair. That was an extreme case.
>
>
>
> So, I've looked at the TrueNAS logs, and there's apparently nothing wrong with the system. No errors logged in dmesg, nothing in /var/log/messages, and no errors on the "zpools", not even after scrub operations. We've been monitoring the switch, a Catalyst 2960X, and all its interfaces. There are no "up and down" events and zero errors on any interface (we have a 4-port LACP on the TrueNAS side and a 2-port LACP on each host); everything seems to be fine. The only metric that I was unable to get is "dropped packets", but I don't know whether this can be an issue or not.
>
>
>
> Finally, on oVirt, I can't find anything either. I looked at /var/log/messages and /var/log/sanlock.log, but I found nothing suspicious.
>
>
>
> Is anyone out there experiencing this? Our VMs are mainly CentOS 7/8 with XFS; there are 3 Windows VMs that do not seem to be affected, but everything else is affected.
>
>
>
> Thanks all.
>
>
>